Updates from mitcho (Michael 芳貴 Erlewine)

  • mitcho (Michael 芳貴 Erlewine) 1:13 pm on January 28, 2011 Permalink | Reply  

    ShrimpTest 1.0b2 

    ShrimpTest 1.0 beta 2 updates ShrimpTest to be WordPress 3.1 compatible, and takes advantage of the new Admin Bar functionality. Please give it a try!


  • mitcho (Michael 芳貴 Erlewine) 11:52 pm on September 4, 2010 Permalink | Reply

    Inline documentation: complete! 

    Today, after a brief hiatus preparing to go back to school, I’ve completed and committed all inline documentation in the ShrimpTest source. It’s all in PHPDoc format, and I’ve committed the generated PHPDoc documentation to svn as well. Feel free to check it out!


  • mitcho (Michael 芳貴 Erlewine) 11:27 pm on August 29, 2010 Permalink | Reply

    Now that the experiment duration (sample size) and notification features have been implemented, ShrimpTest is feature-complete! Download it now!


  • mitcho (Michael 芳貴 Erlewine) 11:03 pm on August 29, 2010 Permalink | Reply

    Added some UI polish: start and finish times are now consistently recorded and displayed, and messages like “Experiment activated.” are shown:

  • mitcho (Michael 芳貴 Erlewine) 7:59 pm on August 29, 2010 Permalink | Reply
    Tags: notification

    Just committed the notification plugin that I’ve been working on for a while. This functionality was blocked on the experiment duration system: when an experiment stat is computed and trips the experiment duration line for the first time, the shrimptest_experiment_duration_reached action is triggered, which is what this notification plugin hooks into.

    Along the way, I had to modify a variety of things in the Model so that arbitrary data can be added to experiments and variants (such as, in this case, the notification emails) and can be updated correctly.

  • mitcho (Michael 芳貴 Erlewine) 12:55 am on August 28, 2010 Permalink | Reply

    Just committed UI for specifying/calculating experiment duration:

    The idea is that, as I mentioned last week, there are statistical problems with not specifying a sample size (i.e. an experiment duration) ahead of time. This UI simply asks the user what the “detection level” for this metric is (here 0.01, i.e. a 1% difference in conversion rate) and automatically computes an experiment duration based on that.

    Doing this in the general case is hard, as we need to know the variance of the metric we are using. However, ShrimpTest can take advantage of the fact that our goal metrics are organized around “metric types”, “conversion” being one of them. Because conversion events are Bernoulli trials, their maximum variance is known, enabling this calculation. For custom (manual) metrics, an input box is shown to ask the user to specify a variance, if they would like to take advantage of the experiment duration calculation.

    This experiment duration value will be used in calculating when an experiment can be completed, enabling notifications when experiments are over without just waiting on a significant result.
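
    As a sketch of the calculation described above: this is my own illustration of the standard fixed-horizon sample-size formula, not ShrimpTest’s actual code, and the 95% confidence / 80% power defaults and function name are my assumptions.

    ```python
    import math

    def sample_size_per_variant(detection_level, variance=0.25,
                                z_alpha=1.96, z_beta=0.84):
        """Fixed-horizon sample size per variant.

        detection_level: smallest difference worth detecting (e.g. 0.01
        for a 1% difference in conversion rate).
        variance: variance of the goal metric; 0.25 is the maximum
        variance of a Bernoulli trial (at p = 0.5), a safe worst case
        for conversion metrics.
        z_alpha: z-score for 95% two-sided confidence.
        z_beta: z-score for 80% power.
        """
        return math.ceil((z_alpha + z_beta) ** 2 * 2 * variance
                         / detection_level ** 2)

    # A 1% detection level with worst-case Bernoulli variance:
    print(sample_size_per_variant(0.01))
    ```

    Note how the required sample size grows with the square of the inverse detection level: halving the detection level quadruples the experiment duration.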

  • mitcho (Michael 芳貴 Erlewine) 6:34 pm on August 24, 2010 Permalink | Reply

    I added an A/B shortcode generating button to the plain text editor:

    The code for the TinyMCE version is also mostly there, though it’s not working and will need some debugging.

    Also, as seen in the video, the format of the A/B shortcode itself has changed, following my discussion last week with Joen. Now, instead of the control value being the inline content, it is a shortcode attribute like the other variants.

    All these changes have been committed. Please try it out!

  • mitcho (Michael 芳貴 Erlewine) 11:33 pm on August 18, 2010 Permalink | Reply
    Tags: meeting

    Had a great conversation with Joen, where we walked through the process of experiment creation together. Here are my notes from the meeting:

    • switch to single self-closing [ab /] shortcode
    • A/B shortcode feedback in visual editor
    • add A/B button

    – look at ratings plugin
    – test with TinyMCE advanced, etc.

    • variant viewer: close button
    • help / what’s this links, hover links
    • think about first run experience
    • complement “A/B testing”, or alternative wording for marketing
  • mitcho (Michael 芳貴 Erlewine) 12:55 pm on August 17, 2010 Permalink | Reply
    Tags: statistics   

    In coming up with the notion of “experiment completion” and creating the associated events, I’ve been reading a bit more about experiment sample size and strategies for computing it. In particular, I found the following post both inspiring and troubling: How Not To Run An A/B Test. The upshot is that simply waiting until a result hits 95% confidence and then ending the experiment risks producing invalid results. Computing a fixed sample size in advance is the solution.

    The problem with sample size

    The problem is that the standard way of computing an experiment sample size is based on power considerations (power being the probability that a test will correctly reject a false null hypothesis), which in turn depends on how much variance there is in the goal metric being recorded. Suppose our goal metric is “number of dollars spent at the store”. Our control might look like this:

    CONTROL: $10, $11, $9.5, $10, $10, $10.5, $9, $5, $11, $11, $10.5…

    and our variant looks like this:

    VARIANT 1: $7, $6, $7, $8, $8.5, $5.5…

    Clearly, it looks very much like this variant is behaving differently from the control… in particular, it seems to be performing about 20% (or more) worse. But what if the original control had looked like the following?

    HYPOTHETICAL CONTROL: $8, $14, $12, $6, $7.5, $5, $16, $12, $18, $9, $11.5…

    This distribution has approximately the same average as the first control, but much more variance. If we were asked whether we could confidently say that variant 1 above looks the same as or different from the hypothetical control, we would be much more hesitant.

    This is why the variance of the goal metric is crucial in determining the confidence of a result and, thus, the required sample size to detect differences.

    The problem here is that, most of the time, we won’t know the variance (or standard deviation) of the metric before running the experiment. Some solutions, such as Bayesian approaches, exist, but at the cost of greatly complicating our statistics algorithms, which I would prefer to avoid.

    The question is, then, how do we compute a valid sample size ahead of time, or early on in the experiment period?
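
    To make the variance point concrete, here is a rough sketch using the two control distributions from above. I’m assuming the same standard fixed-horizon power formula as before, and the detection target (a roughly 20% drop from the control mean) is my illustrative choice:

    ```python
    import math
    import statistics

    def sample_size_per_variant(diff, variance, z_alpha=1.96, z_beta=0.84):
        # Fixed-horizon sample size: (z_a + z_b)^2 * 2 * var / diff^2,
        # with z-scores for 95% confidence and 80% power by default.
        return math.ceil((z_alpha + z_beta) ** 2 * 2 * variance / diff ** 2)

    control = [10, 11, 9.5, 10, 10, 10.5, 9, 5, 11, 11, 10.5]
    hypothetical = [8, 14, 12, 6, 7.5, 5, 16, 12, 18, 9, 11.5]

    # Aim to detect a 20% drop from the control mean (roughly $2).
    diff = 0.2 * statistics.mean(control)

    for name, data in [("control", control), ("hypothetical", hypothetical)]:
        var = statistics.variance(data)  # sample variance
        n = sample_size_per_variant(diff, var)
        print(name, round(var, 2), n)
    ```

    The hypothetical control has several times the variance of the first one, so detecting the same-sized difference against it requires a correspondingly larger sample, which is exactly why an unknown variance makes computing a valid sample size ahead of time hard.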

  • mitcho (Michael 芳貴 Erlewine) 8:14 pm on August 13, 2010 Permalink | Reply
    Tags: W3 Total Cache

    Note: The latest trunk version of ShrimpTest now has support for W3 Total Cache (in “disk (basic)” mode) as well as WP Super Cache (in “half-on” mode), using the same single file.
