Today I made my first commit to the ShrimpTest SVN repo.

Here’s a quick rundown of what it does:

  1. Create a number of tables (giving a PHP warning along the way on 3.0-trunk)
  2. When someone visits the site, check the user’s cookie (called ebisen by default). If they don’t have one, generate one from a random MD5 hash and record their user agent and IP address. This internally ties them to a visitor_id.
  3. When the visitor goes to an experiment page (i.e., a page that manually calls shrimptest_get_variant($experiment_id) from the theme or some other custom code), ShrimpTest checks whether that visitor has already been assigned a variant of the given experiment. If not, it consults the experiments and treatments tables and assigns the visitor to a random variant. The assignment uses a weighted random selection, so we can do not only 50%/50% splits but also 30%/70% splits and so on.
  4. When the visitor triggers a goal (i.e., some code manually calls shrimptest_conversion_success($experiment_id)), ShrimpTest records that success against the experiment’s metric.
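The post doesn’t spell out the weighted random algorithm, so here is a minimal Python sketch of one standard approach: lay the variants end to end by weight and see where a uniform draw lands. The function name `pick_weighted_variant` and the variant labels are hypothetical, and ShrimpTest’s PHP implementation may work differently.

```python
import random
from typing import Hashable, Mapping, Optional


def pick_weighted_variant(weights: Mapping[Hashable, float],
                          rng: Optional[random.Random] = None) -> Hashable:
    """Pick one variant with probability proportional to its weight.

    Only the ratios matter, so {"A": 30, "B": 70} and {"A": 0.3, "B": 0.7}
    both describe a 30%/70% split.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")

    if rng is None:
        rng = random.Random()
    # Lay the variants end to end on [0, total), each occupying a slice as
    # wide as its weight, then find the slice a uniform draw lands in.
    point = rng.random() * total
    cumulative = 0.0
    last_positive = None
    for variant, weight in weights.items():
        if weight == 0:
            continue  # zero-weight variants are never chosen
        last_positive = variant
        cumulative += weight
        if point < cumulative:
            return variant
    # Reached only if float rounding leaves `point` at or above the final sum.
    return last_positive
```

Passing a seeded `random.Random` makes runs reproducible, which helps when checking that a configured split actually comes out as intended.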

Only the “template tags” shrimptest_get_variant and shrimptest_conversion_success are public; everything else lives in a ShrimpTest object, of which there is one global instance, $shrimp. Right now it’s all low-level functionality: no UI, and no easy way to set up experiments. Baby steps.
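To make the division of labor concrete, here is a rough Python mirror of the structure described above, assuming in-memory dicts in place of the database tables. `ShrimpTestSketch`, `identify`, and `current_visitor_id` are hypothetical names; the two module-level functions are merely named after the PHP template tags and are not the plugin’s actual code.

```python
import hashlib
import os
import random
from typing import Dict, Optional, Tuple


class ShrimpTestSketch:
    """In-memory stand-in for the plugin object; dicts play the role of tables."""

    COOKIE_NAME = "ebisen"  # default cookie name mentioned in the post

    def __init__(self, seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        self.visitors: Dict[str, int] = {}                  # cookie hash -> visitor_id
        self.visitor_info: Dict[int, Tuple[str, str]] = {}  # visitor_id -> (user agent, IP)
        self.experiments: Dict[int, Dict[str, float]] = {}  # experiment_id -> {variant: weight}
        self.assignments: Dict[Tuple[int, int], str] = {}   # (visitor_id, experiment_id) -> variant
        self.conversions: Dict[Tuple[int, int], int] = {}   # (visitor_id, experiment_id) -> count
        self.current_visitor_id: Optional[int] = None

    def identify(self, cookie: Optional[str], user_agent: str, ip: str) -> str:
        """Map the request's cookie to a visitor_id, creating a visitor if needed.

        Returns the cookie value the response should set.
        """
        if cookie is None or cookie not in self.visitors:
            cookie = hashlib.md5(os.urandom(16)).hexdigest()  # random 32-char hex token
            visitor_id = len(self.visitors) + 1
            self.visitors[cookie] = visitor_id
            self.visitor_info[visitor_id] = (user_agent, ip)
        self.current_visitor_id = self.visitors[cookie]
        return cookie

    def _visitor(self) -> int:
        if self.current_visitor_id is None:
            raise RuntimeError("identify() must run before the template tags")
        return self.current_visitor_id

    def get_variant(self, experiment_id: int) -> str:
        """Return the current visitor's variant, assigning one by weight the first time."""
        key = (self._visitor(), experiment_id)
        if key not in self.assignments:
            weights = self.experiments[experiment_id]
            variants = list(weights)
            # random.choices does the weighted pick; storing the result keeps
            # the visitor in the same variant on later page views.
            self.assignments[key] = self.rng.choices(
                variants, weights=[weights[v] for v in variants])[0]
        return self.assignments[key]

    def conversion_success(self, experiment_id: int) -> None:
        """Count a goal completion for the current visitor in this experiment."""
        key = (self._visitor(), experiment_id)
        self.conversions[key] = self.conversions.get(key, 0) + 1


# One global instance; only the two wrappers below are meant to be "public".
shrimp = ShrimpTestSketch()


def shrimptest_get_variant(experiment_id: int) -> str:
    return shrimp.get_variant(experiment_id)


def shrimptest_conversion_success(experiment_id: int) -> None:
    shrimp.conversion_success(experiment_id)
```

In this sketch each request would call `shrimp.identify(...)` once, and theme code would then use only the two wrapper functions, which is what keeps the public surface small.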

I now have this code running an A/A test (see my post from yesterday) on a webpage that gets a decent number of hits. For starters, running this for a while will help me verify that the random assignment is working and valid, which is one of the goals of A/A testing.
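One common way to check the assignment in an A/A test is a chi-square goodness-of-fit test comparing how many visitors landed in each variant with the configured weights. A minimal sketch, assuming the per-variant counts have already been pulled from the tables; the function names are hypothetical, and the critical values are the standard 5% points for 1 to 4 degrees of freedom.

```python
from typing import Sequence

# Upper 5% critical values of the chi-square distribution, keyed by
# degrees of freedom (number of variants minus one).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_statistic(observed: Sequence[int], weights: Sequence[float]) -> float:
    """Pearson chi-square statistic for observed counts vs. the configured split."""
    if len(observed) != len(weights) or len(observed) < 2:
        raise ValueError("need matching counts and weights for 2+ variants")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    n = sum(observed)
    total_weight = sum(weights)
    statistic = 0.0
    for count, weight in zip(observed, weights):
        expected = n * weight / total_weight
        statistic += (count - expected) ** 2 / expected
    return statistic


def split_looks_off(observed: Sequence[int], weights: Sequence[float]) -> bool:
    """True if the counts deviate from the configured split at the 5% level."""
    df = len(observed) - 1
    return chi_square_statistic(observed, weights) > CRITICAL_05[df]
```

This only checks the split itself; an A/A test should also show no meaningful difference in conversion rates between the identical variants.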

I can already see two problems from glancing at my visitors table: (a) most of the hits so far are from bots, which I’ll have to filter out, and (b) a number of repeat visitors are not keeping the cookie (mostly bots, but a few humans). Right now, if cookies are disabled, a new cookie hash and visitor ID are created on every visit, so each such visitor is counted multiple times, greatly skewing the data.
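A minimal sketch of one possible mitigation for both problems, assuming a crude user-agent substring check for bots and an IP + user-agent fingerprint as the fallback key when no cookie arrives. `BOT_UA_FRAGMENTS`, `looks_like_bot`, and `VisitorResolver` are hypothetical and not part of ShrimpTest.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical, deliberately short list of user-agent fragments; a real filter
# needs a maintained list and will still miss bots that pose as browsers.
BOT_UA_FRAGMENTS = ("bot", "crawl", "spider", "slurp")


def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return not ua or any(fragment in ua for fragment in BOT_UA_FRAGMENTS)


class VisitorResolver:
    """Key visitors by cookie when present, else by an IP + user-agent fingerprint."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}

    @staticmethod
    def fingerprint(ip: str, user_agent: str) -> str:
        return hashlib.md5(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()

    def resolve(self, cookie: Optional[str], ip: str, user_agent: str) -> Optional[int]:
        """Return a visitor_id, or None for requests that look like bots."""
        if looks_like_bot(user_agent):
            return None
        if cookie is not None:
            key = "cookie:" + cookie
        else:
            key = "fp:" + self.fingerprint(ip, user_agent)
        if key not in self.ids:
            self.ids[key] = len(self.ids) + 1
        return self.ids[key]
```

The fingerprint fallback trades one error for another: visitors sharing an IP address and browser collapse into a single visitor, and a cookieless visitor who later accepts the cookie is counted twice.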