Totally unrelated to my previous email, I promise. This is just me writing down my thinking on how A/B testing works, and how it applies to the portal (www.wikipedia.org) experiments and the schema we have deployed there.
A/B testing is a common way of identifying whether a proposed change to a piece of software is actually an improvement: it consists of taking a sample of users and dividing them into two groups, the "A" and "B" groups (hence the name). One group is consistently given the experimental change (the "test" group); the other is consistently given the default experience (the "control" group). Users are pseudorandomly sorted into the groups, so that the two groups are evenly sized and comparable. The end outcome for both groups is compared, and the change is considered successful if users in the test group are statistically significantly more likely to experience a better outcome than users in the control group.
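As a rough illustration of the "statistically significantly more likely" comparison, here is a minimal Python sketch of a one-sided two-proportion z-test. This is one common way to compare two groups on a yes/no outcome, not necessarily the analysis used for the portal tests; the function name and the example counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_test, n_test, success_control, n_control):
    """Return (z, p_value) for the one-sided hypothesis that the test
    group's success rate is higher than the control group's.

    Hypothetical helper: the name and signature are illustrative only.
    """
    p_test = success_test / n_test
    p_control = success_control / n_control
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_test + success_control) / (n_test + n_control)
    se = sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_test - p_control) / se
    # One-sided p-value: chance of a z at least this large if nothing changed.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Made-up counts: 550 of 1000 test users vs 500 of 1000 control users
# experienced the "better outcome".
z, p = two_proportion_z_test(550, 1000, 500, 1000)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

A small p-value (conventionally below 0.05) would count as evidence that the test group did better; with equal rates in both groups the p-value is 0.5.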
When we put together the schema for the Portal we did it after months of experimenting with the Cirrus A/B tests, which means that we tried to structure it to take into account the lessons we learned there. We discovered two things: that things were simpler the more fields the schema had, and that maintaining a base population who were not participating in any tests was ideal for dashboarding. Accordingly the schema tracks every KPI we care about for the portal and contains a "cohort" field that indicates whether someone is in the "A" group, the "B" group, or no group whatsoever, with the idea that most users at any one time would be in /no/ group and we could rely on that population for dashboarding! That way we can handle everything with one schema.
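To make the one-schema idea concrete, here is a minimal Python sketch. Only the "cohort" field comes from the description above; the record type, the other field names, and the helper functions are hypothetical stand-ins, and a single click flag stands in for the KPIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PortalEvent:
    session_id: str        # hypothetical field name
    clicked: bool          # hypothetical stand-in for a KPI
    cohort: Optional[str]  # "A", "B", or None for users in no test


def clickthrough_rate(events: List[PortalEvent]) -> float:
    """Share of events with a click; 0.0 for an empty list."""
    if not events:
        return 0.0
    return sum(e.clicked for e in events) / len(events)


def split_by_purpose(
    events: List[PortalEvent],
) -> Tuple[List[PortalEvent], Dict[str, List[PortalEvent]]]:
    """Separate the untested base population (for dashboards) from the
    "A" and "B" cohorts (for test analysis)."""
    dashboard = [e for e in events if e.cohort is None]
    cohorts = {
        name: [e for e in events if e.cohort == name] for name in ("A", "B")
    }
    return dashboard, cohorts


events = [
    PortalEvent("s1", True, None),
    PortalEvent("s2", False, None),
    PortalEvent("s3", True, "A"),
    PortalEvent("s4", True, "B"),
    PortalEvent("s5", False, "B"),
]
dashboard, cohorts = split_by_purpose(events)
print(len(dashboard), clickthrough_rate(dashboard))
```

The same stream of events serves both purposes: dashboards read only the rows with no cohort, and test analysis reads only the rows that have one.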
So the things to remember when setting up Portal tests:
1. The test and control groups should be even;
2. The test and control groups should (together) make up a very small chunk of the total people getting the logging: 10% combined, say;
3. The test and control groups should both be represented with "cohort" values, with nothing (to produce a MySQL NULL) for the rest of the population.
That way we can both test and dashboard simultaneously.
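The three rules above can be sketched as a small assignment function. This is only an illustration under assumptions, not the portal's actual sampling code: the function name, the use of a session identifier, and hashing with SHA-256 are all hypothetical choices, and the 10% share is the "say" figure from the list.

```python
import hashlib
from typing import Optional


def assign_cohort(session_id: str, test_name: str,
                  test_share: float = 0.10) -> Optional[str]:
    """Pseudorandomly but consistently place a logged session in "A", "B",
    or no cohort (None, which would be stored as a NULL).

    Hypothetical helper: names and hashing scheme are assumptions.
    """
    digest = hashlib.sha256(f"{test_name}:{session_id}".encode("utf-8")).digest()
    # Map the hash onto 10,000 buckets; the same input always lands in the
    # same bucket, so a session stays in its group for the whole test.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    threshold = round(test_share * 10_000)
    half = threshold // 2
    if bucket < half:
        return "A"       # first half of the test share
    if bucket < threshold:
        return "B"       # second half, so the two groups are even
    return None          # everyone else: the dashboarding population


counts = {"A": 0, "B": 0, None: 0}
for i in range(20_000):
    counts[assign_cohort(f"session-{i}", "example-test")] += 1
print(counts)
```

Including the test name in the hash means a new test reshuffles who lands in which group, which is a design choice in this sketch rather than something the rules require.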
Excellent summary. Please make sure this is on wiki as well.
Thanks
Kevin

On Dec 10, 2015 8:05 AM, "Oliver Keyes" <okeyes@wikimedia.org> wrote:
--
Oliver Keyes
Count Logula
Wikimedia Foundation

discovery mailing list
discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery
Yes! Thanks for putting this all in words. I'm really bad at putting things in writing so I appreciate this even more.
On Thursday, December 10, 2015, Kevin Smith <ksmith@wikimedia.org> wrote:
Excellent summary. Please make sure this is on wiki as well.
Thanks
Kevin