On Fri, Nov 15, 2013 at 1:24 PM, Jon Robson <jrobson(a)wikimedia.org> wrote:
> If the A/B test is limited to anonymous users on all pages, then I
> would expect us to still be able to deduce whether minor changes to
> the UI encourage clicking (in an audience of which 30% has never
> clicked the icon, we would still see differences in click-through
> rate in an A/B test, as 15% of those would be captured in the A/B
> test).
You can observe an increase or decrease, but the point is that it's
meaningless data, because there is no way to determine what caused it
with any certainty. This means you can run a test and collect data, but you
can't answer a question like "Does this version make it easier to find the
menu, compared to the old version?"
Compare this to tests mobile has run on newly registered users. While you
can't guarantee that none of them has ever made an account before, we know
through careful analysis that it's very likely that the vast majority of
new registrations are in fact new people. So when we do a random 50/50
split of new registrations, we're comparing the behavior of two similar
populations of users, neither of which has previously been exposed to
either treatment in an A/B test.
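(As a rough illustration of the kind of split described above, a minimal sketch of deterministic 50/50 bucketing by hashed user ID. The function name and the use of SHA-256 are my own assumptions for the example, not how MobileFrontend actually does it.)

```python
import hashlib

def assign_bucket(user_id: str) -> str:
    # Hash the user ID so assignment is random-looking but stable:
    # the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

# Over many new registrations, the split comes out close to 50/50.
buckets = [assign_bucket(f"user{i}") for i in range(10000)]
```

The hash-based approach matters because it keeps assignment consistent across page views without storing per-user state.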
With a random set of readers, you're getting a huge selection of users, some
of whom might be new, but many of whom have seen some permutation of the
site before. With a test like this, there's no way to ensure that a result
isn't just due to effects like random exploratory clicking, because you
introduced something new to people who are used to the old icon. This is a
very common problem in A/B testing.
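(To make the confound concrete, a small simulation under assumed numbers of my own choosing: the new icon is no easier to find, but returning readers click a changed icon out of curiosity. The test still "wins" even though findability didn't improve.)

```python
import random

random.seed(0)

BASE_CTR = 0.05          # assumed true click-through rate, both designs
NOVELTY_BOOST = 0.03     # assumed extra exploratory clicks on a changed icon
RETURNING_SHARE = 0.7    # assumed share of readers who know the old UI

def simulate(n, new_design):
    """Simulate n readers and return the observed click-through rate."""
    clicks = 0
    for _ in range(n):
        returning = random.random() < RETURNING_SHARE
        p = BASE_CTR
        # Only returning users exhibit the novelty effect.
        if new_design and returning:
            p += NOVELTY_BOOST
        clicks += random.random() < p
    return clicks / n

ctr_old = simulate(100_000, new_design=False)
ctr_new = simulate(100_000, new_design=True)
# ctr_new comes out clearly higher, yet the design is no better.
```

This is why a measured lift on a mixed audience of new and returning readers can't, on its own, answer "is the menu easier to find?"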
--
Steven Walling,
Product Manager
https://wikimediafoundation.org/