On Fri, Nov 15, 2013 at 1:24 PM, Jon Robson <jrobson@wikimedia.org> wrote:

If the A/B
test is limited to anonymous users on all pages, then I would expect
us to still be able to deduce whether minor changes to the UI
encourage clicking (in an audience if 30% of that has never clicked
the icon we would still see differences in click through rate in an
A/B test as 15% of those would be captured in the A/B test).

You can observe an increase or decrease, but the point is that the data is meaningless, because there is no way to determine what caused the change with any certainty. This means you can run a test and collect data, but you can't answer a question like "Does this version make it easier to find the menu, compared to the old version?"

Compare this to tests mobile has run on newly-registered users. While you can't guarantee that none of them has ever made an account before, we know through careful analysis that the vast majority of new registrations are very likely to be new people. So when we do a random 50/50 split of new registrations, we're comparing the behavior of two similar populations of users who have never been exposed to either treatment in the A/B test.
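To make the 50/50 split concrete, here is a minimal sketch of one common way to bucket new registrations. It is not the code mobile actually used; the function names, the experiment label, and the choice of hashing the user ID are all assumptions for illustration.

```python
import hashlib


def assign_bucket(user_id: int, experiment: str = "menu-icon-test") -> str:
    """Assign a newly registered user to 'control' or 'treatment'.

    Hypothetical helper: hashing the experiment name together with the
    user ID gives a stable, roughly uniform 50/50 split, so the same
    user always sees the same version.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"


def split_counts(user_ids):
    """Count how many of the given user IDs land in each bucket."""
    counts = {"control": 0, "treatment": 0}
    for uid in user_ids:
        counts[assign_bucket(uid)] += 1
    return counts
```

Because only new registrations are bucketed, both groups start with no prior exposure, which is what makes the comparison between them fair.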

With a random set of readers, you're getting a huge selection of users who might be new, and also many users who have seen some permutation of the site before. With a test like this, there's no way to ensure that a result isn't just due to effects like random exploratory clicking because you introduced something new to people who are used to the old icon. This is a very common problem in A/B testing.
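A rough illustration of the confound described above, with entirely made-up numbers: the click rates, the share of returning readers, and the function name are assumptions, not measurements.

```python
def observed_ctr(base_ctr: float, returning_share: float, novelty_ctr: float) -> float:
    """Blended click-through rate for a mixed audience.

    New readers click at base_ctr. Returning readers click at
    base_ctr + novelty_ctr, where novelty_ctr stands for extra
    exploratory clicks caused only by the icon being unfamiliar.
    """
    new_share = 1.0 - returning_share
    return new_share * base_ctr + returning_share * (base_ctr + novelty_ctr)


# Scenario A (hypothetical): the new icon is genuinely easier to find,
# and nobody clicks out of curiosity.
genuine_improvement = observed_ctr(base_ctr=0.12, returning_share=0.5, novelty_ctr=0.0)

# Scenario B (hypothetical): the new icon is no better than the old one,
# but returning readers poke at it because it changed.
novelty_only = observed_ctr(base_ctr=0.10, returning_share=0.5, novelty_ctr=0.04)
```

Both scenarios produce the same observed rate of about 0.12, so the click data alone cannot tell you which one you are in.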

Steven Walling,

Product Manager

https://wikimediafoundation.org/