[Wikimedia-l] Multivariate Fundraising Tests (Re: compromise?)

Matthew Walker mwalker at wikimedia.org
Fri Dec 28 22:46:06 UTC 2012


James,

On Fri, Dec 28, 2012 at 2:11 PM, James Salsman <jsalsman at gmail.com> wrote:

> I mean as in the tests done May 16, September 20, and October 9
> reported at
> http://meta.wikimedia.org/wiki/Fundraising_2012/We_Need_A_Breakthrough
> without adjusting the best performing pull-down delivery combined
> banner/landing page from the beginning of this month
>

I obviously cannot speak for what Zack will end up doing but let's talk
shop for a moment on how this would be implemented.

The tests you indicated play banner, landing page impressions, and donation
amount against each other. It appears that everyone saw a collection of
random banners (i.e., the test was not bucketed). Are these the same
variables you want to test?

Regardless of the answer to the above: how do you propose we normalize our
tests across time-of-day, day-of-week, and day-of-month factors? We've
seen evidence that these all play a role. I don't know how many banner
variations we actually have to test, but it's likely we won't be able to
test them all at the same time (in fact, with the current weighting setup we
can only test 30 banners at a time). Do we just take each group as it
stands -- find the best performers in the group and then test the winners
against each other?
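The group-then-playoff approach above could be sketched roughly like this. Everything here is hypothetical (the 30-banner limit is from the weighting setup mentioned above; banner names and numbers are invented), and it measures "best" purely as donations per impression:

```python
# Hypothetical sketch: split a large banner pool into batches of at most
# 30 (the current weighting limit), pick each batch's best performer by
# donations per impression, then run the winners against each other.

MAX_CONCURRENT = 30  # weighting-setup limit mentioned above

def batches(banners, size=MAX_CONCURRENT):
    """Yield successive groups of banners small enough to run together."""
    for i in range(0, len(banners), size):
        yield banners[i:i + size]

def best_of(results):
    """results: {banner: (donations, impressions)} -> highest-rate banner."""
    return max(results, key=lambda b: results[b][0] / results[b][1])

# Two made-up batches, then the winners would meet in a final round.
round_one = [
    {"A": (120, 100000), "B": (90, 100000)},
    {"C": (150, 100000), "D": (110, 100000)},
]
winners = [best_of(r) for r in round_one]
# winners == ["A", "C"]
```

The catch, of course, is the normalization question above: batches run at different times, so within-batch comparisons are cleaner than cross-batch ones.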

An additional consideration is that we have four buckets to play with;
buckets are independent, so we could potentially test 120 banners at a time
across four different groups. Presumably, if we did this, we would want a
couple of control banners in each to normalize with?
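In code, the control-banners-per-bucket idea might look something like the following. This is just a sketch, not how CentralNotice actually assigns buckets: the visitor-id hash, control names, and 28+2 split are all my own assumptions (sharing two controls across buckets keeps each bucket at the 30-banner limit, at the cost of two test slots per bucket):

```python
# Hypothetical sketch: four independent buckets, each carrying its own
# slice of test banners plus a shared pair of control banners, so the
# buckets can be normalized against each other afterwards.
import hashlib

NUM_BUCKETS = 4
CONTROLS = ["control_1", "control_2"]  # assumed names; shown in every bucket

def bucket_for(visitor_id: str) -> int:
    """Deterministically map a visitor to one of the four buckets."""
    digest = hashlib.md5(visitor_id.encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def banners_for_bucket(bucket: int, test_banners: list) -> list:
    """Each bucket gets up to 28 test banners plus the 2 controls,
    keeping the total at the 30-banner weighting limit."""
    per_bucket = 30 - len(CONTROLS)
    start = bucket * per_bucket
    return test_banners[start:start + per_bucket] + CONTROLS

# 4 buckets x 28 test banners = 112 under test at once, slightly below
# the 120 figure above because of the two control slots per bucket.
```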

Another thing to consider is how long we have to run these tests to gain
statistical significance -- at least a day, I'm guessing. Are we going to
account for banner fatigue at all? I.e., show banners during only the
first 10 visits, as we just did with this most recent campaign?
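For the significance question, one plain stdlib option is a two-proportion z-test on donation rate: keep the test running until the two-sided p-value drops below a chosen threshold. The numbers below are invented, and the 10-visit fatigue cap is just the rule from the recent campaign written out:

```python
# Hypothetical sketch: a two-proportion z-test on donation rate to
# decide whether banner A beats banner B with enough confidence to
# stop the test. Pure stdlib; all counts here are made up.
from math import sqrt, erf

def z_test(don_a, imp_a, don_b, imp_b):
    """Return (z, two-sided p) for the difference in donation rates."""
    p_a, p_b = don_a / imp_a, don_b / imp_b
    pooled = (don_a + don_b) / (imp_a + imp_b)
    se = sqrt(pooled * (1 - pooled) * (1 / imp_a + 1 / imp_b))
    z = (p_a - p_b) / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

z, p = z_test(300, 100000, 240, 100000)
significant = p < 0.05  # keep the test running until this holds

def show_banner(visit_count: int) -> bool:
    """Banner fatigue cap: only the first 10 visits see a banner,
    as in the most recent campaign."""
    return visit_count <= 10
```

With fundraising-banner donation rates in the fractions of a percent, the impression counts needed for significance do suggest runs on the order of a day or more, which is also long enough for the time-of-day effects above to partially average out.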

-- 
~Matt Walker
