This all goes back to how you aim to quantify improvement in usability.
These sample sizes are so small that it will be hard (or even impossible)
to evaluate your progress based on statistical significance. You've got to
prove to us that it's really getting better and doesn't just look prettier.
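To make that concrete, here is a back-of-the-envelope sketch using
Lehr's rule of thumb, treating it purely for illustration as a
two-group comparison with 10 participants per group (none of these
numbers come from your study):

  # Lehr's rule of thumb: n per group ~= 16 / d**2 for 80% power at
  # alpha = 0.05 (two-sided). Invert it to get the smallest effect
  # size a given sample can reliably detect.
  n = 10
  d = (16 / n) ** 0.5
  print(f"smallest detectable Cohen's d with n={n} per group: ~{d:.2f}")
  # -> ~1.26: only enormous differences would reach significance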
On Thu, May 7, 2009 at 7:44 PM, Erik Moeller <erik(a)wikimedia.org> wrote:
2009/5/7 Brian <Brian.Mingus(a)colorado.edu>:
> Based on these criteria, the 2,500 users that responded to our survey
> filtered down to 500 viable subjects based on their answers to these
> questions. The team, along with B|P, partnered with Davis Recruiting
> to contact, filter, and screen these 500 participants based on their
> contribution history, Wikipedia usage patterns, their given reasons
> for contributing, and their talkativeness and openness to discuss
> their [...] and actions. From 2,500 users, we ended up with 10 study
> participants and 3-5 waitlisted participants.
>
> You went from 2,500 subjects to just 10?
The purpose of a study like this is focused observation of the
behavior of individual human beings. As David has pointed out, for any
study like this there are laws of diminishing returns, and any serious
observation of an individual is time-consuming and costly (raw data is
worthless if you can't analyze it). That's why usability gurus like
Nielsen suggest "5 is enough" for most tests:
- due to our highly
diverse audience, we chose a larger group, and we split between remote
and lab testing to compensate for biases of both methods. This has
worked well to identify plenty of very obvious usability barriers to
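For context, the usual basis for the "5 is enough" rule is Nielsen &
Landauer's problem-discovery model, in which the share of problems
found by n test users is about 1 - (1 - L)^n. A quick Python sketch of
the curve, assuming their published average per-user discovery rate of
L = 0.31 (their figure, not one of ours):

  # Nielsen & Landauer problem-discovery model: share of usability
  # problems found by n users, given per-user discovery rate L.
  L = 0.31
  for n in (1, 3, 5, 10, 15):
      print(f"{n:2d} users -> ~{1 - (1 - L) ** n:.0%} of problems found")

Under those assumptions, 5 users already surface roughly 85% of the
problems a given test design can reveal, which is why adding more
observed subjects pays off so slowly.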
There are alternative data collection methods, such as large-scale
quantitative testing, where the level of individual engagement is
limited; those can give you behavioral patterns and the like. They can
be useful, too, but are an entirely different thing.
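As a contrast, here is a toy sketch of what the large-scale approach
buys you statistically; the completion counts below are entirely made
up, just to show that at log scale even a small shift is detectable:

  from math import sqrt, erf

  # Two-proportion z-test on hypothetical task-completion counts
  # from server logs (all numbers invented for illustration).
  x1, n1 = 4200, 10000   # completions, sessions: current UI
  x2, n2 = 4500, 10000   # completions, sessions: prototype UI
  p1, p2 = x1 / n1, x2 / n2
  p = (x1 + x2) / (n1 + n2)                   # pooled completion rate
  z = (p2 - p1) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
  p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
  print(f"z = {z:.2f}, p = {p_value:.6f}")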
Erik Moeller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate