On 05/03/2010 06:13 PM, phoebe ayers wrote:
Is there a good usability-based way to do testing for these questions? (Has it been done, or discussed somewhere?) All I've got to go on is gut feelings one way or another.
Great question!
There are two broad sorts of testing typically done for things like this: qualitative and quantitative.
For the qualitative approach, you sit a representative user down and have them perform some tasks. You sit quietly and watch them carefully, often recording both the screen and their face. Some people like to have them narrate their actions and thoughts, so you can get a better idea of how they're taking things. Advanced setups even include eye-tracking, so you can get better insight into what they're looking at and reacting to. This can be done in person or remotely, and there are sites like usertesting.com that automate the process.
There are a number of quantitative approaches, but the most popular is A/B testing, where we'd deploy two different versions of the feature and randomly assign a bunch of users to each. Then we'd look at each group's behavior over time with respect to certain metrics, like returning to look at their articles, talk page participation, and long-term editing behavior.
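To make the A/B idea concrete, here's a rough sketch in Python of the two pieces involved -- deterministic bucket assignment and a simple per-group metric. The experiment name, the "returned" metric, and the event format are all made up for illustration, not anything MediaWiki actually has:

    import hashlib

    def assign_bucket(user_id, experiment="new-article-banner"):
        """Deterministically put a user in 'A' or 'B', so they see the
        same version on every visit without us storing anything."""
        digest = hashlib.sha1(f"{experiment}:{user_id}".encode()).hexdigest()
        return "A" if int(digest, 16) % 2 == 0 else "B"

    def retention_rate(events, bucket):
        """Fraction of a bucket's users who came back to edit again, given
        records like {'user': ..., 'bucket': ..., 'returned': bool}."""
        group = [e for e in events if e["bucket"] == bucket]
        return sum(e["returned"] for e in group) / len(group) if group else 0.0

    # Toy usage: assign some users, then compare the metric across groups.
    events = [
        {"user": u, "bucket": assign_bucket(u), "returned": len(u) % 2 == 0}
        for u in ("alice", "bob", "carol", "dave", "erin", "frank")
    ]
    print("A:", retention_rate(events, "A"), "B:", retention_rate(events, "B"))

The real work is in picking metrics and waiting long enough to see a difference; the assignment itself is cheap.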
Right now the WMF is doing a little of the first and -- AFAIK -- none of the second. Qualitative testing is relatively expensive and time-consuming to do. For a minor variation like this, it's probably better done as part of a broader test of novice editing experiences.
I'd love to do more quantitative testing, as that would settle a lot of questions like this one, but getting set up for it would require non-trivial infrastructure changes, plus a serious discussion about how much cookie-based user tracking we want to do. Right now there's other infrastructure work that is definitely higher priority -- e.g., making sure that a well-placed hurricane won't cause major site issues.
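For what it's worth, the cookie side of that tracking question would look something like this -- purely a hypothetical sketch using Python's standard http.cookies module (MediaWiki itself is PHP, and none of these names are real); the point is just that the bucket has to persist across visits:

    import random
    from http.cookies import SimpleCookie

    COOKIE_NAME = "abBucket"  # hypothetical cookie name

    def bucket_from_request(cookie_header):
        """Return (bucket, set_cookie_value_or_None): reuse the bucket from
        the request's Cookie header if present, else pick one to set."""
        if cookie_header:
            jar = SimpleCookie(cookie_header)
            if COOKIE_NAME in jar:
                return jar[COOKIE_NAME].value, None
        bucket = random.choice(["A", "B"])
        jar = SimpleCookie()
        jar[COOKIE_NAME] = bucket
        jar[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # persist ~1 year
        return bucket, jar.output(header="").strip()

    # First visit: no cookie, so we pick a bucket and emit a Set-Cookie value.
    bucket, set_cookie = bucket_from_request(None)
    # Later visits send the cookie back, so the user lands in the same bucket.
    assert bucket_from_request(f"{COOKIE_NAME}={bucket}")[0] == bucket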
So for now, although I wish it were otherwise, we may have to settle for trying things out live and calibrating our gut feelings against whatever user reactions we can glean.
William