On Thu, May 6, 2010 at 5:22 PM, William Pietri william@scissor.com wrote:
We discussed this at some length today, and I wanted to update everybody.
Who is the "we" in your message? (I'm just asking because it's entirely ambiguous: since you were quoting me it looks like I was involved, though I obviously wasn't. :) )
[snip]
We think the first part is a great idea, but not crucial for launch. So we've added it to the backlog, and barring some sort of excitement that demands more immediate attention, we'll get to it soon after launch.
When is this launch planned to occur?
For the second part, we're concerned. There are reasonable arguments on both sides. It's a solution to a problem that's currently only hypothetical, that anons will be put off if we're clear about what's actually going on. Even if there is some effect like that, we'd have to weigh that against our project's strong bias toward transparency. And if despite that we decided it was worth doing, we still think it's not crucial to do before launch.
I'm really saddened by the continued characterization of de-emphasis as something which reduces transparency. It has continually been my position that the current behaviour actually reduces transparency, because it is effectively dishonest by way of excessive over-simplification.
The message currently delivered by the software is: "Edits must be reviewed before being published on this page."
And yet the edit will be instantly available to every person on earth (well, at least all of the people who can currently access Wikipedia) the moment it is saved. The interface text is misleading. It is not a good example of transparency at all.
If I knew how to fix it, I'd suggest alternative text. Sadly, I don't— I think that any message which conveys all of the most important information will be too long and complicated for most people to read.
What I was attempting to propose was being selective about which of the many possible misunderstandings the contributor was most likely to walk away with; this isn't the same as proposing a reduction in transparency.
So given all that, we think it's better to wait and see if this hypothetical problem becomes an actual problem; at that time we'll have better data for deciding how to handle it. As mentioned earlier, the Foundation is committed to supporting the experiment with further development as needed, so resources are already allocated if this becomes an issue.
What is the plan for collecting this data? What metrics will be collected? What criteria will be used to judge the result a success or a failure, problematic or non-problematic?
I have seen many tests invalidated by the bias of having seen the outcome: "Fantastic success! We have a customer retention rate of 60%!" "What would have been failure?" "Um..." "Didn't you say last week that you were expecting we'd retain almost everyone?" "But 60% is a _good_ number!"
I'm unsure how we could measure the effect of the interface here outside of a controlled usability study.
I suppose we could take the set of new contributors, group them by whether their first edit was on a flag-protected page or not, and then for each group count how many of the users contributed a second time. Then we would decide that the interface is not problematic if the two groups are statistically indistinguishable, and that it may be problematic if the flag-protected-first users were less likely to contribute a second time at some confidence level.
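To make that concrete, here's a rough sketch in Python of the comparison I have in mind. It's just a plain two-proportion z-test, and the counts at the bottom are made-up placeholders rather than anything we've actually measured:

from math import sqrt, erf

def second_edit_rate_test(flagged_returned, flagged_total,
                          other_returned, other_total):
    # Compare the rate of second edits between new contributors whose
    # first edit hit a flag-protected page and those whose didn't,
    # using a simple two-proportion z-test.
    p1 = flagged_returned / flagged_total
    p2 = other_returned / other_total
    pooled = (flagged_returned + other_returned) / (flagged_total + other_total)
    se = sqrt(pooled * (1 - pooled) * (1 / flagged_total + 1 / other_total))
    z = (p1 - p2) / se
    # Two-sided p-value from the normal approximation.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p1, p2, z, p_value

# Purely illustrative counts -- we do not actually have this data:
print(second_edit_rate_test(320, 1000, 4100, 10000))

If the p-value came out large we'd call the two groups statistically indistinguishable; if it came out small with the flag-protected group returning less often, that would be the warning sign.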
But I think this would require non-trivial software infrastructure to track client identities for anons, which we currently do not have. There are also many other methodological problems with this approach. For example, under the reasonable assumption that flag-protection will largely replace semi-protection, articles of certain types (e.g. bios) will be far more likely to be flag-protected, so we could reasonably expect the characteristics of these contributors to differ and for them to be more or less likely to contribute again regardless of the flag-protection.
Or, in other words, making a reliable measurement of this may be a project of comparable complexity to this whole deployment... which is why my initial suggestion was to take what I believed to be the most conservative approach in terms of editor impact, simply because the cost of actually measuring the result is likely too great in comparison to the benefit of being less conservative.
I apologise for harping on these details, but for years the people promoting a little review have dealt with opposition on the basis that adding any friction at all may significantly discourage contribution, and none of us want to slay our golden goose: the constant stream of new contributors.
This is a concern which I've heard clearly expressed by Sue and Erik, to name some of the most notable examples; it's not merely an example of David's "unstoppable petty complaint".
While I've personally dismissed this concern, I've done so because I consider getting a little review into the process to be _essential_ for the project's future and thus worth even considerable risk, not because I believe the concern to be unfounded.
By my reasoning, we've only got one shot at this. If the process significantly dampens contributions, I don't believe it will be possible to convince the English Wikipedia community to tolerate another attempt in the next several years, even if we think we know the cause of the failure and think we can resolve it... since it will be impossible to prove these things, and nothing short of proof or public shame can reliably overcome the community's inertia.
Accordingly, I want to make sure that we're making every reasonable concession to ensure that the flagging does not discourage contribution, and that if it fails, it fails only because the process of review is essentially incompatible with our development model, not simply because some poor interface decisions doomed it to failure.
Cheers,