On 7 May 2010 01:13, William Pietri william@scissor.com wrote:
On 05/06/2010 04:38 PM, Gregory Maxwell wrote:
On Thu, May 6, 2010 at 5:22 PM, William Pietri william@scissor.com wrote:
We discussed this at some length today, and I wanted to update everybody.
Who is the "we" in your message? (I'm just asking because it's entirely ambiguous; since you were quoting me, it looks like I was involved, though I obviously wasn't :) )
Sorry, I mean "we" the project team: Howie Fung, Aaron Schulz, Rob Lanphier, and me. We do a Skype call every Thursday.
We think the first part is a great idea, but not crucial for launch. So we've added it to the backlog, and barring some sort of excitement that demands more immediate attention, we'll get to it soon after launch.
When is this launch planned to occur?
The most accurate answer: when it's ready. Qualitatively, soon. Those who want a more precise answer are welcome to extrapolate from the data here:
http://www.pivotaltracker.com/projects/46157
For the second part, we're concerned. There are reasonable arguments on both sides. It's a solution to a problem that's currently only hypothetical: that anons will be put off if we're clear about what's actually going on. Even if there is some effect like that, we'd have to weigh it against our project's strong bias toward transparency. And if, despite that, we decided it was worth doing, we still think it's not crucial to do before launch.
I'm really saddened by the continued characterization of de-emphasis as something that reduces transparency. It has continually been my position that the current behaviour actually reduces transparency, by being effectively dishonest on account of being an excessive oversimplification.
The message currently delivered by the software is: "Edits must be reviewed before being published on this page."
And yet the edit will be instantly available to every person on earth (well, at least all of the people who can currently access Wikipedia) the moment it is saved. The interface text is misleading. It is not a good example of transparency at all.
We are entirely open to improvements to that text, or any other text. It's the best we've come up with so far, but we're under no illusions as to its perfection.
We do think it's important to indicate to people that if they browse back later, they will not see their change. It will also not be visible to the general public (including if they tell friends to take a look), and we want to be clear about that.
What I was attempting to propose was being selective about which of many possible misunderstandings the contributor was most likely to walk away with; that isn't the same as proposing a reduction in transparency.
Sorry if I misunderstood! I thought somebody was arguing for removing the message entirely and just showing them the latest draft as if it's the version that everybody sees, but I must have gotten that wrong.
So given all that, we think it's better to wait and see if the hypothetical problem becomes an actual problem; at that time we'll have better data for deciding how to handle it. As mentioned earlier, the Foundation is committed to supporting the experiment with further development as needed, so resources are already allocated if this becomes an issue.
What is the plan for collecting this data? What metrics will be collected? What criteria will be used to judge it a success / failure / problematic / non-problematic?
As I mentioned elsewhere in the thread, I would be tickled pink if we could collect the kind of serious data I'm used to using in judging features like this. But our software and production environment don't currently allow for that, and there are higher-priority infrastructure needs that must come first.
I would love for the usability team to user-test this, but they'll have to do it as part of a series testing the anonymous editing experience more broadly. I'm not sure what their schedule is, and for the moment it doesn't matter; if I were to delay launch for that, people would rightly have my head on a pike. So until then, we'll have to use the low-grade plural-of-anecdotes sort of data that drives a lot of Wikipedia decisions.
Looking beyond the Foundation's resources, there's nothing stopping volunteers from user-testing this, and I would be over the moon if volunteer developers took up the cause of A/B testing. I expect it will take a while to get that into a long-lived codebase like MediaWiki, but I'm sure there are plenty of smaller MediaWiki installs that could benefit from A/B testing even before we're ready to deploy something like that on the Wikipedia production cluster.
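
To make concrete what I mean by A/B testing (a purely illustrative sketch in Python rather than MediaWiki's actual PHP, with every name below made up for the example): the core mechanism is just deterministically bucketing each visitor into a variant, so the same person always sees the same interface text and the buckets can be compared afterwards.

import hashlib

# Hypothetical experiment arms; none of these names come from MediaWiki itself.
VARIANTS = ["current-notice", "deemphasized-notice"]

def assign_variant(visitor_token, experiment="pending-changes-notice"):
    # Deterministically map a visitor token to an experiment arm. The same
    # token always lands in the same bucket, so a returning anonymous editor
    # keeps seeing the same interface text.
    digest = hashlib.sha256((experiment + ":" + visitor_token).encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

# Example: assignments differ between visitors but are stable per visitor.
print(assign_variant("anon-session-12345"))
print(assign_variant("anon-session-67890"))

Hashing a stable token, rather than choosing randomly on every page view, keeps the experience consistent for each visitor and makes the assignment reproducible when someone later analyses the logs.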
I hasten to add that we will be collecting other sorts of data. You can see that on labs here:
http://flaggedrevs.labs.wikimedia.org/wiki/Special:ValidationStatistics
We are also fortunate to have some time from the mighty Erik Zachte. Right now he's looking at dumps from the German use of FlaggedRevs to come up with useful stats so that he'll be ready to look at the English Wikipedia usage during the experiment. We also expect that we'll be improving the validation statistics page based on feedback both from him and from involved users.
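
To give a flavour of the kind of statistic such a dump analysis could yield (again just a hypothetical sketch; the timestamps below are invented and this is not the real FlaggedRevs schema), one obvious number is the lag between an edit being saved and being reviewed:

from datetime import datetime
from statistics import median

# Made-up (saved, reviewed) timestamp pairs; a real analysis would read these
# out of the FlaggedRevs tables or the dumps, not a hard-coded list.
reviewed_edits = [
    ("2010-05-01 12:00:00", "2010-05-01 12:40:00"),
    ("2010-05-02 09:15:00", "2010-05-02 11:00:00"),
    ("2010-05-03 18:30:00", "2010-05-04 08:05:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"
lag_minutes = [
    (datetime.strptime(done, FMT) - datetime.strptime(saved, FMT)).total_seconds() / 60
    for saved, done in reviewed_edits
]

print("median review lag: %.0f minutes" % median(lag_minutes))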
William
Just thought I'd pitch in here and say thank you, William, for these extensive and well-reasoned answers. A couple of months ago the FlaggedRevs issue was possibly the least visible WMF-funded activity, but now it must be the most visible/transparent of them all. The time spent on mailing list replies is no doubt cutting into the time spent coding, but if this means that many of the concerns are worked out and much of the pent-up steam has already been vented by the time the software goes live, then this could be a most successful rollout! I'm glad to hear that Erik Zachte will be watching and measuring during the launch period as, like Gregory says, good data will be crucial to knowing whether the software is actually achieving the desired outcome.
-Liam [[witty lama]]
wittylama.com/blog
Peace, love & metadata