[WikiEN-l] Flagged protection and patrolled revisions

Gregory Maxwell gmaxwell at gmail.com
Thu May 6 23:38:13 UTC 2010


On Thu, May 6, 2010 at 5:22 PM, William Pietri <william at scissor.com> wrote:
>We discussed this at some length today, and I wanted to update everybody.

Who is the "we" in your message?  (I'm just asking because it's
entirely ambiguous; since you were quoting me, it looks like I was
involved, though I obviously wasn't :) )


[snip]
> We think the first part is a great idea, but not crucial for launch. So
> we've added to the backlog, and barring some sort of excitement that
> demands more immediate attention, we'll get to it soon after launch.


When is this launch planned to occur?


> For the second part, we're concerned. There are reasonable arguments on
> both sides. It's a solution to a problem that's currently only
> hypothetical, that anons will be put off if we're clear about what's
> actually going on. Even if there is some effect like that, we'd have to
> weigh that against our project's strong bias toward transparency. And if
> despite that we decided it was worth doing, we still think it's not
> crucial to do before launch.

I'm really saddened by the continued characterization of de-emphasis
as something which reduces transparency.  It has continually been my
position that the current behaviour actually reduces transparency by
being effectively dishonest on account of being an excessive
over-simplification.


The message currently delivered by the software is:
"Edits must be reviewed before being published on this page. "

And yet the edit will be instantly available to every person on earth
(well, at least all of the people who can currently access Wikipedia)
the moment it is saved.   The interface text is misleading. It is not
a good example of transparency at all.

If I knew how to fix it, I'd suggest alternative text.  Sadly, I
don't— I think that any message which conveys all of the most
important information will be too long and complicated for most people
to read.

What I was attempting to propose was being selective about which of
the many possible misunderstandings the contributor was most likely to
walk away with; that isn't the same as proposing a reduction in
transparency.


> So given all that, we think it's better and wait to see if hypothetical
> problem becomes an actual problem; at that time we'll have better data
> for deciding how to handle the problem. As mentioned earlier, the
> Foundation is committed to supporting the experiment with further
> development as needed, so resources are already allocated if this
> becomes an issue.

What is the plan for collecting this data?  What metrics will be
collected?  What criteria will mark the result a success or a failure,
problematic or non-problematic?

I have seen many tests invalidated by the bias of having seen the
outcome: "Fantastic success! We have a customer retention rate of
60%!"  "What would have been failure?"  "Um..."  "Didn't you say last
week that you were expecting we'd retain almost everyone?"  "But!
60% is a _good_ number!"

I'm unsure how we could measure the effect of the interface here
outside of a controlled usability study.

I suppose we could take the set of new contributors, group them by
whether their first edit was on a flag-protected page or not, and then
for each group count how many of the users contributed a second time.
We would then decide that the interface is not problematic if the two
groups are statistically indistinguishable, and that it may be
problematic if the users whose first edit hit a flag-protected page
were less likely to contribute a second time, at some confidence
level.
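
Concretely, I have in mind nothing fancier than a two-proportion test
on second-edit rates.  A rough sketch follows; the counts and the
function name are entirely made up, just to show the shape of the
comparison:

import math

def second_edit_rate_test(n_flagged, repeat_flagged, n_plain, repeat_plain):
    """Two-proportion z-test: do contributors whose first edit hit a
    flag-protected page come back at a different rate?"""
    p1 = repeat_flagged / n_flagged
    p2 = repeat_plain / n_plain
    pooled = (repeat_flagged + repeat_plain) / (n_flagged + n_plain)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_flagged + 1 / n_plain))
    z = (p1 - p2) / se
    # Two-sided p-value from the normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p1, p2, z, p_value

# Invented numbers, purely illustrative.
p1, p2, z, p = second_edit_rate_test(1200, 240, 9000, 2250)
print(p1, p2, z, p)
# "Not problematic" would only mean we failed to tell the groups
# apart, which is itself sensitive to how much data we collect.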

But I think this would require non-trivial software infrastructure to
track client identities for anons, which we currently do not have.
There are also many other methodological problems with this approach.
For example, under the reasonable assumption that flag-protection will
largely replace semi-protection, articles of certain types (e.g. bios)
will be far more likely to be flag-protected, so we could reasonably
expect the characteristics of these contributors to differ and expect
them to be more or less likely to contribute again regardless of the
flag-protection.
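
If we tried it anyway, the comparison would at minimum have to be made
within article types, so that bio-first contributors are compared with
bio-first contributors.  Very roughly, and again with invented labels
and data:

from collections import defaultdict

def stratified_rates(editors):
    """editors: iterable of (category, first_edit_flag_protected, edited_again)."""
    # Per category: [n_flagged, repeat_flagged, n_plain, repeat_plain]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for category, flagged, again in editors:
        c = counts[category]
        if flagged:
            c[0] += 1
            c[1] += int(again)
        else:
            c[2] += 1
            c[3] += int(again)
    # Compare like with like: retention on bios against retention on bios.
    return {cat: ((r_f / n_f) if n_f else None, (r_p / n_p) if n_p else None)
            for cat, (n_f, r_f, n_p, r_p) in counts.items()}

sample = [("bio", True, False), ("bio", False, True), ("other", False, True)]
print(stratified_rates(sample))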

Or, in other words, making a reliable measurement of this may be a
project of comparable complexity to this whole deployment. ... which
is why my initial suggestion was to take what I believed to be the
most conservative approach in terms of editor impact—  simply because
the cost of actually measuring the result is likely too great in
comparison to the benefit of being less conservative.


I apologise for harping on these details, but for years the people
promoting a little review have dealt with opposition on the basis that
adding any friction at all may significantly discourage contribution,
and none of us want to slay our golden goose— the constant stream of
new contributors.

This is a concern which I've heard clearly expressed by Sue and Erik,
to name some of the most notable examples—  it's not merely an example
of David's "unstoppable petty complaint".

While I've personally dismissed this concern, I've done so because I
consider getting a little review into the process to be _essential_
for the project's future and thus worth even considerable risk, not
because I believe it to be unfounded.

By my reasoning, we've only got one shot at this.  If the process
significantly dampens contributions, I don't believe it will be
possible to convince the English Wikipedia community to tolerate
another attempt in the next several years, even if we think we know
the cause of the failure and think we can resolve it... since it will
be impossible to prove these things, and nothing short of proof or
public shame can reliably swing the community inertia.

Accordingly, I want to make sure that we're making every reasonable
concession to ensure that the flagging does not discourage
contribution, and that if it fails, it fails only because the process
of review is essentially incompatible with our development model and
not simply because some poor interface decisions doomed it to
failure.


Cheers,


