[WikiEN-l] Link removal experiment; Re: "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

Gwern Branwen gwern0 at gmail.com
Thu May 31 21:28:14 UTC 2012


On Thu, May 31, 2012 at 11:08 AM, Carl (CBM) <cbm.wikipedia at gmail.com> wrote:
> There is a redacted (no user info) table in the toolserver database
> that can be used to count the number of editors who watchlist a page.
> I fetched the counts for the 100 articles and found the median.

Ah. That's interesting to know and useful for context, thank you.

On Thu, May 31, 2012 at 11:59 AM, WereSpielChequers
<werespielchequers at gmail.com> wrote:
> Firstly rather than measure vandalism it created vandalism, and vandalism
> that didn't look like typical vandalism. Aside from the ethical issue
> involved, this will have skewed the result.

As I've said multiple times, this was a designed feature, not a
bug. The goal was not to measure the reversion rate of vandalism in
its broadest sense, as that has been amply studied*, but that of one
specific kind. Complaining that this specific kind is 'skewed'
compared to 'all possible vandalism related to external links' is to
miss the point.

* Wikipedia generally does very well on *obvious* vandalism,
especially since the introduction of anti-vandalism bots using machine
learning techniques. There's no need for anyone to spend time
measuring it, except perhaps bot-writers fine-tuning their statistics.

> In particular the edit
> summaries were very atypical for vandalism, if I'd seen that edit summary
> on my watchlist I would probably have just sighed  and taken it as another
> example of deletionism in action.

I propose a version of Poe's law (http://en.wikipedia.org/wiki/Poe%27s_law):
it is impossible to create an example of deletionism so mindless that
it is detectable as such if it comes with jargon attached.

> Of the more than 13,000 pages on my
> watchlist I doubt there are 13 where I would look at such an edit, and
> that's if it was one of the changes on my watchlist that I was even aware
> of - it is far too big to fully check every day. Most IP vandals don't use
> jargon in edit summaries, and I know I'm not the only editor who is more
> suspicious of IP edits with blank edit summaries.
>
> You only ran the experiment for one month. I often revert older vandalism
> than that, I may be unusual there in that I've got some tools for finding
> vandalism that has got past the hugglers, but I'm not unusual in sometimes
> taking articles back to the "last clean version".

You are unusual. When I was reading academic publications on
Wikipedia a few years ago, a number of them dealt with quantifying
vandalism and reversions: almost all vandalism was reverted within
days, and reversions that took longer than a month were very rare
(0-10%, IIRC, and that is being very generous). This is why I chose
to wait a month; waiting longer would have added nothing. A week
would have been adequate.

There are a number of related papers, but for brevity's sake consider
ftp://193.206.140.34/mirrors/epics-at-lnl/WikiDumps/localhost/group282-priedhorsky.pdf,
which found an exponential distribution for ordinary vandalism:

> 42% of damage incidents are repaired essentially immediately (i.e., within one estimated view). This result is roughly consistent with the work of Viégas et al. [20], which showed that the median persistence of certain types of damage was 2.8 minutes. However, 11% of incidents persist beyond 100 views, 0.75% – 15,756 incidents – beyond 1000 views, and 0.06% – 1,260 incidents – beyond 10,000 views.

On average, the articles concerned had fewer than 100 page views a day,
going by stats.grok.se, so within just a few days most of the edits
should have been reverted, if they were going to be reverted at all.
This sort of skewed behavior is why you see such different averages
and medians when you go looking in papers; e.g.:

    - ["Measuring Wikipedia"](http://eprints.rclis.org/bitstream/10760/6207/1/MeasuringWikipedia2005.pdf), Voss 2005
    - ["Studying Cooperation and Conflict between Authors with history flow Visualizations"](http://alumni.media.mit.edu/~fviegas/papers/history_flow.pdf), Viégas et al 2003
    - ["Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata"](http://repository.upenn.edu/cgi/viewcontent.cgi?article=1963&context=cis_reports), West 2010
    - ["User Contribution and Trust in Wikipedia"](http://www.ics.uci.edu/~sjavanma/CollabCom), Javanmardi et al
    - ["He says, she says: conflict and coordination in Wikipedia"](http://nguyendangbinh.org/Proceedings/CHI/2007/docs/p453.pdf), Kittur et al 2007
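As a rough check on the page-view argument, here is a minimal sketch that converts the view thresholds from the quoted paper into days. It assumes a steady 100 views a day, the upper bound mentioned for these articles. Real traffic is not uniform, and at lower traffic the same number of views takes proportionally longer. The function names are hypothetical, chosen for illustration. The only data used are the three "persist beyond N views" figures in the quote.

```python
# Share of damage incidents persisting *beyond* a given number of views,
# taken from the figures quoted from the Priedhorsky paper above.
PERSIST_BEYOND = {100: 0.11, 1_000: 0.0075, 10_000: 0.0006}


def views_after(days: float, views_per_day: float = 100.0) -> float:
    """Views accumulated after `days`, ASSUMING a steady daily view rate."""
    return days * views_per_day


def persistence_upper_bound(views: float) -> float:
    """Upper bound on the share of incidents still unrepaired after `views` views.

    Persistence can only fall as views accumulate, so the share surviving past
    `views` is at most the share surviving past the largest quoted threshold
    that is <= `views`. Below the smallest threshold there is no bound (1.0).
    """
    reached = [t for t in PERSIST_BEYOND if t <= views]
    return PERSIST_BEYOND[max(reached)] if reached else 1.0


if __name__ == "__main__":
    for days in (1, 7, 30):
        v = views_after(days)
        print(f"{days:>2} days ~ {v:>6.0f} views: <= {persistence_upper_bound(v):.2%} unrepaired")
```

The quoted thresholds are coarse, so the bound only tightens when a new threshold is crossed. At 100 views a day, a month (about 3,000 views) passes the 1,000-view mark, beyond which the quoted figure is 0.75% of incidents.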

On Thu, May 31, 2012 at 12:03 PM, Thomas Morton
<morton.thomas at googlemail.com> wrote:
> This, I think, is a major issue which make the results useless
>
> * The edit summary implies policy knowledge, I'd only check an edit like
> that on my watchlist on occasion.

And deletionists have no policy knowledge?

-- 
gwern
http://www.gwern.net


