On Thu, May 31, 2012 at 11:08 AM, Carl (CBM) cbm.wikipedia@gmail.com wrote:
There is a redacted (no user info) table in the toolserver database that can be used to count the number of editors who watchlist a page. I fetched the counts for the 100 articles and found the median.
Ah. That's interesting to know and useful for context, thank you.
On Thu, May 31, 2012 at 11:59 AM, WereSpielChequers werespielchequers@gmail.com wrote:
Firstly, rather than measure vandalism, it created vandalism, and vandalism that didn't look like typical vandalism. Aside from the ethical issue involved, this will have skewed the result.
As I've said multiple times, this was a designed feature, not a bug. The goal was not to measure the reversion rate of the broadest possible kind of vandalism, as that has been amply studied*, but of one specific kind. Complaining that this specific kind is 'skewed' compared to 'all possible vandalism related to external links' is to miss the point.
* Wikipedia generally does very well on *obvious* vandalism, especially since the introduction of anti-vandalism bots using machine-learning techniques. There's no need for anyone to spend time measuring it, except perhaps bot-writers fine-tuning their statistics.
In particular the edit summaries were very atypical for vandalism, if I'd seen that edit summary on my watchlist I would probably have just sighed and taken it as another example of deletionism in action.
I propose a version of http://en.wikipedia.org/wiki/Poe%27s_law - it is impossible to create an example of deletionism mindless enough to be detectable as such if it comes with jargon attached.
Of the more than 13,000 pages on my watchlist I doubt there are 13 where I would look at such an edit, and that's if it was one of the changes on my watchlist that I was even aware of - it is far too big to fully check every day. Most IP vandals don't use jargon in edit summaries, and I know I'm not the only editor who is more suspicious of IP edits with blank edit summaries.
You only ran the experiment for one month. I often revert older vandalism than that; I may be unusual there in that I've got some tools for finding vandalism that has got past the hugglers, but I'm not unusual in sometimes taking articles back to the "last clean version".
You are unusual. When I was spending time reading academic publications on Wikipedia a few years ago, a number of them dealt with quantifying vandalism and reversions; almost all vandalism was reverted within days, and reversions that took longer than a month were very rare (0-10%, IIRC, and that is being very generous). That is why I chose to wait a month: waiting longer would have added nothing. A week would have been adequate.
There are a number of related papers, but for brevity's sake take ftp://193.206.140.34/mirrors/epics-at-lnl/WikiDumps/localhost/group282-priedhorsky.pdf, which found an exponential distribution for ordinary vandalism:
42% of damage incidents are repaired essentially immediately (i.e., within one estimated view). This result is roughly consistent with the work of Viégas et al. [20], which showed that the median persistence of certain types of damage was 2.8 minutes. However, 11% of incidents persist beyond 100 views, 0.75% – 15,756 incidents – beyond 1000 views, and 0.06% – 1,260 incidents – beyond 10,000 views.
On average, the articles concerned had fewer than 100 page views a day going off stats.grok.se, so within just a few days most of the edits should have been reverted - if they were ever going to be, of course. (A back-of-envelope conversion of those view counts into days follows the references below.) This sort of behavior is why you see such different averages and medians when you go looking in the papers; e.g.:
- ["Measuring Wikipedia"](http://eprints.rclis.org/bitstream/10760/6207/1/MeasuringWikipedia2005.pdf), Voss 2005 - ["Studying Cooperation and Conflict between Authors with history flow Visualizations"](http://alumni.media.mit.edu/~fviegas/papers/history_flow.pdf), Viégas et al 2003 - ["Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata?"](http://repository.upenn.edu/cgi/viewcontent.cgi?article=1963&context=cis...), West 2010 - ["User Contribution and Trust in Wikipedia"](http://www.ics.uci.edu/~sjavanma/CollabCom), Javanmardi et al - ["He says, she says: conflict and coordination in Wikipedia"](http://nguyendangbinh.org/Proceedings/CHI/2007/docs/p453.pdf), Kittur et al 2007
On Thu, May 31, 2012 at 12:03 PM, Thomas Morton morton.thomas@googlemail.com wrote:
This, I think, is a major issue which makes the results useless
- The edit summary implies policy knowledge, I'd only check an edit like that on my watchlist on occasion.
And deletionists have no policy knowledge?