On Thu, May 31, 2012 at 11:08 AM, Carl (CBM) <cbm.wikipedia(a)gmail.com> wrote:
There is a redacted (no user info) table in the toolserver database that can be used to count the number of editors who watchlist a page. I fetched the counts for the 100 articles and found the median.
Ah. That's interesting to know and useful for context, thank you.
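Carl's method, as described, comes down to one watcher count per article followed by a median. Below is a minimal sketch of that last step only. It assumes the counts have already been fetched into a dict keyed by article title. `median_watchers` is a hypothetical helper, and the query against the redacted toolserver table is not shown, since its schema isn't given here. One reason to prefer a median here is that a few heavily watched pages cannot drag it upward the way they would a mean.

```python
import statistics


def median_watchers(counts: dict[str, int]) -> float:
    """Median watcher count across articles (hypothetical helper).

    `counts` maps article title -> number of watching editors; fetching
    those numbers from the toolserver is outside this sketch.
    """
    if not counts:
        raise ValueError("need at least one article")
    # statistics.median sorts the values and averages the middle two
    # when there is an even number of articles.
    return statistics.median(counts.values())
```

For example, `median_watchers({"Foo": 3, "Bar": 10, "Baz": 0})` returns 3. The titles and counts are purely illustrative.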
On Thu, May 31, 2012 at 11:59 AM, WereSpielChequers
<werespielchequers(a)gmail.com> wrote:
Firstly, rather than measuring vandalism, it created vandalism, and vandalism that didn't look like typical vandalism. Aside from the ethical issue involved, this will have skewed the result.
As I've said multiple times, this was a designed feature, not a bug. The goal was not to measure the reversion rate of the broadest possible kind of vandalism, as that has been amply studied*, but that of a specific kind. Complaining that this specific kind is 'skewed' compared to 'all possible vandalism related to external links' is to miss the point.
* Wikipedia generally does very well on *obvious* vandalism, especially since the introduction of anti-vandalism bots using machine-learning techniques. There's no need for anyone to spend time measuring it, except perhaps bot-writers wanting to fine-tune their statistics.
In particular, the edit summaries were very atypical for vandalism; if I'd seen that edit summary on my watchlist, I would probably have just sighed and taken it as another example of deletionism in action.
I propose a version of http://en.wikipedia.org/wiki/Poe%27s_law: it is impossible to create an example of deletionism mindless enough to be detectable as such if it comes with jargon attached.
Of the more than 13,000 pages on my watchlist, I doubt there are 13 where I would look at such an edit, and that's only if it were one of the changes on my watchlist that I was even aware of; it is far too big to check fully every day. Most IP vandals don't use jargon in edit summaries, and I know I'm not the only editor who is more suspicious of IP edits with blank edit summaries.
You only ran the experiment for one month. I often revert vandalism older than that. I may be unusual there, in that I've got some tools for finding vandalism that has got past the hugglers, but I'm not unusual in sometimes taking articles back to the "last clean version".
You are unusual. When I was spending time reading academic publications on Wikipedia a few years ago, a number of them dealt with quantifying vandalism and reversions: almost all vandalism was reverted within days, and reversions that took longer than a month were very rare (0-10%, IIRC, to be very generous). This was why I chose to wait a month; waiting longer added nothing. A week would have been adequate.
There are a number of related papers, but for brevity's sake take
ftp://193.206.140.34/mirrors/epics-at-lnl/WikiDumps/localhost/group282-priedhorsky.pdf
which found an exponential distribution for ordinary vandalism:
42% of damage incidents are repaired essentially immediately (i.e., within one estimated view). This result is roughly consistent with the work of Viégas et al. [20], which showed that the median persistence of certain types of damage was 2.8 minutes. However, 11% of incidents persist beyond 100 views, 0.75% (15,756 incidents) beyond 1000 views, and 0.06% (1,260 incidents) beyond 10,000 views.
On average, the articles concerned had fewer than 100 page views a day, going by stats.grok.se, so within just a few days most of the edits should have been reverted (if they were going to be, of course). This sort of behavior is why you see such different averages and medians when you go looking in papers; e.g.:
- ["Measuring Wikipedia"](http://eprints.rclis.org/bitstream/10760/6207/1/MeasuringW….pdf), Voss 2005
- ["Studying Cooperation and Conflict between Authors with history flow Visualizations"](http://alumni.media.mit.edu/~fviegas/papers/history_f….pdf), Viégas et al. 2003
- ["Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata"](http://repository.upenn.edu/cgi/viewcontent.cgi?article=19…_reports), West 2010
- ["User Contribution and Trust in Wikipedia"](http://www.ics.uci.edu/~sjavanma/CollabCom)bCom), Javanmardi et al.
- ["He says, she says: conflict and coordination in Wikipedia"](http://nguyendangbinh.org/Proceedings/CHI/2007/docs/p453.p….pdf), Kittur et al. 2007
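A rough way to make the views-to-days argument concrete: treat the survival figures quoted from the Priedhorsky et al. paper as points on a non-increasing curve. Then convert a waiting period into views at an assumed constant rate and read off the bracketing figures. This is a minimal sketch under stated assumptions. The constant views-per-day rate and the helper name `survival_bounds` are illustrative, and the 1-view entry is simply 1 minus the 42% repaired within one view. The figures describe ordinary vandalism, not the edits in this experiment.

```python
# Fraction of damage incidents still unrepaired after N views, taken from
# the figures quoted from the Priedhorsky et al. paper. The 1-view entry is
# derived as 1 - 0.42; all figures describe ordinary vandalism.
SURVIVAL = {
    1: 0.58,
    100: 0.11,
    1_000: 0.0075,
    10_000: 0.0006,
}


def survival_bounds(days: float, views_per_day: float) -> tuple[float, float]:
    """Return (lower, upper) bounds on the unrepaired fraction after `days`.

    Hypothetical helper: assumes a constant view rate. Survival can only fall
    as views accumulate, so at v views it lies between the figure for the
    nearest quoted threshold at or above v and the one at or below v.
    """
    views = days * views_per_day
    at_or_below = [t for t in SURVIVAL if t <= views]
    at_or_above = [t for t in SURVIVAL if t >= views]
    # Before the first threshold nothing is known, so the upper bound is 100%.
    upper = SURVIVAL[max(at_or_below)] if at_or_below else 1.0
    # Past the last threshold the only safe lower bound is 0%.
    lower = SURVIVAL[min(at_or_above)] if at_or_above else 0.0
    return lower, upper


if __name__ == "__main__":
    for days, rate in [(3, 50), (7, 100), (30, 100)]:
        low, high = survival_bounds(days, rate)
        print(f"{days:>2} days at {rate} views/day: {low:.2%} to {high:.2%} unrepaired")
```

At 100 views a day, a 30-day wait is about 3,000 views, so the quoted figures put the fraction of ordinary damage still unrepaired between 0.06% and 0.75%. At 50 views a day, 3 days is 150 views, bracketed between 0.75% and 11%.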
On Thu, May 31, 2012 at 12:03 PM, Thomas Morton
<morton.thomas(a)googlemail.com> wrote:
This, I think, is a major issue which makes the results useless
* The edit summary implies policy knowledge; I'd only check an edit like that on my watchlist on occasion.
And deletionists have no policy knowledge?
--
gwern
http://www.gwern.net