There was a discussion about unreverted vandalism on AN.
http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard#the_pro...
I often see unreverted vandalism that appears not to have been caught. The latest example is here:
http://en.wikipedia.org/w/index.php?title=Military_science&diff=33930922...
[I'm not at the right computer at the moment, so hopefully someone will fix that]
So is it as big a problem as it seems? What percentage of vandalism doesn't get caught for days or weeks?
Carcharoth
On 11 February 2010 15:48, Carcharoth carcharothwp@googlemail.com wrote:
The latest example is here:
http://en.wikipedia.org/w/index.php?title=Military_science&diff=33930922...
[I'm not at the right computer at the moment, so hopefully someone will fix that]
Fixed.
So is it as big a problem as it seems? What percentage of vandalism doesn't get caught for days or weeks?
http://en.wikipedia.org/wiki/User:Aetheling/Vandalism_survival
That's a pretty good study, albeit with a very small sample size (100 articles).
On Thu, Feb 11, 2010 at 3:54 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
On 11 February 2010 15:48, Carcharoth carcharothwp@googlemail.com wrote:
The latest example is here:
http://en.wikipedia.org/w/index.php?title=Military_science&diff=33930922...
[I'm not at the right computer at the moment, so hopefully someone will fix that]
Fixed.
Thanks.
So is it as big a problem as it seems? What percentage of vandalism doesn't get caught for days or weeks?
http://en.wikipedia.org/wiki/User:Aetheling/Vandalism_survival
That's a pretty good study, albeit with a very small sample size (100 articles).
"an estimated 10% of all vandalism endures for months and even years indicates that some new tools and strategies are needed for rooting out the most subtle and persistent forms of vandalism"
Quite a strong claim there.
The talk page discussion is interesting.
Carcharoth
The situation should improve if they *ever* enable flagged versions on the English Wikipedia. At the moment detecting vandalism is a bit hit-and-miss; flagged versions should enable 100% checking.
That wouldn't completely stop vandalism, but it would greatly reduce it. This should be true even if we just use the flags to mark whether articles have been checked, rather than to determine whether they should be seen.
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
On 11 February 2010 15:48, Carcharoth carcharothwp@googlemail.com wrote:
So is it as big a problem as it seems? What percentage of vandalism doesn't get caught for days or weeks?
Well, here's a ballpark guess...
I've made 35,000-ish edits. I reckon maybe 10 or 20% of those are routine reversions of other people's contributions: vandalism, etc. Call it 5,000.
I can think of perhaps a dozen cases where I've found very long-term or similarly complicated vandalism; assume I've forgotten half of them, or rolled back without quite noticing the dates, and that implies about 0.5% of vandalism edits are something Particularly Remarkable.
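The arithmetic behind that guess, written out as a short Python sketch. Every input is one of the rough figures above, not measured data; the function name is just for illustration.

```python
def long_term_share(remembered_cases: int, recall_rate: float, vandal_reverts: int) -> float:
    """Fraction of vandalism reverts that involved long-term vandalism.

    Dividing by recall_rate scales the remembered cases up to cover the
    ones that were forgotten (recall_rate=0.5 means "forgot half").
    """
    actual_cases = remembered_cases / recall_rate
    return actual_cases / vandal_reverts


total_edits = 35_000
# "maybe 10 or 20%" of edits are reverts: somewhere between 3,500 and 7,000
low, high = round(total_edits * 0.10), round(total_edits * 0.20)
vandal_reverts = 5_000  # "Call it 5,000", roughly the middle of that range

share = long_term_share(remembered_cases=12, recall_rate=0.5, vandal_reverts=vandal_reverts)
print(f"reverts: {low}-{high}, using {vandal_reverts}")
print(f"long-term share: {share:.2%}")  # 24 / 5000, i.e. about 0.5%
```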
Entirely unscientific, of course, but there you go.
The previous study relied on randomly sampling articles. Here are a couple of other possibilities:
a) Use vandalism *reports*. People tend to write in to the complaints address when they find vandalism irrespective of how long it's been there, or how complicated it is; if they were going to revert it, the odds are they wouldn't write. So, we could take a large sample of vandalism report emails (all archived in OTRS), identify each one's timestamp and the article it concerns, find the relevant vandalism edit, and then find its reversion.
b) Use reversions. Sample a thousand uses of rollback from the recent changes list, find time between that edit and the one it was reverting.
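Either method ends in the same measurement: for each bad edit, how long it stayed live before someone reverted it. Below is a minimal Python sketch of that last step, assuming the (saved, reverted) timestamp pairs have already been collected by hand or by some script. The function names and the one-week threshold are illustrative choices, and the timestamps are invented.

```python
from datetime import datetime
from statistics import median

# One sampled case: (when the bad edit was saved, when it was reverted).
Sample = tuple[datetime, datetime]


def survival_hours(samples: list[Sample]) -> list[float]:
    """Hours each bad edit stayed live before it was reverted."""
    return [(reverted - saved).total_seconds() / 3600 for saved, reverted in samples]


def share_surviving(hours: list[float], threshold_hours: float) -> float:
    """Fraction of sampled vandalism that lasted longer than threshold_hours."""
    if not hours:
        raise ValueError("no samples")
    return sum(h > threshold_hours for h in hours) / len(hours)


# Invented timestamps, for illustration only.
samples = [
    (datetime(2010, 2, 1, 12, 0), datetime(2010, 2, 1, 12, 3)),   # 3 minutes
    (datetime(2010, 2, 1, 13, 0), datetime(2010, 2, 1, 13, 30)),  # 30 minutes
    (datetime(2010, 1, 20, 9, 0), datetime(2010, 2, 3, 9, 0)),    # 14 days
]
hours = survival_hours(samples)
print(median(hours), share_surviving(hours, threshold_hours=24 * 7))
```

Fed only from rollbacks, this will undercount old vandalism, since older vandalism is usually removed by ordinary editing rather than by rollback or undo.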
The first of these overestimates vandalism to high-traffic pages, and so would *probably* lead to overcounting "young" vandalism - if we make the reasonably safe assumption that high traffic = high editor interest = many watchlists, etc.
The second of these falls down on undercounting old vandalism. Older vandalism tends to require conventional editing, rather than the use of rollback or undo, because the odds are higher that the article has been edited since.
Any other ideas?
On Thu, Feb 11, 2010 at 5:05 PM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
b) Use reversions. Sample a thousand uses of rollback from the recent changes list, find time between that edit and the one it was reverting.
That one sounds easier. If only people wouldn't use rollback inappropriately...
Carcharoth
On 11 February 2010 17:17, Carcharoth carcharothwp@googlemail.com wrote:
On Thu, Feb 11, 2010 at 5:05 PM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
b) Use reversions. Sample a thousand uses of rollback from the recent changes list, find time between that edit and the one it was reverting.
That one sounds easier. If only people wouldn't use rollback inappropriately...
Looking for rollback edits is a good way to find vandalism that was reverted quickly, but as Andrew says it won't find old vandalism on articles that have had subsequent edits, and finding that is essential if the intention is to find out how much vandalism takes a long time to be reverted.
On Thu, Feb 11, 2010 at 12:21 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
On 11 February 2010 17:17, Carcharoth carcharothwp@googlemail.com wrote:
On Thu, Feb 11, 2010 at 5:05 PM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
b) Use reversions. Sample a thousand uses of rollback from the recent changes list, find time between that edit and the one it was reverting.
That one sounds easier. If only people wouldn't use rollback inappropriately...
Looking for rollback edits is a good way to find vandalism that was reverted quickly, but as Andrew says it won't find old vandalism on articles that have had subsequent edits, and finding that is essential if the intention is to find out how much vandalism takes a long time to be reverted.
And such cases are very common. In high-vandalism pages, it's easy for entire sections to just drop out in the back and forth. Bot edits badly exacerbate the issue because they edit whenever the heck they feel like it, and increase the noise in diffs.
An example: while looking at a reversion of a few anon edits on [[Legalism (Chinese philosophy)]], I grew suspicious of the ordering of sections - it seemed a little off, a little too choppy. I looked at consolidated diffs back to January and found nothing in particular; it was only when I gave it a last try all the way back to December that I figured it out: 2 entire substantial sections had gotten deleted.
I had to manually copy them back in because of all the bot activity in the interim: https://secure.wikimedia.org/wikipedia/en/w/index.php?title=Legalism_%28Chin...
On Thu, Feb 11, 2010 at 5:46 PM, Gwern Branwen gwern0@gmail.com wrote:
<snip>
it was only when I gave it a last try all the way back to December that I figured it out: 2 entire substantial sections had gotten deleted.
Goodness. That reminds me of the problem there used to be with unclosed ref tags, which made articles appear truncated on screen at the point where the closing ref tag was missing. The text was all there, just not displaying. I think that got fixed when someone tweaked MediaWiki to jump up and down and produce flashing red warning lights when this happens.
Something like the removal of an entire section can be picked up by edit filters, but you still need people to check the filters and decide which edits are good and which are bad. I had an edit filter set up to detect the removal of "Category:Living people" from articles, but stopped following it after a few weeks when I realised that most of the edits were being reverted for other reasons before I had a chance to check (some way was needed to *flag* which edits had been dealt with and which hadn't).
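A rough Python analogue of that kind of filter, plus the missing piece: a per-hit "checked" flag so you can see which hits still need a look. This is not AbuseFilter's rule language; the regex, the class, and the revision ids are hypothetical. The regex only approximates how MediaWiki matches category links (for instance it ignores case throughout, whereas MediaWiki category names are case-sensitive after the first letter).

```python
import re
from dataclasses import dataclass, field

# Approximate match for [[Category:Living people]] or [[Category:Living people|sort key]].
LIVING_PEOPLE = re.compile(
    r"\[\[\s*Category\s*:\s*Living[ _]+people\s*(\|[^\]]*)?\]\]", re.IGNORECASE
)


def removes_living_people(old_text: str, new_text: str) -> bool:
    """True if the old revision had the category link and the new one does not."""
    return bool(LIVING_PEOPLE.search(old_text)) and not LIVING_PEOPLE.search(new_text)


@dataclass
class FilterQueue:
    """Filter hits, each with a flag saying whether someone has dealt with it."""
    handled: dict[int, bool] = field(default_factory=dict)  # revision id -> dealt with?

    def record(self, rev_id: int, old_text: str, new_text: str) -> None:
        if removes_living_people(old_text, new_text):
            self.handled.setdefault(rev_id, False)

    def mark_handled(self, rev_id: int) -> None:
        # Called when someone has checked (or reverted) the edit.
        if rev_id in self.handled:
            self.handled[rev_id] = True

    def pending(self) -> list[int]:
        return sorted(rev for rev, done in self.handled.items() if not done)


# Hypothetical revisions, for illustration only.
q = FilterQueue()
q.record(101, "Bio text.\n[[Category:Living people]]", "Bio text.")
q.record(102, "Other text.", "Other text, edited.")
q.record(103, "[[Category:Living people|Doe, Jane]]", "")
q.mark_handled(101)  # already reverted by someone else
print(q.pending())
```

The flag is the part that matters: hits that other people have already reverted can be marked off, so whoever is watching the filter only sees what is still outstanding.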
Funny that, you know, flagging of edits. I first encountered a form of that on Wikisource. Some form of flagged revisions might happen on en-Wikipedia some day as well, but it is quite a culture change to get used to. Hopefully when it happens, people will adapt quickly (ditto for LiquidThreads).
In fact, that has been my major worry about both Flagged Revisions and LiquidThreads. Will people get turned off by the new user interfaces if they don't like them? How do you implement major changes like this without breaking parts of what currently exist?
Carcharoth
On 11 February 2010 17:17, Carcharoth carcharothwp@googlemail.com wrote:
On Thu, Feb 11, 2010 at 5:05 PM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
b) Use reversions. Sample a thousand uses of rollback from the recent changes list, find time between that edit and the one it was reverting.
That one sounds easier. If only people wouldn't use rollback inappropriately...
Mmm. You'd want a second study to get an estimate of how much rollback is:
a) inappropriate, i.e. edit-warring;
b) irrelevant ("rollback self" is not unknown...);
c) legitimate but mundane, such as mass-reverting edits to clean up after a discussion;
and finally d) actually reverting vandalism.
(The same applies to (undo), but the proportion of d) would of course be vastly lower)
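That second study boils down to hand-labelling a sample of rollbacks into the four buckets and scaling the raw rollback count by the share that really reverted vandalism. A minimal Python sketch, assuming the labels already exist; the bucket names, the functions, and the labels in the example are all invented for illustration.

```python
from collections import Counter

# Short names for buckets a) to d) above.
CATEGORIES = ("inappropriate", "irrelevant", "mundane", "vandalism")


def category_shares(labels: list[str]) -> dict[str, float]:
    """Share of a hand-labelled rollback sample falling into each bucket."""
    if not labels:
        raise ValueError("empty sample")
    counts = Counter(labels)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {c: counts[c] / len(labels) for c in CATEGORIES}


def adjusted_vandal_reverts(total_rollbacks: int, labels: list[str]) -> float:
    """Scale a raw rollback count by the sampled share that reverted vandalism."""
    return total_rollbacks * category_shares(labels)["vandalism"]


# Invented labels for a sample of ten rollbacks.
labels = ["vandalism"] * 8 + ["mundane", "inappropriate"]
print(category_shares(labels))
print(adjusted_vandal_reverts(1000, labels))
```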
It's not quite a simple issue. I think we know vandalism is a "long tail" phenomenon, i.e. statistics of average reversion time get dominated by a few very long-lasting bad edits. So, for example, median and mean reversion times may be very different. The question is whether one reads that as "soft security" actually breaking down, or as a comment on whether flagged doodads are a bit late to the party or will effect a big improvement. And then there is the issue of people not blanking their watchlists when they stop using them, which would conceal the effectively unwatched pages from the admins who would otherwise watchlist them. The latter problem could be addressed by database searches looking at the watchlists of those who edit little.
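To make the mean/median point concrete, here is a small Python sketch with invented reversion times: nine quick reverts and one edit that lasts about a month. It also sketches the watchlist check suggested above, where a page counts as "effectively unwatched" if every user watching it has made fewer than some minimum number of recent edits. The watcher lists and edit counts are assumed to have been pulled from the database already; the function name, threshold, and page/user names are hypothetical.

```python
from statistics import mean, median

# Invented reversion times in minutes; the last one is 30 days.
revert_minutes = [2, 3, 3, 4, 5, 6, 8, 10, 15, 43_200]
print(mean(revert_minutes), median(revert_minutes))  # the one outlier dominates the mean


def effectively_unwatched(
    watchers: dict[str, set[str]], recent_edits: dict[str, int], min_edits: int = 1
) -> list[str]:
    """Pages whose watchers (if any) all made fewer than min_edits recent edits."""
    return sorted(
        page
        for page, users in watchers.items()
        if all(recent_edits.get(u, 0) < min_edits for u in users)
    )


# Hypothetical watchlist data.
watchers = {"Page A": {"InactiveUser"}, "Page B": {"ActiveUser", "InactiveUser"}}
recent_edits = {"ActiveUser": 50}
print(effectively_unwatched(watchers, recent_edits))
```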
Charles
On Thu, Feb 11, 2010 at 5:05 PM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
Any other ideas?
One more: the number of page views while an article is in a vandalised state, when that state lasts a long time (the minimum for the view stats would be a whole calendar day in a vandalised state; better to look at articles that were in a vandalised state for weeks or months).
The article I pointed to was in that state for nearly 3 weeks, and the vandalism was visible in the first word you try to read (unless people saw the template notice on top and stopped reading):
http://en.wikipedia.org/w/index.php?title=Military_science&oldid=3393092...
From 23 January to 10 February, the page views were about 200 a day, though the page view counter seems to have been broken for three of those days.
http://stats.grok.se/en/201002/Military_science
Still, that is 19 days and about 3800 page views where the vandalism was not detected. Strange.
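The count above can be written as a small Python sketch: sum daily views over the days the article was vandalised, optionally filling days where the counter was broken. The flat 200-a-day series and which three days are missing are assumptions for illustration; real figures would come from stats.grok.se.

```python
from datetime import date, timedelta
from typing import Optional


def days_inclusive(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both ends."""
    return (end - start).days + 1


def views_while_vandalised(
    daily_views: dict[date, int], start: date, end: date, fill_missing: Optional[int] = None
) -> int:
    """Sum page views for each day in [start, end].

    Days missing from daily_views (e.g. the counter was broken) add
    fill_missing if it is given, and are skipped otherwise.
    """
    total = 0
    day = start
    while day <= end:
        if day in daily_views:
            total += daily_views[day]
        elif fill_missing is not None:
            total += fill_missing
        day += timedelta(days=1)
    return total


start, end = date(2010, 1, 23), date(2010, 2, 10)
# Assumed series: about 200 a day, with three arbitrarily chosen days missing.
daily = {start + timedelta(days=i): 200 for i in range(19) if i not in (3, 4, 5)}
print(days_inclusive(start, end))
print(views_while_vandalised(daily, start, end))                    # counted days only
print(views_while_vandalised(daily, start, end, fill_missing=200))  # gaps filled at 200/day
```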
Carcharoth