Has anyone come up with a formula for the ratio between vandalism prevented by the edit filters and lost edits on Wiki?
I'm trying to work out whether the "true" editing level on the English Wikipedia, and on others which use the edit filters, has risen or fallen since the edit filters were introduced in 2009. It turns out that this is a complex question, partly because the way that vandals respond to the edit filter is different from the way they respond to reversion and warning on wiki, and of course the filters have steadily become more complex and effective over the last four years. Also, different filters will have a different ratio between vandalism filtered out and vandalism (and vandalism reversion, warnings, AIV reports and block notices) "lost" from Wikipedia.
It would be great if we could simply assume that someone who in 2013 tries to vandalise five articles but is prevented by the edit filter would, pre-2009, have made five vandalism edits that would have been responded to with five reversions, four warnings, one AIV report, one block message, 0.01 barnstars and 0.3 archive edits on the AIV page. But it isn't that simple, not least because of the psychology: just as teenage vandals are probably less likely to persist if they discover that the person they are competing with is some grey-haired pensioner, so in theory fighting against a computer that is faster than you is less satisfying than doing so against a fellow human.
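To make the naive assumption concrete, here is a minimal Python sketch of that arithmetic. The counts are simply the figures quoted above for five attempts; the function and variable names are hypothetical, and the linear scaling is exactly the simplification that, as noted, probably does not hold for real vandals.

```python
# Naive model: every filtered vandalism attempt would otherwise have produced
# one vandalism edit plus a fixed amount of follow-up activity.
# The counts are the figures quoted above for five attempts; names are illustrative.
EDITS_PER_FIVE_ATTEMPTS = {
    "vandalism edits": 5,
    "reversions": 5,
    "warnings": 4,
    "AIV reports": 1,
    "block messages": 1,
    "barnstars": 0.01,
    "AIV archive edits": 0.3,
}


def naive_lost_edits(filtered_attempts: float) -> float:
    """Edits 'lost' from the wiki under the naive linear assumption."""
    per_attempt = sum(EDITS_PER_FIVE_ATTEMPTS.values()) / 5
    return filtered_attempts * per_attempt


if __name__ == "__main__":
    print(round(naive_lost_edits(5), 2))  # 16.31
```

Under this assumption each filtered attempt stands for a little over three "lost" edits, most of which are good-faith responses rather than vandalism.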
So I was wondering if there are Wikipedia language versions that have not yet implemented edit filters, or where the filters have changed so little that there are long periods where any change in editing levels has not been caused by improvement in the filters.
This is of more than academic interest. If we simply ignore this effect and make decisions based on the raw edits remaining after the edit filter, then the more efficient the edit filter gets at preventing vandalism, the more we would be beating ourselves up for losing edits.
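A toy illustration of that effect, with entirely made-up numbers: hold the underlying activity constant and let the filter catch a growing share of vandalism attempts. The function `raw_edits`, the catch rates and the follow-up multiplier are all assumptions for the sake of the sketch, not measurements.

```python
def raw_edits(good_edits, vandal_attempts, catch_rate, follow_ups_per_vandal_edit):
    """Edits that still reach the wiki after the filter.

    Hypothetical model: each vandalism edit that gets through also generates
    some follow-up edits (reversion, warning, ...); filtered attempts generate none.
    """
    vandalism_through = vandal_attempts * (1 - catch_rate)
    return good_edits + vandalism_through * (1 + follow_ups_per_vandal_edit)


if __name__ == "__main__":
    # Made-up inputs: constant underlying activity, improving filter.
    for catch_rate in (0.0, 0.5, 0.9):
        total = raw_edits(1000, 100, catch_rate, follow_ups_per_vandal_edit=2)
        print(f"catch rate {catch_rate:.0%}: {total:.0f} raw edits")
```

In this model the raw count falls as the catch rate rises even though the other editing never changes, which is the distortion described above.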
Regards
Jonathan Cardy
On 28/08/2013, WereSpielChequers werespielchequers@gmail.com wrote:
Has anyone come up with a formula for the ratio between vandalism prevented by the edit filters and lost edits on Wiki?
...
Regards
Hi WSC,
Could you link to a definition of what the edit filters are and what they are supposed to do? I recall having problems including URLs like YouTube, but I'm not sure if that blacklist is the same thing. If this is something only implemented on the English Wikipedia project, it might be more relevant to raise it on wikien-l.
Cheers, Fae
The question can't really be answered without knowing what you want to achieve; I'll start from the end.
WereSpielChequers, 28/08/2013 14:13:
This is of more than academic interest. If we simply ignore this effect and make decisions based on the raw edits remaining after the edit filter, then the more efficient the edit filter gets at preventing vandalism, the more we would be beating ourselves up for losing edits.
Usually we consider the number of active users, which is less affected by this. Editing activity should be measured using http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm, which allows checking for unreverted edits (Erik has just updated it after it had been dormant for a few years).
If your aim is measuring the impact of AbuseFilter in reducing patrolling efforts, then it's another matter. I've requested some reports in https://bugzilla.wikimedia.org/show_bug.cgi?id=42359 : there are already some DB queries but we lack a visualisation. You can also use https://meta.wikimedia.org/wiki/Abuse_filter to find which wikis used (or did not use) the abuse filter and how, before it was enabled by default on all wikis.
Nemo
Thanks Nemo,
Just because the edit filter is enabled by default doesn't mean that every wiki has people optimising it to find vandalism in their language.
I'm trying to work out what the underlying "real" level of editing has been since 2009. The problem with measuring either unreverted edits or edits by active users is that the edit filters don't just lose us a large proportion of the vandalism that we used to get; they also lose us a lot of good-faith edits that have ceased to be necessary, including the vandalism reversions, warnings and block messages that have been automated away by the edit filter.
The stats at http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm get round part of that by only measuring mainspace edits, so they don't count the warnings and block messages that we've lost. Though they presumably have lost the reversion of vandalism that has now been prevented by the edit filter. But measuring article space edits has its own problems: the more article creation has shifted to sandboxes in userspace and, especially on EN wiki, to WP space as part of Articles for creation (https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation), the less meaningful it is to measure the different spaces as if their boundaries were immutable.
I appreciate that some of these things are difficult to measure, but sometimes it is the difficult stuff that is important. A case in point is the increasing tendency to revert unsourced edits on EN Wiki. The stats you quote treat all reversions the same, so the rise in simply reverting unsourced edits would appear to be more than masked by a combination of the loss of vandalism reversions to the edit filter and the increasing speed and sophistication of the vandal-fighting bots.
Regards
Jonathan
On 28 August 2013 13:49, Federico Leva (Nemo) nemowiki@gmail.com wrote:
The question can't really be answered without knowing what you want to achieve; I'll start from the end.
WereSpielChequers, 28/08/2013 14:13:
This is of more than academic interest. If we simply ignore this effect and make decisions based on the raw edits remaining after the edit filter, then the more efficient the edit filter gets at preventing vandalism, the more we would be beating ourselves up for losing edits.
Usually we consider the number of active users, which is less affected by this. Editing activity should be measured using http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm, which allows checking for unreverted edits (Erik has just updated it after it had been dormant for a few years).
If your aim is measuring the impact of AbuseFilter in reducing patrolling efforts, then it's another matter. I've requested some reports in https://bugzilla.wikimedia.org/show_bug.cgi?id=42359 : there are already some DB queries but we lack a visualisation. You can also use https://meta.wikimedia.org/wiki/Abuse_filter to find which wikis used (or did not use) the abuse filter and how, before it was enabled by default on all wikis.
Nemo
WereSpielChequers, 28/08/2013 17:14:
Just because the edit filter is enabled by default doesn't mean that every wiki has people optimising it to find vandalism in their language.
This is what the bugzilla link is about. :)
I'm trying to work out what the underlying "real" level of editing has been since 2009.
For what purposes? The following sentence seems to be about something else:
The problem with measuring either unreverted edits or edits by active users is that the edit filters don't just lose us a large proportion of the vandalism that we used to get; they also lose us a lot of good-faith edits that have ceased to be necessary, including the vandalism reversions, warnings and block messages that have been automated away by the edit filter.
The stats at http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm get round part of that by only measuring mainspace edits, so they don't count the warnings and block messages that we've lost. Though they presumably have lost the reversion of vandalism that has now been prevented by the edit filter.
That's fine if we're interested in the editing activity considered as a good thing (rather than in "how much time is wasted doing X").
But measuring article space edits has its own problems: the more article creation has shifted to sandboxes in userspace and, especially on EN wiki, to WP space as part of Articles for creation (https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation), the less meaningful it is to measure the different spaces as if their boundaries were immutable.
I don't understand. If a page is created in a namespace and moved to ns0, its whole history is counted. If history is not moved, or even worse it is not moved AND the creator is not the author of the content, something stinks. But why would people be doing something which is both wrong and more difficult?
I appreciate that some of these things are difficult to measure, but sometimes it is the difficult stuff that is important.
Yes, but if it's important you need to define your goals or you'll never get anywhere.
A case in point is the increasing tendency to revert unsourced edits on EN Wiki. The stats you quote treat all reversions the same, so the rise in simply reverting unsourced edits would appear to be more than masked by a combination of the loss of vandalism reversions to the edit filter and the increasing speed and sophistication of the vandal-fighting bots.
Again, I have no idea how this relates to all the above. Is measuring this specific thing your actual goal? You will never be able to see it in aggregated stats about editing activity, whatever filter or definition you use.
Nemo
wikimedia-l@lists.wikimedia.org