My vandalism analysis tool, which uses a simple but powerful methodology developed by Brian0918, analyses edit summaries on articles to spot probable vandalism reverts by recognising the summary patterns of standard rollbacks, and edits labelled "rvv", "rv v" or "rvc". It was developed for English Wikipedia but probably has applications beyond that, and the methods developed here have obvious utility beyond the recognition and reporting of vandalism.
You can visit it here:
http://tools.wikimedia.de/~tony_sidaway/
Please try to break it, and tell me what happened. There is a link to a discussion page for that purpose.
The rationale is that, while vandalism is difficult to recognise electronically, a pretty easy and reasonably reliable way to track vandalism on a popular wiki article is to examine edit summaries and count the proportion of them that indicate that the editors apparently believed themselves to be reverting vandalism.
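A minimal sketch of that counting idea, in Python rather than whatever the CGI script itself uses. The regular expression and the function name `vandalism_revert_fraction` are assumptions for illustration; only the labels "rvv", "rv v" and "rvc" come from the description above. Standard rollback summaries would need their own pattern, omitted here.

```python
import re

# Assumed pattern: the three labels named above, matched as whole words.
# The real tool's expressions are not shown in this thread.
REVERT_LABELS = re.compile(r"\b(?:rvv|rv v|rvc)\b", re.IGNORECASE)


def vandalism_revert_fraction(summaries):
    """Return the fraction of edit summaries that look like vandalism reverts."""
    if not summaries:
        return 0.0
    hits = sum(1 for s in summaries if REVERT_LABELS.search(s))
    return hits / len(summaries)


# Hypothetical edit summaries for one article's history.
sample = [
    "rvv",
    "copyedit",
    "rv v by anonymous user",
    "added reference",
]
print(vandalism_revert_fraction(sample))  # 0.5
```

The result is only as good as the editors' own labelling: an unlabelled revert is not counted, and a mislabelled one is.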
A highly experimental adaptation of this script to recognise (only) rollbacks on the German Wikipedia is here:
http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi
The interface text of the latter CGI script is currently in English, although it analyses German text. As I know nothing about internationalization, I have no idea whether it will always behave correctly when UTF-8 multibyte characters (such as o-umlaut) are entered.
This simple test seems to suggest that it does work:
http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi?article=K%C3...
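For anyone who wants to test this themselves, here is a small Python sketch of how a title with an umlaut travels through a query string. The title "München" is a hypothetical example, not the article in the link above, and the parameter name `article` is taken from that link. The script's own decoding is not shown in this thread, so this only illustrates the round trip that has to work.

```python
from urllib.parse import parse_qs, quote

# Hypothetical article title; not the one used in the test link above.
title = "München"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = quote(title)
query = "article=" + encoded

# A script receiving the query string has to decode it back to text.
decoded = parse_qs(query)["article"][0]

print(encoded)                     # M%C3%BCnchen
print(decoded == title)            # True
# The character count and byte count differ, which is where
# byte-oriented string handling can go wrong.
print(len(title), len(title.encode("utf-8")))  # 7 8
```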
Wikipedia is an international project and I welcome any and all testing input on this.
Presently I don't know of any edit summary patterns that non-administrators on the German Wikipedia use to indicate that they're reverting what we on English Wikipedia would recognise as simple vandalism. As I'm unfamiliar with their practices, I'm not even certain that they draw the same distinction that we do on English Wikipedia between intentional, overtly disruptive edits (simple vandalism) and more subtle vandalism or trolling.
Any help on this that German speakers can offer would be most welcome.
I address the German Wikipedia prominently because its community is highly advanced and well organized, its content is comparable to that of the English Wikipedia, and (not least) Deutsche Wikipedia hosts the tool server. I would also love to produce useful tools for as many languages as possible; the skills I learn can be put to use in tools of more general use than the current one. The scripts I write can easily be internationalized. I cannot write good German (whenever I try, native German speakers beg me to stop!) but I can write good French and reasonable Spanish. I am particularly interested in Chinese, Indian languages, and Russian.
It seems to find no vandalism earlier than February 2004 in any of the very high visibility articles I checked. Was there a change in Wikipedia custom and/or rollback functionality at that time, or is something going wrong with your counter?
In any case, good work.
SCZenz
On 12/24/05, Tony Sidaway f.crdfa@gmail.com wrote:
On 12/25/05, SCZenz sczenz@gmail.com wrote:
It seems to find no vandalism earlier than February 2004 in any of the very high visibility articles I checked. Was there a change in Wikipedia custom and/or rollback functionality at that time, or is something going wrong with your counter?
In any case, good work.
Well spotted. It looks as if the rollback text changed that month from "Reverted to last edit by..." to "Reverted edits by...to last version by". I've added that pattern to the recogniser, and it now goes back as far as April 2003. I suspect that the administrator rollback originated about that time, because even one of our most vandalized articles has no recognisable rollbacks before that date. The editing rate at that date was only one-hundredth of its current level.
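A rough Python sketch of a recogniser that accepts both rollback wordings quoted above. The exact expressions in the tool are not shown here, so the regular expressions, the loose matching of user names, and the name `is_rollback` are all assumptions.

```python
import re

# Both wordings quoted in this message. The elided parts ("...") are
# matched loosely because their exact form is not given.
ROLLBACK_PATTERNS = [
    re.compile(r"^Reverted to last edit by .+"),
    re.compile(r"^Reverted edits by .+to last version by .+"),
]


def is_rollback(summary):
    """Return True if the edit summary matches either rollback wording."""
    return any(p.match(summary) for p in ROLLBACK_PATTERNS)


# Hypothetical summaries; the user names are made up.
print(is_rollback("Reverted to last edit by ExampleUser"))                       # True
print(is_rollback("Reverted edits by 10.0.0.1 to last version by ExampleUser"))  # True
print(is_rollback("fixed typo"))                                                 # False
```

Keeping the patterns in a list means that another wording, if one turns up in older history, can be added without touching the matching logic.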