[Toolserver-l] A simple vandalism analysis tool

Tony Sidaway f.crdfa at gmail.com
Sun Dec 25 03:47:26 UTC 2005


My vandalism analysis tool, which uses a simple but powerful
methodology developed by Brian0918, analyses edit summaries on
articles to spot probable vandalism reverts by recognising the summary
patterns of standard rollbacks, and edits labelled "rvv", "rv v" or
"rvc".  It was developed for English Wikipedia but probably has
applications beyond that, and the methods developed here have obvious
utility beyond the recognition and reporting of vandalism.
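
As a rough illustration of the summary matching described above, here is a minimal Python sketch. The regular expressions and the name looks_like_vandalism_revert are assumptions made for this example; the tool's actual expressions are not given in this message, and the rollback wording ("Reverted edits by ... to last version by ...") is assumed rather than quoted from the tool.

```python
import re

# Assumed wording of a standard rollback summary on English Wikipedia.
ROLLBACK = re.compile(r"^Reverted edits by .+ to last version by .+", re.IGNORECASE)

# The shorthand labels named in the message: "rvv", "rv v" and "rvc".
SHORTHAND = re.compile(r"\b(?:rvv|rv v|rvc)\b", re.IGNORECASE)


def looks_like_vandalism_revert(summary: str) -> bool:
    """Return True if an edit summary matches one of the assumed patterns."""
    return bool(ROLLBACK.search(summary) or SHORTHAND.search(summary))


if __name__ == "__main__":
    for text in ["rvv", "Reverted edits by 10.0.0.1 to last version by Example", "fixed a typo"]:
        print(repr(text), looks_like_vandalism_revert(text))
```

The word boundaries keep the shorthand from matching inside longer words; a real deployment would probably need a longer, wiki-specific list of patterns.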

You can visit it here:

http://tools.wikimedia.de/~tony_sidaway/

Please try to break it, and tell me what happened.  There is a link to
a discussion page for that purpose.

The rationale is that, while vandalism is difficult to recognise
electronically, a pretty easy and reasonably reliable way to track
vandalism on a popular wiki article is to examine edit summaries and
count the proportion of them that indicate that the editors apparently
believed themselves to be reverting vandalism.
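
The counting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: revert_proportion and the is_revert predicate are hypothetical names, and the sample summaries are invented for the example, not taken from any article history.

```python
from typing import Callable, Iterable


def revert_proportion(summaries: Iterable[str], is_revert: Callable[[str], bool]) -> float:
    """Fraction of edit summaries that the predicate flags as vandalism reverts."""
    total = 0
    hits = 0
    for summary in summaries:
        total += 1
        if is_revert(summary):
            hits += 1
    # An article with no edits has no meaningful proportion; report 0.0.
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical predicate: exact match against the shorthand labels.
    labels = {"rvv", "rv v", "rvc"}
    sample = ["rvv", "expanded history section", "rvc", "typo"]
    print(revert_proportion(sample, lambda s: s.strip().lower() in labels))
```

Passing the predicate in as a parameter keeps the counting independent of any one wiki's summary conventions, so a different language edition would only need a different predicate.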

A highly experimental adaptation of this script to recognise (only)
rollbacks on the German Wikipedia is here:


http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi


The text of the latter CGI script is currently in English, although it
analyses German text.  As I know nothing about
internationalization, I have no idea whether it will always perform
correctly if UTF-8 multibyte characters (such as o-umlaut) are
entered.

This simple test seems to suggest that it does work:

http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi?article=Köln
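
For anyone testing multibyte input, the following Python sketch shows how a title such as "Köln" is usually percent-encoded as UTF-8 in a query string and decoded again. It uses only the standard urllib.parse module; the parameter name "article" is taken from the URL above, and nothing here is claimed about how the CGI script itself decodes its input.

```python
from urllib.parse import quote, unquote, urlencode

title = "Köln"

# The o-umlaut occupies two bytes in UTF-8, so the title is 5 bytes long.
utf8_bytes = title.encode("utf-8")

# quote() percent-encodes each non-ASCII UTF-8 byte.
encoded = quote(title)                   # "K%C3%B6ln"

# urlencode() builds the whole query string the same way.
query = urlencode({"article": title})    # "article=K%C3%B6ln"

# unquote() reverses the encoding, assuming UTF-8 by default.
decoded = unquote(encoded)

if __name__ == "__main__":
    print(len(utf8_bytes), encoded, query, decoded)
```

A browser may send the raw character or the percent-encoded form depending on its settings, so it is worth testing the script with both.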

Wikipedia is an international project and I welcome any and all
testing input on this.

Presently I don't know of any edit summary patterns that
non-administrators on the German Wikipedia use to indicate that
they're reverting what we on English Wikipedia would recognise as
simple vandalism.  As I'm unfamiliar with their practices, I'm not even
certain that they draw the same distinctions that we do on English
Wikipedia between intentional and overt disruptive edits (simple
vandalism) and more subtle vandalism or trolling.

Any help on this that German speakers can offer would be most welcome.

Although I address the German Wikipedia prominently because its
community is highly advanced and well organized, its content is
comparable to that of the English Wikipedia, and (not least) Deutsche
Wikipedia hosts the tool server, I would also love to produce useful
tools for as many languages as possible.  The skills I learn can be put
to use in tools of more general use than the current one.  The scripts
I write can easily be internationalized.  I cannot write good German
(whenever I try, native German speakers beg me to stop!) but I can
write good French and reasonable Spanish.  I am particularly
interested in Chinese, Indian languages, and Russian.


