Potthast, M., Stein, B., Gerling, R. (2008). Automatic Vandalism Detection in Wikipedia. http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_20...
Abstract. We present results of a new approach to detect destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature by now. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression we achieve 83% precision at 77% recall with our model. Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-Measure performance by 49% while being faster at the same time.
Open the PDF, scan to page 667. This bot outperforms MartinBot, T-850 Robotic Assistant, WerdnaAntiVandalBot, Xenophon, ClueBot, CounterVandalismBot, PkgBot, MiszaBot, and AntiVandalBot. It outperforms the best of those (AntiVandalBot) by a very wide margin.
So why are you wasting the ISP's time and the police's time when the best of the passive technology routes has not been explored? Using machine learning, *you pit the vandals against themselves.* Every time they perform a particular kind of vandalism, the bot learns to recognize it, so that kind of edit cannot simply be repeated.
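Concretely, the paper's approach boils down to something like the sketch below. This is not the authors' code: the features (uppercase ratio, a tiny vulgarism list, size of the change) are hypothetical stand-ins for the much richer feature set they describe, and scikit-learn is an assumed choice of library.

# Sketch only: score each revision with a few hand-crafted features and a
# logistic regression, in the spirit of the Potthast/Stein/Gerling paper.
# The features and word list below are illustrative placeholders.
import re
from sklearn.linear_model import LogisticRegression

VULGARISMS = {"stupid", "sucks", "idiot"}  # placeholder word list

def edit_features(old_text, new_text):
    """Turn one revision (old_text -> new_text) into a feature vector.
    A real system would diff the two revisions; this crude version just
    looks at the new text and the size of the change."""
    words = re.findall(r"[A-Za-z']+", new_text)
    upper_ratio = sum(c.isupper() for c in new_text) / max(len(new_text), 1)
    vulgar_hits = sum(w.lower() in VULGARISMS for w in words)
    size_delta = len(new_text) - len(old_text)
    return [upper_ratio, vulgar_hits, size_delta]

def train(revisions, labels):
    """revisions: list of (old_text, new_text); labels: 1 = vandalism, 0 = not."""
    X = [edit_features(o, n) for o, n in revisions]
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf

def looks_like_vandalism(clf, old_text, new_text, threshold=0.5):
    """Probability-based flag; the threshold trades precision for recall."""
    p = clf.predict_proba([edit_features(old_text, new_text)])[0][1]
    return p >= threshold

Retraining something like this on each newly reverted edit is the "pit the vandals against themselves" loop: every confirmed piece of vandalism becomes another training example.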
Cheers,
On Mon, Dec 29, 2008 at 4:15 PM, Brian Brian.Mingus@colorado.edu wrote:
By the way, I ask those questions having read the bot's user page. It is apparently quite effective, indicating to me that this user causes minimal disruption.
On Mon, Dec 29, 2008 at 4:11 PM, Brian Brian.Mingus@colorado.edu wrote:
What percentage of his page moves were not picked up automatically by a bot?
What percentage of this user's vandalism is not picked up by a bot?
Why is the ISP responsible for what he dumps into Wikipedia, rather than Wikipedia, which allows itself to be a dumping ground? The Viacom/YouTube lawsuit demonstrates that this is a legal grey area, so I see little ground on which to punish the ISP's entire IP range.
Why are machine-learning bots, trained on previous vandalism to detect new vandalism, not being used? They have already been developed. Why is the Foundation not funding their further development?
I believe the direction of this thread has been all wrong.
Peace,
On Mon, Dec 29, 2008 at 4:07 PM, Soxred93 soxred93@gmail.com wrote:
The problem with that is that many articles we have would not be found in any dictionary.
X!
On Dec 29, 2008, at 6:02 PM, Ian Woollard wrote:
On 29/12/2008, Joe Szilagyi szilagyi@gmail.com wrote:
Allow blocking on a more granular level: if we know his ISP, lock out moves and redirects for the whole damn ISP, and specifically point the finger back in the block message: "Blocked because of JarlaxleArtemis/Grawp", with a nice shiny link to his long-term abuse page.
It probably wouldn't work, because of proxies and people who would emulate or help him.
Still, ideas like that, which affect fewer people rather than more, are IMO almost certainly the way to go. For example, before permitting non-admins or users with a small number of edits to complete a move, it might be desirable to restrict the range of characters and check that the move title consists of words found in a dictionary.
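Concretely, such a gate might look something like the sketch below. This is not an existing MediaWiki feature; the character whitelist, word-list path, and edit-count threshold are assumed values.

# Rough sketch of the dictionary gate described above -- not an existing
# MediaWiki hook. Whitelist, word-list path, and threshold are made up.
import re

ALLOWED_TITLE = re.compile(r"^[A-Za-z0-9 ,.'()\-]+$")  # assumed character whitelist

def load_wordlist(path="/usr/share/dict/words"):
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def move_allowed(title, user_edit_count, is_admin, wordlist, min_edits=500):
    """Gate page moves for untrusted users: sane characters + dictionary words."""
    if is_admin or user_edit_count >= min_edits:
        return True  # trusted users bypass the check entirely
    if not ALLOWED_TITLE.match(title):
        return False
    words = re.findall(r"[A-Za-z']+", title)
    return all(w.lower() in wordlist for w in words)

The objection raised above still applies: plenty of legitimate titles (proper nouns, technical terms) are not dictionary words, so a check like this would probably have to whitelist existing titles or only flag moves for review rather than block them outright.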
-- -Ian Woollard
We live in an imperfectly imperfect world. Life in a perfectly imperfect world would be much better.
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
-- You have successfully failed!