Hi,
I've been working on a vandalism detection tool for Wikipedia and am currently looking for a list of spam words, that is, terms typically associated with vandalism or spam. Is anybody aware of such a resource?
Thanks in advance for your comments.
Best regards, -- Sérgio Nunes
On Thu, Jul 1, 2010 at 12:03 PM, S. Nunes snunes@gmail.com wrote:
I've been working on a vandalism detection tool for Wikipedia and am currently looking for a list of spam words, that is, terms typically associated with vandalism or spam. Is anybody aware of such a resource?
One place to look would be the ClueBot source, both for a list of words and for some of the heuristics it uses to battle certain typical vandalism cases: http://en.wikipedia.org/wiki/User:ClueBot/Source
Cheers, Morten
You might also try adapting some of the rules that SpamAssassin uses to identify email spam.
-Eric
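For illustration, the two suggestions above (a word list plus heuristics, scored in the spirit of SpamAssassin's weighted rules) could be combined roughly as in the sketch below. The rule names, patterns, weights, and threshold are all hypothetical placeholders; none of them are taken from ClueBot or SpamAssassin, and real values would need tuning against labeled edits.

```python
import re

# Hypothetical rules: (name, compiled pattern, score).
# Neither the patterns nor the weights come from ClueBot or SpamAssassin.
RULES = [
    ("shouting", re.compile(r"\b[A-Z]{6,}\b"), 1.0),
    ("char_repeat", re.compile(r"(.)\1{5,}"), 1.5),
    ("link_spam", re.compile(r"https?://\S+", re.IGNORECASE), 0.5),
    ("promo_terms",
     re.compile(r"\b(?:buy now|free money|click here)\b", re.IGNORECASE), 2.0),
]

THRESHOLD = 2.0  # assumed cutoff, not a recommended value


def score_text(added_text):
    """Return (total score, names of matching rules) for text added by an edit."""
    hits = [(name, score) for name, pattern, score in RULES
            if pattern.search(added_text)]
    total = sum(score for _, score in hits)
    return total, [name for name, _ in hits]


def looks_suspicious(added_text):
    """Flag the text when the summed rule scores reach the threshold."""
    total, _ = score_text(added_text)
    return total >= THRESHOLD
```

Each rule contributes its score at most once per edit, so a single noisy pattern cannot push an edit over the threshold on its own unless its weight is set that high. Running `score_text` on only the text an edit adds, rather than on the whole article, keeps pre-existing content from being counted against the editor.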
On Jul 1, 2010, at 12:28 PM, Morten Wang wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l