On 19/09/13 11:35, Petr Bena wrote:
<snip>
Huggle 3 comes with vandalism prediction, as it precaches the diffs,
including their contents, even before they are enqueued. Each edit has
a so-called "score", a numerical value: the higher it is, the more
likely the edit is vandalism.
If you want to help us improve this feature, it is necessary to define
a "score words" list for every wiki where Huggle is going to be used,
for example the English wiki.
Each list has the following syntax:
(see
https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff…)
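To make the "score words" idea concrete, here is a minimal sketch in Python of how such a list could feed an edit score. It is an illustration only, not Huggle's actual implementation or configuration syntax: the words, the weights, and the `score_edit` helper are all hypothetical, and the assumption is that each matching word simply adds its weight to the score.

```python
import re

# Hypothetical score words and weights, for illustration only.
# They are not taken from any real Huggle configuration.
SCORE_WORDS = {
    "stupid": 10,
    "idiot": 15,
    "lol": 5,
}


def score_edit(added_text, score_words=SCORE_WORDS):
    """Sum the weights of all score words found in the text added by an edit.

    A higher result means the edit looks more like vandalism.
    """
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", added_text.lower())
    return sum(score_words.get(token, 0) for token in tokens)


print(score_edit("The river flows north."))                 # 0
print(score_edit("lol this article is stupid, you idiot"))  # 30
```

A per-wiki list would then only need to swap in words and weights appropriate for that wiki's language.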
The good thing about reinventing the wheel is that you can reuse
existing material :-]
ClueBot NG has such a list:
http://review.cluebot.cluenet.org
and the bot is quite active:
http://en.wikipedia.org/wiki/Special:Contributions/ClueBot_NG
It uses a variety of algorithms to determine the score of an edit:
http://en.wikipedia.org/wiki/User:ClueBot_NG#Vandalism_Detection_Algorithm
Maybe get in touch with them and reuse their engine?
--
Antoine "hashar" Musso