Ultimately we need a system that integrates information from multiple sources, such as WikiTrust, AbuseFilter and the Wikipedia Editorial Team.
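As a very rough sketch of what that integration might look like - the signal names, values, and weights below are made-up placeholders, not real WikiTrust or AbuseFilter outputs - the simplest starting point is a weighted blend of normalized per-source scores:

def combined_suspicion_score(signals, weights):
    """Weighted blend of per-source scores, each normalized to [0, 1]."""
    total = sum(weights.values())
    return sum(weights[name] * signals.get(name, 0.0) for name in weights) / total

# Hypothetical edit: low-reputation author, some filters tripped, no human flag.
signals = {
    "wikitrust": 0.8,    # e.g. 1 - author reputation
    "abusefilter": 0.5,  # e.g. fraction of heuristic filters matched
    "editorial": 0.0,    # 1.0 if flagged by a human reviewer
}
weights = {"wikitrust": 0.4, "abusefilter": 0.4, "editorial": 0.2}
print(combined_suspicion_score(signals, weights))  # -> 0.52

A learned combiner (logistic regression, etc.) over the same inputs would be the obvious next step once labeled data exists.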
A general point - there is a *lot* of information contained in edits that AbuseFilter cannot practically characterize, due to the complexity of language and the subtlety of certain types of abuse. A system with access to natural-language features (and wikitext features) could theoretically detect these cases. My quality research group considered including features based on the [[Thematic relation]]s found in an article (we have access to a thematic role parser), which could potentially be used to detect bad writing - itself indicative of an edit containing vandalism. A toy sketch of such feature extraction is below.
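For concreteness, here is the kind of per-edit surface feature a learner could weigh jointly but a single regex rule cannot. The feature set is illustrative only (and the "diff" is deliberately crude); a real system would add parser-derived features such as thematic-role counts:

import re

def edit_features(old_text, new_text):
    # Crude "diff": keep only text not carried over from the old revision.
    added = new_text.replace(old_text, "")
    words = re.findall(r"[A-Za-z']+", added)
    return {
        "chars_added": len(added),
        "caps_ratio": sum(c.isupper() for c in added) / max(len(added), 1),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        "repeated_char_runs": len(re.findall(r"(.)\1{3,}", added)),
        "unbalanced_wikilinks": abs(added.count("[[") - added.count("]]")),
    }

print(edit_features("The cat sat.", "The cat sat. AAAAWESOME!!!! [[lol"))

None of these features is decisive on its own; the point is that a classifier trained on many of them can pick up patterns no hand-written filter rule enumerates.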
On Thu, Mar 19, 2009 at 3:17 PM, Delirium <delirium@hackish.org> wrote:
> But if your training data is the output of the previous rule set, you
> aren't going to be able to *improve* on its performance without some
> additional information (or built-in inductive bias).
>
> -Mark