Hello,

TL;DR: The vandalism detection model for Wikidata just got much more accurate.
Longer version:
ORES is designed to handle different types of classification. For example, one of the classification types under development is "wikiclass", which determines the type of an edit: whether it adds content, fixes a mistake, etc.
The most mature classification in ORES is edit quality: whether an edit is vandalism or not. We usually have three models. The first is the "reverted" model, whose training data is obtained automatically: we sample around 20K edits (for Wikidata the sample was different) and consider an edit vandalism if it was reverted within a certain time period after being made (7 days for Wikidata).
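To make that auto-labeling rule concrete, here is a minimal sketch in Python; the function name and the edit record layout are our illustration, not the actual ORES training pipeline:

    from datetime import timedelta

    # Window within which a revert counts as a vandalism signal
    # (7 days for Wikidata, per the text above).
    REVERT_WINDOW = timedelta(days=7)

    def label_reverted(edit):
        """Auto-label one sampled edit for the 'reverted' model.

        `edit` is assumed to be a dict with a `timestamp` (datetime)
        and a `reverted_at` field (datetime, or None if never reverted).
        """
        if edit["reverted_at"] is None:
            return False  # never reverted -> treated as a good edit
        return edit["reverted_at"] - edit["timestamp"] <= REVERT_WINDOW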
On the other hand, the "damaging" and "goodfaith" models are more accurate. We sample about 20K edits, prelabel the ones made by trusted users such as admins and bots as not harmful to Wikidata/Wikipedia, and then ask users to label the rest (for Wikidata that was around 4K edits). Since most edits in Wikidata are made by bots and trusted users, we altered this method a bit for Wikidata, but the overall process is the same. Because the labels come from human judgement, these models are more accurate and more useful for damage detection. The ORES extension uses the "damaging" model, not the "reverted" model, so having the "damaging" model online is a requirement for deploying the extension.
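A rough sketch of that prelabeling step; the trusted-group names and edit fields here are illustrative assumptions, not the team's actual code:

    # Groups whose edits are prelabeled as not damaging (illustrative).
    TRUSTED_GROUPS = {"bot", "sysop"}

    def split_for_labeling(sampled_edits):
        """Prelabel trusted edits; everything else goes to human labelers."""
        prelabeled, needs_human_label = [], []
        for edit in sampled_edits:
            if TRUSTED_GROUPS & set(edit["user_groups"]):
                prelabeled.append((edit, {"damaging": False, "goodfaith": True}))
            else:
                needs_human_label.append(edit)
        return prelabeled, needs_human_label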
People label each edit on two axes: whether it is damaging to Wikidata and whether it was made with good intentions. That gives us three cases: 1) an edit that is harmful to Wikidata but made with good intentions, an honest/newbie mistake; 2) an edit that is harmful and made with bad intentions, a vandalism; 3) an edit made with good intentions that is productive, a "good" edit.
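In code, the two labels combine into the three cases like this; this is just a sketch of the taxonomy above, not ORES output:

    def classify(damaging, goodfaith):
        """Map the two boolean labels to the three cases above."""
        if damaging and goodfaith:
            return "honest mistake"  # harmful, but made with good intentions
        if damaging and not goodfaith:
            return "vandalism"       # harmful and made with bad intentions
        # Any edit that is not damaging is treated as a good edit here.
        return "good edit"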
The biggest reason to distinguish honest mistakes from vandalism is that anti-vandalism bots have been shown to reduce new-user retention in wikis [1]. Future anti-vandalism bots should therefore not revert good-faith mistakes, but report them for human review.
One of the good things about the Wikidata damage detection labeling process is that so many people were involved (we had 38 labelers for Wikidata [2]). Another good thing is that its fitness is very high in AI terms [3]. But since the numbers of damaging and non-damaging edits are not the same, the scores it gives to edits are not intuitive. Let me give you an example: in our "damaging" model, an edit scored below 80% is probably not vandalism. In fact, in a very large sample of human edits we had for the "reverted" model, we could not find a single bad edit scored below 93%; i.e. if an edit is scored 92% in the "reverted" model, you can be pretty sure it is not vandalism. Please reach out to us if you have any questions about using these scores, or any questions in general ;)
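For example, here is roughly how you could fetch a "damaging" score and apply a threshold; the URL and response layout follow the public ORES v3 scoring API as we understand it, and the 0.8 cutoff mirrors the rule of thumb above:

    import requests

    ORES = "https://ores.wikimedia.org/v3/scores"

    def damaging_probability(revid, context="wikidatawiki"):
        """Fetch the 'damaging' probability for one revision from ORES.

        Endpoint shape and response structure are assumptions based on
        the ORES v3 API; adjust if the schema differs.
        """
        resp = requests.get(f"{ORES}/{context}/", params={
            "models": "damaging",
            "revids": revid,
        })
        resp.raise_for_status()
        score = resp.json()[context]["scores"][str(revid)]["damaging"]["score"]
        return score["probability"]["true"]

    # Rule of thumb from the text: below 0.8, probably not vandalism.
    if damaging_probability(123456789) < 0.8:
        print("probably not vandalism")
    else:
        print("flag for human review")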
In terms of needed changes: the ScoredRevision gadget is now set to prefer the "damaging" model automatically, and I have just changed my bot in the #wikidata-vandalism channel to use "damaging" instead of "reverted".
If you want to use these models, check out our docs [4].
Sincerely,
Revision scoring team [5]
[1]: Halfaker, A.; Geiger, R. S.; Morgan, J. T.; Riedl, J. (28 December 2012). "The Rise and Decline of an Open Collaboration System: How Wikipedia's Reaction to Popularity Is Causing Its Decline". American Behavioral Scientist 57 (5): 664–688.