TLDR: The vandalism detection model for Wikidata just got much more accurate.
ORES is designed to handle different types of classification. For example,
one classification type under development is "wikiclass", which determines
the type of an edit: whether it adds content, fixes a mistake, and so on.
The most mature classification in ORES is edit quality: whether an edit is
vandalism or not. We usually have three models. The first is the "reverted"
model. Training data for this model is obtained automatically: we sample
around 20K edits (for Wikidata it was different) and we consider an edit
vandalism if it is reverted within a certain time period after the edit
(7 days for …).
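That auto-labeling rule can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not ORES's actual pipeline (which works from revision histories); the function name and arguments are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

def label_reverted(edit_time: datetime,
                   revert_time: Optional[datetime],
                   window: timedelta = timedelta(days=7)) -> bool:
    """Auto-label for the "reverted" model: an edit counts as vandalism
    if it was reverted within the window after it was made."""
    return revert_time is not None and revert_time - edit_time <= window
```

Because no human looks at the edits, labels like this are cheap to produce at scale but noisier than human judgement.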
On the other hand, the "damaging" and "goodfaith" models are more involved:
we sample about 20K edits, prelabel edits made by trusted users such as
admins and bots as not harmful to Wikidata/Wikipedia, and then ask users to
label the rest (for Wikidata it was around 4K edits). Since most edits in
Wikidata are made by bots and trusted users, we altered this method slightly
for Wikidata, but the overall process is the same.
Don't forget that since these models are based on human judgement, they are
more accurate and more useful for damage detection. The ORES extension uses
the "damaging" model, not the "reverted" model, so having the "damaging"
model online is a requirement for deploying the extension.
People label each edit on two questions: whether it is damaging to Wikidata,
and whether it was made with good intentions. So we have three cases:
1) an edit that is harmful to Wikidata but made with good intentions, an
honest newbie mistake; 2) an edit that is harmful and made with bad
intentions, vandalism; 3) an edit that is productive and made with good
intentions, a "good" edit.
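The two labels combine into the three cases above. A minimal sketch of that mapping (the function and its return strings are mine for illustration, not part of ORES):

```python
def classify_edit(damaging: bool, goodfaith: bool) -> str:
    """Map the two human-judged labels onto the three cases above."""
    if not damaging:
        return "good edit"
    # The edit is harmful: intent decides between the remaining cases.
    return "honest mistake" if goodfaith else "vandalism"
```

In practice the models emit probabilities rather than booleans, so a tool would threshold each score before combining them like this.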
The biggest reason to distinguish between honest mistakes and vandalism is
that anti-vandalism bots have been shown to reduce new-user retention in
wikis [1]. So future anti-vandalism bots should not revert good-faith
mistakes but instead report them for human review.
One of the good things about the Wikidata damage detection labeling process
is that so many people were involved (we had 38 labelers for Wikidata).
Another good thing is that its fitness is very high in AI terms. But since
the numbers of damaging and non-damaging edits are not the same, the scores
it gives to edits are not intuitive. Let me give you an example: in our
damaging model, if an edit is scored below 80% it's probably not vandalism.
Actually, in a very large sample of human edits we had for the reverted
model, we couldn't find a bad edit with a score lower than 93%; i.e. if an
edit is scored 92% in the reverted model, you can be pretty sure it's not
vandalism. Please reach out to us if you have any questions on using these
scores. Please reach out to us if you have any questions in general ;)
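For illustration, here is how a tool might pull a damaging probability out of an ORES response and apply the 80% cut-off mentioned above. The response shape and the example revision ID are assumptions from memory of the v3 API, so verify them against the live service before relying on them.

```python
# Scores are served over HTTP, e.g. from an endpoint like
# https://ores.wikimedia.org/v3/scores/wikidatawiki?models=damaging&revids=...
# (endpoint path assumed); the network call is omitted here.

# Hypothetical example response in the nested shape ORES v3 returns.
SAMPLE = {
    "wikidatawiki": {
        "scores": {
            "123456": {
                "damaging": {
                    "score": {
                        "prediction": False,
                        "probability": {"true": 0.12, "false": 0.88},
                    }
                }
            }
        }
    }
}

def damaging_probability(response: dict, wiki: str, rev_id: str) -> float:
    """Pull P(damaging=true) for one revision out of an ORES response."""
    score = response[wiki]["scores"][rev_id]["damaging"]["score"]
    return score["probability"]["true"]

def probably_fine(p_damaging: float, threshold: float = 0.80) -> bool:
    """Below the threshold, the edit is probably not vandalism."""
    return p_damaging < threshold

p = damaging_probability(SAMPLE, "wikidatawiki", "123456")
```

The key point is that the threshold is model-specific: 80% for "damaging" here, but much higher (around 93%) for "reverted", as described above.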
In terms of needed changes, the ScoredRevision gadget is automatically set
to prefer the "damaging" model. I just changed my bot in the
#wikidata-vandalism channel to use "damaging" instead of "reverted".
If you want to use these models, check out our docs.
Revision scoring team 
[1]: Halfaker, A.; Geiger, R. S.; Morgan, J. T.; Riedl, J. (28 December
2012). "The Rise and Decline of an Open Collaboration System: How
Wikipedia's Reaction to Popularity Is Causing Its Decline". *American
Behavioral Scientist* 57 (5): 664–688.