Hey,
This is the 23rd weekly update from the Revision Scoring team that we have
sent to this mailing list.
New development
- We implemented and demonstrated a linguistic/stylometric processing
strategy that should give us more signal for finding vandalism and
spam[1]. See the discussion on the AI list[2].
- As part of our support for the Collaboration Team, we've been
producing tables of model statistics that correspond to set of
thresholds[3]. This helps their designers work on strategies for reporting
prediction confidence in an intuitive way.
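
For anyone curious what a table of statistics per threshold involves, here
is a minimal sketch in Python. It is not the team's actual code: the
function name threshold_stats, the choice of statistics, and the sample
scores and labels are all assumptions made for illustration.

```python
def threshold_stats(scores, labels, thresholds):
    """Build one row of statistics per threshold (illustrative sketch).

    scores: predicted probabilities for the positive class (hypothetical)
    labels: True/False ground-truth labels, same order as scores
    thresholds: cutoffs at or above which an edit counts as flagged
    """
    rows = []
    for t in thresholds:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= t and y)      # flagged, truly positive
        fp = sum(1 for s, y in pairs if s >= t and not y)  # flagged, truly negative
        fn = sum(1 for s, y in pairs if s < t and y)       # missed positives
        rows.append({
            "threshold": t,
            "flagged": tp + fp,
            # None marks a statistic that is undefined at this threshold
            "precision": tp / (tp + fp) if tp + fp else None,
            "recall": tp / (tp + fn) if tp + fn else None,
        })
    return rows


if __name__ == "__main__":
    # Made-up example data, not real model output
    scores = [0.9, 0.8, 0.4, 0.2]
    labels = [True, False, True, False]
    for row in threshold_stats(scores, labels, [0.5, 0.1]):
        print(row)
```

Reading across the rows shows the trade-off a designer has to communicate:
lowering the threshold flags more edits and raises recall, usually at the
cost of precision.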
Maintenance and robustness
- We had a major downtime event that was caused by our logs being too
verbose. We've recovered and turned down the log level[4].
- We made sure that halfak gets pinged when ores.wikimedia.org goes
down[5].
Datasets
- We created a database on Wikimedia Labs that provides access to a
dataset containing a complete set of article quality predictions for
English Wikipedia[6]. See our announcements[7,8,9].
1. https://phabricator.wikimedia.org/T146335 -- Implement a basic scoring
   strategy for PCFGs
2. https://lists.wikimedia.org/pipermail/ai/2016-September/000098.html
3. https://phabricator.wikimedia.org/T146280 -- Produce tables of stats for
   damaging and goodfaith models
4. https://phabricator.wikimedia.org/T146581 -- celery log level is INFO
   causing disruption on ORES service
5. https://phabricator.wikimedia.org/T146720 -- Ensure that halfak gets
   emails when ores.wikimedia.org goes down
6. https://phabricator.wikimedia.org/T106278 -- Setup a db on labsdb for
   article quality that is publicly accessible
7. https://phabricator.wikimedia.org/T146156 -- Announce article quality
   database in labsdb
8. https://lists.wikimedia.org/pipermail/ai/2016-September/000091.html
9. https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_14…
Sincerely,
Aaron from the Revision Scoring team