Hey,

This is the 21st weekly update from the Revision Scoring team that we have sent to this mailing list.

New developments
  • We received a request to get moving on Spanish Wikibooks support, so we dug in:
      ◦ We deployed a new Wiki labels campaign[1]
      ◦ We fixed an issue in Wiki labels that prevented requests from *.wikibooks.org[2]
      ◦ We trained a basic "revert" detection model that seems to be pretty effective[3]
  • We also generated a dataset of article quality scores for English Wikipedia[4].  You can download it here: [5]

This week, we invested in some long-term tasks.  If you review our Phabricator board, you'll see substantial progress on improving our damage detection models with hashing vectorization strategies[6, 7], implementing a more robust model testing strategy[8], and implementing some advanced natural language processing strategies[9, 10].  Stay tuned for the completion of these activities in the coming weeks.
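For readers unfamiliar with hashing vectorization, the HashingVectorizer investigation[6] concerns the "hashing trick": each token (for example, a word added by an edit) is hashed directly to a column of a fixed-width feature vector, so no vocabulary has to be built or stored. The sketch below is a minimal pure-Python illustration under our own assumptions, not the team's implementation. The function names hash_index and hashing_vectorize, the use of MD5, and the sign-bit scheme are hypothetical choices made for this example.

```python
import hashlib
from collections import defaultdict


def hash_index(token: str, n_features: int) -> tuple[int, int]:
    """Map a token to a (column, sign) pair with a stable hash.

    Hypothetical helper: hashlib is used instead of the built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "little")
    column = value % n_features
    # The top bit of the 64-bit value picks a sign, so tokens that collide
    # in the same column tend to cancel out instead of always adding up.
    sign = 1 if (value >> 63) & 1 == 0 else -1
    return column, sign


def hashing_vectorize(tokens: list[str], n_features: int = 2 ** 10) -> dict[int, int]:
    """Return a sparse {column: value} vector with a fixed width of n_features."""
    vector: dict[int, int] = defaultdict(int)
    for token in tokens:
        column, sign = hash_index(token, n_features)
        vector[column] += sign
    # Drop columns whose contributions cancelled to zero.
    return {col: val for col, val in vector.items() if val != 0}


if __name__ == "__main__":
    # Hypothetical example: words added by a single edit.
    added_words = "the the quick brown fox".split()
    print(hashing_vectorize(added_words, n_features=16))
```

The appeal for damage detection is that the feature space stays the same size no matter how many distinct words appear in edits; the trade-off is that unrelated tokens can share a column, which a larger n_features makes less likely.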

1. https://phabricator.wikimedia.org/T143962 -- Add uniqueness constraints to ores_classification
2. https://phabricator.wikimedia.org/T145406 -- Fix CORS for wikibooks
3. https://phabricator.wikimedia.org/T145428 -- Train/test reverted model for Spanish Wikibooks
4. https://phabricator.wikimedia.org/T135684 -- Generate recent article quality scores for English Wikipedia
6. https://phabricator.wikimedia.org/T128087 -- [Spike] Investigate HashingVectorizer
8. https://phabricator.wikimedia.org/T142953 -- Train on all data, Report test statistics on cross-validation
9. https://phabricator.wikimedia.org/T144636 -- Implement PCFG features

Sincerely, 
Aaron from the Revision Scoring team