Hey folks!
This is the 32nd - 41st weekly update from the revision scoring team that we have sent to this mailing list. We've been busy, but our reporting fell behind. So here I am getting us caught up! This is going to be a long one. Bear with me.
One major thing we've done in the past few weeks is draft and present a proposal to increase the resourcing for the ORES project in the 2017 Fiscal Year. Currently, we're just one fully funded staff member (halfak) and one partially funded contractor (Amir1) working with a bunch of volunteers. We're proposing to staff the team with full-time engineers, a liaison and a tech writer. See a full draft of our proposal and pitch deck here: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Scoring_Platform_team
*New development:*
We've expanded support for our "editquality" models to more wikis and improved the performance of some of the models.
- We scaled up the number of observations for Indonesian Wikipedia to 100k[1]
- We added language support for Romanian[2] and built the basic "reverted" model[3]
- We trained and tested "damaging" and "goodfaith" models for Czech Wikipedia[4]
- We implemented some params in our training utilities to control memory usage[5]
- We deployed all of the above to Wikimedia Labs[6]. A production deployment is coming soon.
Prompted by the 2016 community wishlist[7], we've implemented a "draftquality" model for evaluating new page creations.
- We researched deletion reasons on English Wikipedia[8] and created a labeled dataset using the deletion log.
- We engineered a set of features to predict the quality of new articles[9] and built a model[10]
- We generated a set of datasets[11,12,13] to make it easier for volunteers and external researchers to help us audit the performance of the model.
- We deployed the model on WMFLabs[14] and announced its presence to a few interested patrollers on English Wikipedia
- We've started the process of deploying the model in production[15,16]
We completed a project exploring the use of advanced natural-language processing strategies to extract new signal about vandalism, article quality and problematic new articles. Regretfully, memory issues prevent us from trivially putting this into production[17], so we're looking into alternative strategies[18].
- We implemented a strategy for extracting sentences from Wikitext[19]
- We built sentence banks for personal attacks[20], vandalism[21], spam[22], and Featured Articles[23].
- We built PCFG-based models[24] and analyzed their ability to differentiate between these sentence types[25]
We've been working with the Collaboration Team[26] on their Edit Review Improvements project[27]
- We defined and implemented a set of new precision-based test statistics that will inform thresholds used in their new user interface[28]
- We also decided to keep reporting recall-based test statistics[29]
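To make the threshold idea concrete, here is a minimal sketch of how a precision-based or recall-based threshold could be picked from a set of labeled scores. It is only an illustration: the function names and the toy data are made up for this example, and it is not the code behind [28] or [29].

```python
from typing import List, Optional, Tuple

Stats = Tuple[float, Optional[float], Optional[float]]  # (threshold, precision, recall)


def stats_at_threshold(scores: List[float], labels: List[bool],
                       threshold: float) -> Tuple[Optional[float], Optional[float]]:
    """Precision and recall when every score >= threshold is flagged."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


def threshold_at_precision(scores: List[float], labels: List[bool],
                           min_precision: float) -> Optional[Stats]:
    """Lowest threshold reaching min_precision (so recall is as high as possible)."""
    for t in sorted(set(scores)):
        precision, recall = stats_at_threshold(scores, labels, t)
        if precision is not None and precision >= min_precision:
            return t, precision, recall
    return None


def threshold_at_recall(scores: List[float], labels: List[bool],
                        min_recall: float) -> Optional[Stats]:
    """Highest threshold that still reaches min_recall."""
    for t in sorted(set(scores), reverse=True):
        precision, recall = stats_at_threshold(scores, labels, t)
        if recall is not None and recall >= min_recall:
            return t, precision, recall
    return None


# Toy data: model scores and whether each edit was actually damaging.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [False, False, True, True, False, True]

print(threshold_at_precision(scores, labels, 0.9))  # threshold 0.8 on this toy data
print(threshold_at_recall(scores, labels, 1.0))     # threshold 0.35 on this toy data
```

A user interface that wants "very likely damaging" would use the precision-based threshold, while a patroller who wants to miss as little as possible would use the recall-based one.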
Based on advice from engineers on the Collaboration Team, we've begun the process of converting Wiki labels[30] to a stand-alone tool in labs.
- We generalized the gadget interface so that it can handle all languages/wikis[31]
- We implemented a means to auto-configure wikis based on the dbname[32,33] and that allowed us to simplify configuration[34]
- We also implemented some performance improvements with minification and bundling[35]
*Labeling:*
In the past few weeks, we've set up labeling campaigns for a few wikis.
- We deployed an edit types campaign for Catalan Wikipedia[36]
- We deployed an edit quality campaign for Chinese[37] and Romanian[38] Wikipedias
- We deployed a new type of campaign for English Wikipedia -- "discussion quality" asks editors to label talk posts as "toxic" or not[39]
*Maintenance and robustness:*
We've solved a large set of problems with logging issues, compatibility with wikibase, and we've made minor improvements to performance.
- We addressed a few bugs in the ORES Review Tool[40,44]
- We quieted some errors from our logging in ORES[41,45]
- We updated our code to work with a wikibase schema change[42]
- We fixed a language fallback pattern in Wiki labels[43]
- We set up monitoring on ORES database disk sizes[46]
- We fixed some issues with scap, phabricator's diffusion and other supporting systems so that we can continue deploying to beta labs[47]
- We split our assets repo so that we can let our WMFLabs deploy get ahead of the Production deployment[48]
- ORES can now minify its JSON responses[49]
- We identified a bug in flask-assets and worked around it in our local installation of Wiki labels[50]
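For anyone curious what JSON minification (mentioned in the list above) buys us, here is a minimal sketch using only Python's standard library. The `to_json` helper and the score document are hypothetical and only loosely shaped like a scoring response; this is not the actual ORES code from [49].

```python
import json


def to_json(doc, minify=False):
    """Serialize doc, either human-readable or with all optional whitespace removed."""
    if minify:
        # Compact separators drop the spaces after "," and ":".
        return json.dumps(doc, separators=(",", ":"))
    return json.dumps(doc, indent=2)


# Hypothetical score document, for illustration only.
doc = {
    "123456": {
        "damaging": {
            "prediction": False,
            "probability": {"false": 0.9, "true": 0.1},
        }
    }
}

pretty = to_json(doc)
compact = to_json(doc, minify=True)

# Both forms decode to the same data; only the byte count differs.
print(len(pretty), len(compact))
```

The saving per response is small, but it adds up for high-volume consumers that request scores for every edit.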
*Communications and outreach:*
We had a big presence at the Wikimedia Developer summit, we've drafted a resourcing proposal, and we've made some announcements about upcoming plans for the ORES Review tool.
- We facilitated the "Artificial Intelligence to build and navigate content" track[51]
- We ran a session for building an AI wishlist[52] and captured notes about more than 20 new AI proposals on a new tag in phabricator[53]
- We also ran a session discussing the ethics and dangers of advanced algorithms mediating our processes[54]
- We helped facilitate a session about where to surface current AIs in Wikimedia Projects[55]
- We held a discussion with Legal about licensing labeled data that comes out of Wiki labels[56] and updated the interface to state the CC0 license clearly[57]
- We worked with the Reading Infrastructure team to analyze the consumption of "oresscores" through the MediaWiki API[58]
- We drafted a pitch for increasing the resources for our team[59]
- We worked with the Collaboration team to announce that they'll be experimenting with a new RecentChanges filtering strategy in the ORES Review Tool[60,61]
1. https://phabricator.wikimedia.org/T147107 -- Scale up the number of observations for idwiki to 100k
2. https://phabricator.wikimedia.org/T152482 -- Add language support for Romanian
3. https://phabricator.wikimedia.org/T156504 -- Build reverted model for Romanian Wikipedia
4. https://phabricator.wikimedia.org/T156492 -- Train and test damaging/goodfaith models for Czech Wikipedia
5. https://phabricator.wikimedia.org/T156645 -- Add '--workers' param to cv_train utility
6. https://phabricator.wikimedia.org/T154856 -- Clean up dependencies and deploy newest ORES & Models in labs
7. https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Mo...
8. https://meta.wikimedia.org/wiki/Research:Automated_classification_of_draft_q...
9. https://phabricator.wikimedia.org/T148580 -- Build feature set for draft quality model
10. https://phabricator.wikimedia.org/T148038 -- [Epic] Build draft quality model (spam, vandalism, attack, or OK)
11. https://phabricator.wikimedia.org/T148581 -- Extract features for deleted page (draft quality model)
12. https://phabricator.wikimedia.org/T156642 -- Generate scored dataset for 2016-08 - 2017-01
13. https://phabricator.wikimedia.org/T156643 -- Generate extracted features for 2016-08 - 2017-01
14. https://phabricator.wikimedia.org/T155576 -- Deploy draftquality models to WMFLabs
15. https://phabricator.wikimedia.org/T156835 -- Create package stuff for draftquality
16. https://phabricator.wikimedia.org/T157049 -- Create new repo: research-ores-draftquality
17. https://phabricator.wikimedia.org/T148867#2816566 -- Memory footprint is enormous!
18. https://phabricator.wikimedia.org/T155111 -- [Spike] Investigate use of Apertium LTtoolbox API in labs/production
19. https://phabricator.wikimedia.org/T148867 -- Implement sentences datascources
20. https://phabricator.wikimedia.org/T148035 -- Sentence bank for personal attacks
21. https://phabricator.wikimedia.org/T148034 -- Sentence bank for vandalism
22. https://phabricator.wikimedia.org/T148032 -- Sentence bank for spam
23. https://phabricator.wikimedia.org/T148033 -- Sentence bank for Featured Articles
24. https://phabricator.wikimedia.org/T148037 -- Generate PCFG sentence models
25. https://phabricator.wikimedia.org/T151819 -- Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences.
26. https://www.mediawiki.org/wiki/Collaboration
27. https://www.mediawiki.org/wiki/Edit_Review_Improvements
28. https://phabricator.wikimedia.org/T151970 -- Implement new precision-based test stats for editquality models
29. https://phabricator.wikimedia.org/T156644 -- Restore recall-threshold-based metrics for editquality models.
30. https://meta.wikimedia.org/wiki/Wiki_labels
31. https://phabricator.wikimedia.org/T151120 -- Generalize standalone gadget interface
32. https://phabricator.wikimedia.org/T154433 -- Auto config wikilabels using dbnames
33. https://phabricator.wikimedia.org/T155439 -- Use module loader to load JS/CSS from wikis
34. https://phabricator.wikimedia.org/T154693 -- Remove host from wikilabels config -- infer from request
35. https://phabricator.wikimedia.org/T154122 -- Minification and bundling for wikilabels assets
36. https://phabricator.wikimedia.org/T152965 -- Deploy cawiki edit types campaign
37. https://phabricator.wikimedia.org/T152561 -- Deploy zhwiki edit quality campaign
38. https://phabricator.wikimedia.org/T156357 -- Deploy edit quality campaign for Romanian Wikipedia
39. https://phabricator.wikimedia.org/T156303 -- Deploy "Discussion quality" campaign in wikilabels
40. https://phabricator.wikimedia.org/T152542 -- Undefined method ORES\Hooks::getDamagingThreshold()
41. https://phabricator.wikimedia.org/T146681 -- Quiet TimeoutError in celery logging
42. https://phabricator.wikimedia.org/T154168 -- Quantity changes broke ORES
43. https://phabricator.wikimedia.org/T154897 -- Chinese translations are not being loaded
44. https://phabricator.wikimedia.org/T155500 -- Fatal exception of type "DBQueryError" on sorting ORES contributions
45. https://phabricator.wikimedia.org/T157078 -- ores logspam: Model contains an error
46. https://phabricator.wikimedia.org/T155482 -- Set up monitoring for ORES redis database
47. https://phabricator.wikimedia.org/T157135 -- Fix broken beta-labs deploy
48. https://phabricator.wikimedia.org/T154436 -- Split wheels repo into Prod/WMFLabs branches and maintain independence
49. https://phabricator.wikimedia.org/T155931 -- Minify json responses
50. https://phabricator.wikimedia.org/T154865 -- assets url return empty string
51. https://phabricator.wikimedia.org/T147708 -- Artificial Intelligence to build and navigate content
52. https://phabricator.wikimedia.org/T147710 -- What should an AI do you for you? Building an AI Wishlist.
53. https://phabricator.wikimedia.org/tag/artificial-intelligence/
54. https://phabricator.wikimedia.org/T147929 -- Algorithmic dangers and transparency -- Best practices
55. https://phabricator.wikimedia.org/T148690 -- Where to surface AI in Wikimedia Projects
56. https://phabricator.wikimedia.org/T145024 -- Licensing of labeled data
57. https://phabricator.wikimedia.org/T156052 -- Add notice of CC0 status of Wikilabels data to UI & Docs
58. https://phabricator.wikimedia.org/T156273 -- Identify baseline api.php Action API consumption
59. https://phabricator.wikimedia.org/T157470 -- Draft proposal/pitch for ORES resourcing
60. https://phabricator.wikimedia.org/T150855 -- Gather assets for post about ORES review tool including ERI filters
61. https://phabricator.wikimedia.org/T150858 -- Post about ORES review tool including ERI filters
Sincerely, Aaron from the Revision Scoring Scoring Platform team
Hey folks!
I should really stop calling this a weekly update because it's getting a bit silly at this point. :) But if it were a weekly update, it would cover weeks 42 - 46.
*Highlights:*
- 3 new models: Finnish Wikipedia (reverted) and Estonian Wikipedia (damaging & goodfaith)
- We estimated and agreed on funding for ORES servers in the next year with Operations
- We published a paper about vandalism detection in Wikidata and a blog post about the massive effect of some initiatives on coverage of Women Scientists in Wikipedia.
*New development:*
- We added recall-based threshold metrics to the new draftquality model which should help tool devs know which new page creations to highlight for review[1]
- We added optional notices for ORES pages which will help us visually distinguish our experimental install in WMFlabs from the Prod install (ores.wikimedia.org)[2]
- We added basic language support for Finnish (Thanks 4shadoww)[3] and deployed a 'reverted' model[4]
- We led a discussion in Wikidata about "item quality" that resulted in a Wikipedia 1.0-like scale for Wikidata quality[5,6] and designed a Wikilabels form to capture the gist of it[7]
- We enabled the ORES Review Tool on Czech Wikipedia[8]
- We configured ChangeProp to use our new minified JSON output to save bandwidth[9]
- We extended the Estonian language assets (Thanks Cumbril)[10] and deployed the 'damaging' and 'goodfaith' models[11,12]
- We enabled a testing model for 'goodfaith' on the Beta Cluster to make it easier for the Collaboration team to run tests with their new filter interface[13]
- We created a new "precache" endpoint that will allow us to de-duplicate configuration with ChangeProp and handle all routing in ORES locally[14]
*Resourcing:*
- We completed a 2-year estimate of ORES resource needs and discussed funding (capital expenditure) for ORES in the coming fiscal year[15]. This will allow us to continue to grow ORES both in number of models and in scoring capacity.
*Communications:*
- Amir improved the KDD paper based on review feedback[16] and got it published[17]
- We published a blog post about our measurements of WikiProject Women Scientists[18,19] -- "The Keilana Effect"
- Thanks to Cumbril's work, the Estonian labeling campaign was finished[20]
*Deployments:*
- In early February, we deployed a new set of translations to Wikilabels (specifically targeting Romanian Wikipedia)[21]
- In mid-February, we deployed some fixes to ORES documentation and response formatting[22]
- In mid-March, we deployed 3 new scoring models and ORES notices[23]
*Maintenance and robustness:*
- We fixed a serious issue in the "mwoauth" library that Wikilabels depends on[24]
- We reduced the maximum number of revisions that can be requested in one batch via api.php[25]
- We investigated a scap issue that broke ORES deployment[26]
- We fixed a minor issue with JSON minification behavior[27] and hard-coding of the location of ORES in the documentation[28]
- We improved performance of ORES filters on MediaWiki[29]
- We improved the language describing ORES behavior on Special:Contributions[30]
- We added a notice to the wiki pages that Dexbot maintains about its behavior[31]
- We added notices to ores.wmflabs.org about its experimental nature[32]
- We fixed some issues with testing Finnish language assets[33]
- We fixed some styling issues that resulted from an upgrade of OOJS UI[34]
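Since the per-request revision limit changed (see the api.php item in the list above), tool developers who ask for many scores at once may want to split their requests on the client side. Here is a minimal sketch of that; the `batches` helper and the batch size of 50 are assumptions for illustration only, so check the API for the real limit.

```python
from typing import Iterable, Iterator, List


def batches(rev_ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield lists of at most batch_size revision IDs, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[int] = []
    for rev_id in rev_ids:
        batch.append(rev_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Hypothetical limit; the real value is not stated in this update.
ASSUMED_BATCH_SIZE = 50

rev_ids = range(1000, 1120)  # 120 made-up revision IDs
for batch in batches(rev_ids, ASSUMED_BATCH_SIZE):
    # One request per batch would go here.
    print(len(batch), batch[0], batch[-1])
```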
1. https://phabricator.wikimedia.org/T157454 -- Add recall based thresholds to draftquality model
2. https://phabricator.wikimedia.org/T150962 -- Add an optional notice to ORES main and ui pages
3. https://phabricator.wikimedia.org/T158587 -- Add language support for Finnish
4. https://phabricator.wikimedia.org/T160228 -- Train/test reverted model for fiwiki
5. https://phabricator.wikimedia.org/T157489 -- [Discuss] item quality in Wikidata
6. https://www.wikidata.org/wiki/Wikidata:Item_quality
7. https://phabricator.wikimedia.org/T155828 -- Design item_quality form for Wikidata
8. https://phabricator.wikimedia.org/T151611 -- Enable ORES Review Tool on Czech Wikipedia
9. https://phabricator.wikimedia.org/T157693 -- Use minified JSON format in ChangeProp
10. https://phabricator.wikimedia.org/T160193 -- Extend estonian language assets from Wiki page
11. https://phabricator.wikimedia.org/T159608 -- Train/test damaging/goodfaith models for etwiki
12. https://phabricator.wikimedia.org/T130280 -- Deploy edit quality models for etwiki
13. https://phabricator.wikimedia.org/T160467 -- Enable 'goodfaith' on testwiki on Beta Cluster
14. https://phabricator.wikimedia.org/T148714 -- Create generalized "precache" endpoint for ORES
15. https://phabricator.wikimedia.org/T157222 -- Estimate ORES capex for FY2017-18
16. https://phabricator.wikimedia.org/T148443 -- Improve the KDD paper based on the review
17. https://arxiv.org/abs/1703.03861
18. https://phabricator.wikimedia.org/T160078 -- Blog post about wp10 measurements of Women Scientists
19. https://blog.wikimedia.org/2017/03/07/the-keilana-effect/
20. https://phabricator.wikimedia.org/T129702 -- Complete etwiki edit quality campaign
21. https://phabricator.wikimedia.org/T157580 -- Deploy Romanian translations for Wiki labels
22. https://phabricator.wikimedia.org/T157842 -- Prod deployment of ORES
23. https://phabricator.wikimedia.org/T160279 -- Deploy ores in prod (Mid-March)
24. https://phabricator.wikimedia.org/T157858 -- mwoauth is broken
25. https://phabricator.wikimedia.org/T157983 -- Reduce the number of revisions that can be requested in one batch
26. https://phabricator.wikimedia.org/T157623 -- Investigate failed ORES deployment
27. https://phabricator.wikimedia.org/T157721 -- Investigate default JSON minification behavior in production
28. https://phabricator.wikimedia.org/T157723 -- ORES swagger is hard-coded for wmflabs
29. https://phabricator.wikimedia.org/T152585 -- rcshow=oresreview is slow
30. https://phabricator.wikimedia.org/T158862 -- Fix message in Special:Contributions
31. https://phabricator.wikimedia.org/T158899 -- Add notice about Dexbot overwriting manual changes to our tracking table.
32. https://phabricator.wikimedia.org/T159055 -- Add a notice to ores-wmflabs-deploy about "experimental" nature
33. https://phabricator.wikimedia.org/T160192 -- Fix testing issues in finnish language assets
34. https://phabricator.wikimedia.org/T160258 -- Fix minor styling issues with OOJS-UI in wikilabels
Sincerely, Aaron from the Scoring Platform team