Hi artificial intelligence people,
Are there baseline ORES scores available for articles? If so, I'm thinking
that it would be interesting to have the option to add a template to an
article's talk page that shows its ORES score as well as its ORES
percentile rank on a particular wiki.
Pine
It seems that ORES can't tell the difference between these types of edits
and similar edits that truly are damaging, so it flags them for human
review.
Right now, I'm working on implementing a strategy called hashing
vectorization[1] to get some more signal out of an edit. But I think this
strategy will fail to capture what is OK or not OK about this type of
edit. I think we really need to finish the implementation of
probabilistic context-free grammars (PCFGs) that aetilley started work
on. It turns out that a lot of the work I'd done to get vectorization
working will lend itself to PCFGs too, so I have some hope there. In the
meantime, we might have to suffer through reviewing this type of false
positive. Once we're ready to try some new strategies, it will be helpful
to have a rich library of false positives to compare against, so it'll be
great if you can keep adding interesting examples to
https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Misc…
1.
https://en.wikipedia.org/wiki/Feature_hashing#Feature_vectorization_using_t…
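For anyone curious what hashing vectorization looks like in practice, here is a minimal sketch of the hashing trick using only the Python standard library. This is just an illustration of the general technique, not the actual revscoring implementation; the function name, dimensionality, and example tokens are all made up.

```python
import hashlib

def hash_features(tokens, n_dims=1024):
    """Map a bag of tokens into a fixed-size vector via the hashing trick.

    Each token hashes to a bucket index plus a sign (the sign trick
    reduces the bias introduced by collisions), so the vector stays the
    same size no matter how large the vocabulary grows.
    """
    vec = [0.0] * n_dims
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "little") % n_dims
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[idx] += sign
    return vec

# E.g., tokens added by a (hypothetical) edit:
vector = hash_features("teh quick brown fox".split())
```

The appeal for edit scoring is that new, never-before-seen tokens (typos, vandal slang) still land in some bucket, so no vocabulary needs to be maintained between model versions.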
On Fri, Aug 26, 2016 at 4:02 PM, Ryan Kaldari <rkaldari(a)wikimedia.org>
wrote:
> FWIW, all of the ORES false positives that I've seen so far have been
> anonymous users fixing single words, for example, correcting verb tense or
> changing to a more specific word. ORES typically marks these as damaging
> with a high confidence regardless of the substance of the change.
>
> On Wed, Aug 24, 2016 at 6:07 AM, Amir Ladsgroup <ladsgroup(a)gmail.com>
> wrote:
>
>> I also want to add that you can change the ORES sensitivity in your
>> preferences, and add: "We deliberately set the default threshold low to
>> capture all vandalism cases, so false positives are expected; unlike
>> anti-vandalism bots, which set the threshold high to capture only clear
>> vandalism cases (and so have few false positives)."
>>
>> Best
>>
>> On Wed, Aug 24, 2016 at 2:13 AM Aaron Halfaker <aaron.halfaker(a)gmail.com>
>> wrote:
>>
>> > Thanks Luis! :)
>> >
>> > And I just finished setting up a new labeling campaign for English
>> > Wikipedia. This data will help us train/test more accurate models.
>> >
>> > See https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality for
>> > instructions on how to get started.
>> >
>> > -Aaron
>> >
>> > On Tue, Aug 23, 2016 at 4:05 PM, Luis Villa <luis(a)lu.is> wrote:
>> >
>> >> Thanks for the detailed explanation, Aaron. As always your work is a
>> >> model in transparency for the rest of us :)
>> >>
>> >>
>> >> On Tue, Aug 23, 2016 at 12:40 PM Aaron Halfaker <aaron.halfaker(a)gmail.com>
>> >> wrote:
>> >>
>> >>> Hi Luis! Thanks for taking a look.
>> >>>
>> >>> First, I should say that false-positives should be expected. We're
>> >>> working on better signaling in the UI so that you can differentiate
>> the
>> >>> edits that ORES is confident about and those that it isn't confident
>> about
>> >>> -- but are still worth your review.
>> >>>
>> >>> So, in order to avoid a bias feedback loop, we don't want to feed any
>> >>> observations you made *using* ORES back into the model -- since ORES'
>> >>> prediction itself could bias your assessment and we'd re-perpetuate
>> that
>> >>> bias. Still, we can use these misclassification reports to direct our
>> >>> attention to problematic behaviors in the model. We use the Wiki
>> Labels
>> >>> system[1] to gather reviews of random samples of edits from
>> Wikipedians in
>> >>> order to train the model.
>> >>>
>> >>> *Misclassification reports:*
>> >>> See
>> >>> https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Misclassifications/Edit_quality
>> >>>
>> >>> We're still working out the Right(TM) way to report false positives.
>> >>> Right now, we ask that you do so on-wiki and in the future, we'll be
>> >>> exploring a nicer interface so that you can report them while using
>> the
>> >>> tool. We review these misclassification reports manually to focus
>> our work
>> >>> on the models and to report progress made. This data is never
>> directly
>> >>> used in training the machine learning models due to issues around
>> bias.
>> >>>
>> >>> *Wiki labels campaigns:*
>> >>> In order to avoid the biases in who gets reviewed and why, we generate
>> >>> random samples of edits for review using our Wiki Labels[1] system.
>> We've
>> >>> completed a labeling campaign for English Wikipedia[2], but we could
>> run an
>> >>> additional campaign to gather more data. I'll get that set up and
>> respond
>> >>> to this message when it is ready.
>> >>>
>> >>> 1. https://meta.wikimedia.org/wiki/Wiki_labels
>> >>> 2. https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality
>> >>>
>> >>> -Aaron
>> >>>
>> >>> On Tue, Aug 23, 2016 at 1:30 PM, Luis Villa <luis(a)lu.is> wrote:
>> >>>
>> >>>> Very cool! Is there any way for users of this tool to help train it?
>> >>>> For example, the first four things it flagged in my watchlist were all
>> >>>> false positives (the next 5-6 were correctly flagged). It'd be nice to
>> >>>> be able to contribute to training the model somehow when we see these
>> >>>> false positives.
>> >>>>
>> >>>> On Tue, Aug 23, 2016 at 11:10 AM Amir Ladsgroup <ladsgroup(a)gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> We (The Revision Scoring Team
>> >>>>> <https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service#Team>)
>> >>>>> are happy to announce the deployment of the ORES
>> >>>>> <https://meta.wikimedia.org/wiki/ORES> review tool
>> >>>>> <https://www.mediawiki.org/wiki/ORES_review_tool> as a beta feature
>> >>>>> <https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-betafeatures>
>> >>>>> on *English Wikipedia*. Once enabled, ORES highlights edits that
>> are
>> >>>>> likely to be damaging in Special:RecentChanges
>> >>>>> <https://en.wikipedia.org/wiki/Special:RecentChanges>,
>> >>>>> Special:Watchlist <https://en.wikipedia.org/wiki/Special:Watchlist>
>> >>>>> and Special:Contributions
>> >>>>> <https://en.wikipedia.org/wiki/Special:Contributions> to help you
>> >>>>> prioritize your patrolling work. ORES detects damaging edits using a
>> >>>>> basic prediction model based on past damage
>> >>>>> <https://meta.wikimedia.org/wiki/Research:Automated_classification_of_edit_quality>.
>> >>>>> ORES is an experimental technology. We encourage you to take
>> advantage of
>> >>>>> it but also to be skeptical of the predictions made. It's a tool to
>> support
>> >>>>> you – it can't replace you. Please reach out to us with your
>> questions and
>> >>>>> concerns.
>> >>>>> Documentation: mw:ORES review tool
>> >>>>> <https://www.mediawiki.org/wiki/ORES_review_tool>, mw:Extension:ORES
>> >>>>> <https://www.mediawiki.org/wiki/Extension:ORES>, and m:ORES
>> >>>>> <https://meta.wikimedia.org/wiki/ORES>
>> >>>>> Bugs & feature requests:
>> >>>>> https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service-backlog/
>> >>>>> IRC: #wikimedia-ai
>> >>>>> <http://webchat.freenode.net/?channels=#wikimedia-ai>
>> >>>>> Sincerely,
>> >>>>> Amir from the Revision Scoring team
>> >>>>> _______________________________________________
>> >>>>> AI mailing list
>> >>>>> AI(a)lists.wikimedia.org
>> >>>>> https://lists.wikimedia.org/mailman/listinfo/ai
>> >>>>>
>> _______________________________________________
>> Wikitech-l mailing list
>> Wikitech-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
>
Hey,
This is the 19th weekly update from the Revision Scoring team to this
mailing list.
Deployments:
- We deployed a set of new models to ORES that reduce our memory usage
and slightly increase fitness. [1] These models were discussed in an email
to the "ai" mailing list. [2]
- We also completed a major quarterly goal. The ORES review tool is now
deployed as a beta feature on 8 wikis! [3] This came with some quick fixes
for confusion and usability issues. [4] The beta feature is now available
on English, Polish, Portuguese, Russian, Dutch, Persian, and Turkish
Wikipedias, as well as Wikidata.
New development:
- We discussed and came to a rough consensus about how to integrate ORES
into api.php. [5]
- We deployed a new edit quality campaign on English Wikipedia to gather
more data for training ORES. [6, 7]
- We added a specific set of user groups to the ORES models for Turkish
Wikipedia and saw an increase in model fitness. [8]
Maintenance and robustness:
- We fixed bugs in our maintenance scripts for purging old model
versions [9, 10]
- We switched to using our production models on the beta labs cluster, so
now we can catch vandalism there too (and know that the models actually
work) [11]
- We improved the error messages reported from Wiki Labels so that the
actual error appears when the API responds with non-200 HTTP status [12]
1. https://phabricator.wikimedia.org/T144101 -- Deploy ORES at 2016-08-29
2. https://lists.wikimedia.org/pipermail/ai/2016-August/000068.html
3. https://phabricator.wikimedia.org/T140002 -- [Epic] Deploy ORES review
tool
4. https://phabricator.wikimedia.org/T143988 -- $wgOresModels set all
models true
5. https://phabricator.wikimedia.org/T122689 -- [Discuss] api.php
integration with ORES
6. https://phabricator.wikimedia.org/T143745 -- Deploy 2016 edit quality
campaign to English Wikipedia
7. https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality
8. https://phabricator.wikimedia.org/T140474 -- Include specific user
groups in the trwiki edit quality model
9. https://phabricator.wikimedia.org/T144216 -- Purge model score should
clean when there is no row in ores_model too
10. https://phabricator.wikimedia.org/T143798 -- Update model versions is
badly broken in ORES extension
11. https://phabricator.wikimedia.org/T143567 -- Switch beta to use the
proper wiki models for scoring (rather than "testwiki")
12. https://phabricator.wikimedia.org/T138255 -- Wikilabels UI reports
non-200 status errors badly
Sincerely,
Aaron from the Revision Scoring team
Hey folks,
We've been working on generating some updated models for ORES. These
models will behave slightly differently from the models we currently
have deployed. This is a natural artifact of retraining the models on the
*exact same data*, due to randomness in the learning
algorithms. So, for the most part, this should be a non-issue for any
tools that use ORES. However, I wanted to take this opportunity to
highlight some of the facilities ORES provides to help automatically detect
and adjust for these types of changes.
*== Versions ==*
ORES provides information about all of its models, including a model
version number. If you are caching ORES scores locally, we recommend
invalidating old scores whenever this version number changes.
For example, https://ores.wikimedia.org/v2/scores/enwiki/damaging/12345678
currently returns
{
  "scores": {
    "enwiki": {
      "damaging": {
        "scores": {
          "12345678": {
            "prediction": false,
            "probability": {
              "false": 0.7141333465390294,
              "true": 0.28586665346097057
            }
          }
        },
        "version": "0.1.1"
      }
    }
  }
}
This score was generated with the "0.1.1" version of the model. But once
we deploy the new models, the same request will return:
{
  "scores": {
    "enwiki": {
      "damaging": {
        "scores": {
          "12345678": {
            "prediction": false,
            "probability": {
              "false": 0.8204647324045306,
              "true": 0.17953526759546945
            }
          }
        },
        "version": "0.1.2"
      }
    }
  }
}
Note that the version number changes to "0.1.2" and the probabilities
change slightly. In this case, we're essentially re-training the same
model in a similar way, so we increment the "patch" number.
However, we're switching modeling strategies for the article quality models
(enwiki-wp10, frwiki-wp10 & ruwiki-wp10), so those versions increment the
minor version from "0.3.2" to "0.4.0". You may see larger changes in
prediction probabilities with those models, but quick spot-checks suggest
that the differences are modest.
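As a concrete illustration of the invalidation advice above, here is a hedged sketch of a version-aware local cache. The function name and cache layout are invented for this example; only the shape of the v2 response follows the JSON shown above.

```python
# Hypothetical local score cache keyed by (wiki, model, rev_id); each
# entry records the model version that produced the score.
cache = {}

def cache_score(wiki, model, rev_id, response):
    """Store a score from a parsed ORES v2 response, replacing any
    cached entry that was produced under a different model version."""
    payload = response["scores"][wiki][model]
    version = payload["version"]
    score = payload["scores"][str(rev_id)]
    key = (wiki, model, rev_id)
    cached = cache.get(key)
    if cached is None or cached["version"] != version:
        cache[key] = {"version": version, "score": score}
    return cache[key]

# The two responses shown above, abbreviated:
old_resp = {"scores": {"enwiki": {"damaging": {
    "scores": {"12345678": {"prediction": False,
                            "probability": {"false": 0.714, "true": 0.286}}},
    "version": "0.1.1"}}}}
new_resp = {"scores": {"enwiki": {"damaging": {
    "scores": {"12345678": {"prediction": False,
                            "probability": {"false": 0.820, "true": 0.180}}},
    "version": "0.1.2"}}}}

cache_score("enwiki", "damaging", 12345678, old_resp)
entry = cache_score("enwiki", "damaging", 12345678, new_resp)
# entry now holds the "0.1.2" score; the stale "0.1.1" entry is gone.
```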
*== Test statistics and thresholding ==*
Many tools that use our edit quality models (reverted, damaging and
goodfaith) set thresholds for flagging edits for review. In order to
support these tools, we produce test statistics that suggest useful
thresholds.
https://ores.wmflabs.org/v2/scores/enwiki/damaging/?model_info=test_stats
produces:
...
"filter_rate_at_recall(min_recall=0.75)": {
    "filter_rate": 0.869,
    "recall": 0.752,
    "threshold": 0.492
},
"filter_rate_at_recall(min_recall=0.9)": {
    "filter_rate": 0.753,
    "recall": 0.902,
    "threshold": 0.173
},
...
These two statistics show useful thresholds for detecting damaging edits.
E.g., if you want to be sure that you catch nearly all vandalism (and are
OK with a higher false-positive rate), set the threshold at 0.173; if
you'd like to catch most vandalism with almost no false positives, set it
at 0.492. Tools can read these fields automatically so that they don't
need to be manually updated every time we deploy a new model.
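To make that concrete, a tool could pick its threshold from these published statistics rather than hard-coding one. This is a sketch: the function name and the `aggressive` flag are invented, and only the statistic names and threshold values come from the response above.

```python
# Thresholds taken from the test_stats response shown above.
test_stats = {
    "filter_rate_at_recall(min_recall=0.75)": {"filter_rate": 0.869,
                                               "recall": 0.752,
                                               "threshold": 0.492},
    "filter_rate_at_recall(min_recall=0.9)": {"filter_rate": 0.753,
                                              "recall": 0.902,
                                              "threshold": 0.173},
}

def needs_review(damaging_prob, stats, aggressive=False):
    """Flag an edit for human review.

    aggressive=True uses the high-recall threshold (catch nearly all
    vandalism, more false positives); aggressive=False uses the
    high-precision one (most vandalism, few false positives).
    """
    key = ("filter_rate_at_recall(min_recall=0.9)" if aggressive
           else "filter_rate_at_recall(min_recall=0.75)")
    return damaging_prob >= stats[key]["threshold"]

needs_review(0.30, test_stats)                   # False: 0.30 < 0.492
needs_review(0.30, test_stats, aggressive=True)  # True: 0.30 >= 0.173
```

Because the thresholds are read from the model's own metadata, redeploying a retrained model with new test statistics adjusts the tool's behavior without a code change.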
Let me know if you have any questions and happy hacking!
-Aaron
Hey all,
We just deployed a change that switches the default sensitivity of the
ORES review tool from "hard" to "soft" (meaning recall drops from 0.9 to
0.75, but the percentage of false positives drops too). You can still
change it back in your preferences (Recent changes tab).
Please come to us for any issues or questions.
Best
Hey,
This is the 18th weekly update from the Revision Scoring team to this
mailing list.
*Communications:*
- Aaron presented on how user-feedback has been helping us address some
sneaky biases in ORES' models. [1, 2, 3]
*New development:*
- We included 'autoreview' and 'patroller' groups in Turkish wiki models
to get a fitness boost. [4]
- We added some basic uwsgi metrics to grafana[5] and added a response
timing metric from Change Propagation so that we can track any performance
issues. [6]
*Maintenance and robustness:*
- We increased the number of workers per node in production, for a 66%
increase in total capacity for ORES [7]
- We updated all of our edit quality models with the new version of
revscoring [8] and sent an email out to wikitech-l and ai-l about the
implications for tool developers. [9]
- We decided not to make specialized models for ORES in beta labs. [10]
Instead, we'll use the production models so that issues with them will be
caught in beta.
1. https://phabricator.wikimedia.org/T143275 -- Present on user-feedback
stories at Research Showcase
2. https://www.youtube.com/watch?v=rsFmqYxtt9w#t=29m00s -- Video of ORES
user-feedback talk
3.
https://www.mediawiki.org/wiki/File:Deploying_and_maintaining_AI_in_a_socio…
4. https://phabricator.wikimedia.org/T140474 -- Include specific user
groups in the trwiki edit quality model
5. https://phabricator.wikimedia.org/T143081 -- Add uwsgi-related metrics
to grafana
6. https://phabricator.wikimedia.org/T143568 -- Add median, 75% and 95%
response time to ORES dashboard
7. https://phabricator.wikimedia.org/T143105 -- Increase celery workers to
40 per scb node
8. https://phabricator.wikimedia.org/T143125 -- Update editquality models
with new version of revscoring
9. https://lists.wikimedia.org/pipermail/ai/2016-August/000068.html --
"[AI] New models coming to ORES & notes"
10. https://phabricator.wikimedia.org/T141980 -- Should we make a model for
ores in beta?
Sincerely,
Aaron from the Revision Scoring team
Forwarding, since the subjects may be of interest to people on the
Wikitech, AI, and Research lists.
I'm unqualified to evaluate Damon's comments and the FB exec's comments
about AI, so please refrain from shooting the messenger if these aren't
helpful or interesting to those of you who do know enough about AI to make
well-educated assessments.
Regards,
Pine
---------- Forwarded message ----------
From: "Damon Sicore" <damon(a)sicore.com>
Date: Aug 18, 2016 21:35
Subject: [Wikimedia-l] Facebook CTO on strategy, Internet access, Wikipedia
To: "Wikimedia Mailing List" <wikimedia-l(a)lists.wikimedia.org>
Cc:
Hi,
I usually don't recommend these things, but this interview with Schrep [1]
[2] is interesting and insightful. I recommend listening to it instead of
reading. He discusses FB's ten year plan, AI, VR, Internet access for
all, mentions Wikipedia several times, confirms their insatiable hunger for
structured data, and reveals several details on their innovation approach.
Trigger Warning: Corporate Speak
Make no mistake, I've nothing but contempt and spite for Facebook, but
having worked with Mike I also know he demonstrates formidable intellect
and is a decent person. He's incredibly capable in building amazing teams
and predicting (more like sniffing out) the future of tech. I watch his
moves closely to stay sharp.
He's right that papers are constantly coming out that augment current AI
tech in interesting new ways. I believe we're living in interesting times
for computer science and mathematics, computational linguistics and
probabilistic search in particular. A person can't read the CS and math
papers fast enough to keep up with the innovation. A lot of it is trivial,
sure, but some of it is startling in impact: a few smaller things that
previously seemed innocuous, combined, solve key problems.
When looking into tech and strategy for WMF and the engineers it supports,
I'd be very interested in the direction Facebook is going and the
technologies they plan to invest in, so I'm passing it along.
Yours faithfully,
Damon
[1] http://www.metisstrategy.com/interview/mike-schroepfer/
[2] https://en.wikipedia.org/wiki/Mike_Schroepfer
Damon Sicore
512 963 5126
https://damon.sicore.com
6E98 FBFB D192 D325 B85D D4FF FD2A 20ED DC1D 3975
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/
wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>