Hi artificial intelligence people,
Are there baseline ORES scores available for articles? If so, I'm thinking
that it would be interesting to have the option to add a template to an
article's talk page that shows its ORES score as well as its ORES
percentile rank on a particular wiki.
Pine
It seems that ORES can't tell the difference between these types of edits
and similar edits that truly are damaging, so it flags them for human
review.
Right now, I'm working on implementing a strategy called hashing
vectorization[1] to get some more signal out of an edit. But I think this
strategy will fail to capture what is OK or not OK about this type of
edit. I think we really need to finish the implementation of
probabilistic context-free grammars (PCFGs) that aetilley started work
on. It turns out that a lot of the work I'd done to get vectorization
working will lend itself to PCFGs too, so I have some hope there. In the
meantime, we might have to suffer through reviewing this type of false
positive. Once we're ready to try some new strategies, it will be helpful
to have a rich library of false positives to compare against, so it'll be
great if you can keep adding interesting examples to
https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Misc…
1.
https://en.wikipedia.org/wiki/Feature_hashing#Feature_vectorization_using_t…
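For anyone curious what hashing vectorization looks like in practice, here is a minimal sketch of the hashing trick using only the Python standard library. This is just an illustration of the general technique, not the actual revscoring implementation; the function name, dimensionality, and example tokens are all made up.

```python
import hashlib

def hash_features(tokens, n_dims=1024):
    """Map a bag of tokens into a fixed-size vector via the hashing trick.

    Each token hashes to a bucket index plus a sign (the sign trick
    reduces the bias introduced by collisions), so the vector stays the
    same size no matter how large the vocabulary grows.
    """
    vec = [0.0] * n_dims
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "little") % n_dims
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[idx] += sign
    return vec

# E.g., tokens added by a (hypothetical) edit:
vector = hash_features("teh quick brown fox".split())
```

The appeal for edit scoring is that new, never-before-seen tokens (typos, vandal slang) still land in some bucket, so no vocabulary needs to be maintained between model versions.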
On Fri, Aug 26, 2016 at 4:02 PM, Ryan Kaldari <rkaldari(a)wikimedia.org>
wrote:
> FWIW, all of the ORES false positives that I've seen so far have been
> anonymous users fixing single words, for example, correcting verb tense or
> changing to a more specific word. ORES typically marks these as damaging
> with a high confidence regardless of the substance of the change.
>
> On Wed, Aug 24, 2016 at 6:07 AM, Amir Ladsgroup <ladsgroup(a)gmail.com>
> wrote:
>
>> I also want to add that you can change the ORES sensitivity in your
>> preferences, and add: "We deliberately set the default threshold low to
>> capture all vandalism cases, so false positives are expected; unlike
>> anti-vandalism bots, which set the threshold high to capture only clear
>> vandalism cases (and so have few false positives)."
>>
>> Best
>>
>> On Wed, Aug 24, 2016 at 2:13 AM Aaron Halfaker <aaron.halfaker(a)gmail.com>
>> wrote:
>>
>> > Thanks Luis! :)
>> >
>> > And I just finished setting up a new labeling campaign for English
>> > Wikipedia. This data will help us train/test more accurate models.
>> >
>> > See https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality for
>> > instructions on how to get started.
>> >
>> > -Aaron
>> >
>> > On Tue, Aug 23, 2016 at 4:05 PM, Luis Villa <luis(a)lu.is> wrote:
>> >
>> >> Thanks for the detailed explanation, Aaron. As always your work is a
>> >> model in transparency for the rest of us :)
>> >>
>> >>
>> >> On Tue, Aug 23, 2016 at 12:40 PM Aaron Halfaker <aaron.halfaker(a)gmail.com>
>> >> wrote:
>> >>
>> >>> Hi Luis! Thanks for taking a look.
>> >>>
>> >>> First, I should say that false-positives should be expected. We're
>> >>> working on better signaling in the UI so that you can differentiate
>> the
>> >>> edits that ORES is confident about and those that it isn't confident
>> about
>> >>> -- but are still worth your review.
>> >>>
>> >>> So, in order to avoid a bias feedback loop, we don't want to feed any
>> >>> observations you made *using* ORES back into the model -- since ORES'
>> >>> prediction itself could bias your assessment and we'd re-perpetuate
>> that
>> >>> bias. Still, we can use these misclassification reports to direct our
>> >>> attention to problematic behaviors in the model. We use the Wiki
>> Labels
>> >>> system[1] to gather reviews of random samples of edits from
>> Wikipedians in
>> >>> order to train the model.
>> >>>
>> >>> *Misclassification reports:*
>> >>> See
>> >>> https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Misclassifications/Edit_quality
>> >>>
>> >>> We're still working out the Right(TM) way to report false positives.
>> >>> Right now, we ask that you do so on-wiki and in the future, we'll be
>> >>> exploring a nicer interface so that you can report them while using
>> the
>> >>> tool. We review these misclassification reports manually to focus
>> our work
>> >>> on the models and to report progress made. This data is never
>> directly
>> >>> used in training the machine learning models due to issues around
>> bias.
>> >>>
>> >>> *Wiki labels campaigns:*
>> >>> In order to avoid the biases in who gets reviewed and why, we generate
>> >>> random samples of edits for review using our Wiki Labels[1] system.
>> We've
>> >>> completed a labeling campaign for English Wikipedia[2], but we could
>> run an
>> >>> additional campaign to gather more data. I'll get that set up and
>> respond
>> >>> to this message when it is ready.
>> >>>
>> >>> 1. https://meta.wikimedia.org/wiki/Wiki_labels
>> >>> 2. https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality
>> >>>
>> >>> -Aaron
>> >>>
>> >>> On Tue, Aug 23, 2016 at 1:30 PM, Luis Villa <luis(a)lu.is> wrote:
>> >>>
>> >>>> Very cool! Is there any way for users of this tool to help train it?
>> >>>> For example, the first four things it flagged in my watchlist were all
>> >>>> false positives (the next 5-6 were correctly flagged). It'd be nice to
>> >>>> be able to contribute to training the model somehow when we see these
>> >>>> false positives.
>> >>>>
>> >>>> On Tue, Aug 23, 2016 at 11:10 AM Amir Ladsgroup <ladsgroup(a)gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> We (The Revision Scoring Team
>> >>>>> <https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service#Team>)
>> >>>>> are happy to announce the deployment of the ORES
>> >>>>> <https://meta.wikimedia.org/wiki/ORES> review tool
>> >>>>> <https://www.mediawiki.org/wiki/ORES_review_tool> as a beta feature
>> >>>>> <https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-betafeatures>
>> >>>>> on *English Wikipedia*. Once enabled, ORES highlights edits that
>> are
>> >>>>> likely to be damaging in Special:RecentChanges
>> >>>>> <https://en.wikipedia.org/wiki/Special:RecentChanges>,
>> >>>>> Special:Watchlist <https://en.wikipedia.org/wiki/Special:Watchlist>
>> >>>>> and Special:Contributions
>> >>>>> <https://en.wikipedia.org/wiki/Special:Contributions> to help you
>> >>>>> prioritize your patrolling work. ORES detects damaging edits using a
>> >>>>> basic prediction model based on past damage
>> >>>>> <https://meta.wikimedia.org/wiki/Research:Automated_classification_of_edit_quality>.
>> >>>>> ORES is an experimental technology. We encourage you to take
>> advantage of
>> >>>>> it but also to be skeptical of the predictions made. It's a tool to
>> support
>> >>>>> you – it can't replace you. Please reach out to us with your
>> questions and
>> >>>>> concerns.
>> >>>>> Documentation: mw:ORES review tool
>> >>>>> <https://www.mediawiki.org/wiki/ORES_review_tool>, mw:Extension:ORES
>> >>>>> <https://www.mediawiki.org/wiki/Extension:ORES>, and m:ORES
>> >>>>> <https://meta.wikimedia.org/wiki/ORES>
>> >>>>> Bugs & feature requests:
>> >>>>> https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service-backlog/
>> >>>>> IRC: #wikimedia-ai
>> >>>>> <http://webchat.freenode.net/?channels=#wikimedia-ai>
>> >>>>> Sincerely,
>> >>>>> Amir from the Revision Scoring team
>> >>>>> _______________________________________________
>> >>>>> AI mailing list
>> >>>>> AI(a)lists.wikimedia.org
>> >>>>> https://lists.wikimedia.org/mailman/listinfo/ai
>> >>>>>
>> _______________________________________________
>> Wikitech-l mailing list
>> Wikitech-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
>
Hey,
This is the 19th weekly update from the Revision Scoring team to this
mailing list.
Deployments:
- We deployed a set of new models to ORES that reduce our memory usage
and slightly increase fitness. [1] These models were discussed in an email
to the "ai" mailing list. [2]
- We also completed a major quarterly goal. The ORES review tool is now
deployed as a beta feature on 8 wikis! [3] This came with some quick fixes
for confusion and usability issues. [4] The beta feature is now available
on English, Polish, Portuguese, Russian, Dutch, Persian, and Turkish
Wikipedias, as well as Wikidata.
New development:
- We discussed and came to a rough consensus about how to integrate ORES
into api.php. [5]
- We deployed a new edit quality campaign on English Wikipedia to gather
more data for training ORES. [6, 7]
- We added a specific set of user groups to the ORES models for Turkish
Wikipedia and saw an increase in model fitness. [8]
Maintenance and robustness:
- We fixed bugs in our maintenance scripts for purging old model
versions [9, 10]
- We switched to using our production models on the beta labs cluster, so
now we can catch vandalism there too (and know that the models actually
work) [11]
- We improved the error messages reported from Wiki Labels so that the
actual error appears when the API responds with non-200 HTTP status [12]
1. https://phabricator.wikimedia.org/T144101 -- Deploy ORES at 2016-08-29
2. https://lists.wikimedia.org/pipermail/ai/2016-August/000068.html
3. https://phabricator.wikimedia.org/T140002 -- [Epic] Deploy ORES review
tool
4. https://phabricator.wikimedia.org/T143988 -- $wgOresModels set all
models true
5. https://phabricator.wikimedia.org/T122689 -- [Discuss] api.php
integration with ORES
6. https://phabricator.wikimedia.org/T143745 -- Deploy 2016 edit quality
campaign to English Wikipedia
7. https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality
8. https://phabricator.wikimedia.org/T140474 -- Include specific user
groups in the trwiki edit quality model
9. https://phabricator.wikimedia.org/T144216 -- Purge model score should
clean when there is no row in ores_model too
10. https://phabricator.wikimedia.org/T143798 -- Update model versions is
badly broken in ORES extension
11. https://phabricator.wikimedia.org/T143567 -- Switch beta to use the
proper wiki models for scoring (rather than "testwiki")
12. https://phabricator.wikimedia.org/T138255 -- Wikilabels UI reports
non-200 status errors badly
Sincerely,
Aaron from the Revision Scoring team
Hey folks,
We've been working on generating some updated models for ORES. These
models will behave slightly differently from the models we currently
have deployed. This is a natural artifact of retraining the models on the
*exact same data*, due to randomness in the learning
algorithms. So, for the most part, this should be a non-issue for any
tools that use ORES. However, I wanted to take this opportunity to
highlight some of the facilities ORES provides to help automatically detect
and adjust for these types of changes.
*== Versions ==*
ORES provides information about all of its models, including a model
version number. If you are caching ORES scores locally, we recommend
invalidating old scores whenever this version number changes.
For example, https://ores.wikimedia.org/v2/scores/enwiki/damaging/12345678
currently returns
{
  "scores": {
    "enwiki": {
      "damaging": {
        "scores": {
          "12345678": {
            "prediction": false,
            "probability": {
              "false": 0.7141333465390294,
              "true": 0.28586665346097057
            }
          }
        },
        "version": "0.1.1"
      }
    }
  }
}
This score was generated with the "0.1.1" version of the model. But once
we deploy the new models, the same request will return:
{
  "scores": {
    "enwiki": {
      "damaging": {
        "scores": {
          "12345678": {
            "prediction": false,
            "probability": {
              "false": 0.8204647324045306,
              "true": 0.17953526759546945
            }
          }
        },
        "version": "0.1.2"
      }
    }
  }
}
Note that the version number changes to "0.1.2" and the probabilities
change slightly. In this case, we're essentially re-training the same
model in a similar way, so we increment the "patch" number.
However, we're switching modeling strategies for the article quality models
(enwiki-wp10, frwiki-wp10 & ruwiki-wp10), so those versions increment the
minor version from "0.3.2" to "0.4.0". You may see larger changes in
prediction probabilities with those models, but quick spot-checks suggest
that the differences are modest.
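As a concrete illustration of the invalidation advice above, here is a hedged sketch of a version-aware local cache. The function name and cache layout are invented for this example; only the shape of the v2 response follows the JSON shown above.

```python
# Hypothetical local score cache keyed by (wiki, model, rev_id); each
# entry records the model version that produced the score.
cache = {}

def cache_score(wiki, model, rev_id, response):
    """Store a score from a parsed ORES v2 response, replacing any
    cached entry that was produced under a different model version."""
    payload = response["scores"][wiki][model]
    version = payload["version"]
    score = payload["scores"][str(rev_id)]
    key = (wiki, model, rev_id)
    cached = cache.get(key)
    if cached is None or cached["version"] != version:
        cache[key] = {"version": version, "score": score}
    return cache[key]

# The two responses shown above, abbreviated:
old_resp = {"scores": {"enwiki": {"damaging": {
    "scores": {"12345678": {"prediction": False,
                            "probability": {"false": 0.714, "true": 0.286}}},
    "version": "0.1.1"}}}}
new_resp = {"scores": {"enwiki": {"damaging": {
    "scores": {"12345678": {"prediction": False,
                            "probability": {"false": 0.820, "true": 0.180}}},
    "version": "0.1.2"}}}}

cache_score("enwiki", "damaging", 12345678, old_resp)
entry = cache_score("enwiki", "damaging", 12345678, new_resp)
# entry now holds the "0.1.2" score; the stale "0.1.1" entry is gone.
```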
*== Test statistics and thresholding ==*
Many tools that use our edit quality models (reverted, damaging and
goodfaith) set thresholds for flagging edits for review. In order to
support these tools, we produce test statistics that suggest useful
thresholds.
https://ores.wmflabs.org/v2/scores/enwiki/damaging/?model_info=test_stats
produces:
...
"filter_rate_at_recall(min_recall=0.75)": {
    "filter_rate": 0.869,
    "recall": 0.752,
    "threshold": 0.492
},
"filter_rate_at_recall(min_recall=0.9)": {
    "filter_rate": 0.753,
    "recall": 0.902,
    "threshold": 0.173
},
...
These two statistics show useful thresholds for detecting damaging edits.
E.g., if you want to be sure that you catch nearly all vandalism (and are
OK with a higher false-positive rate), set the threshold at 0.173; if
you'd like to catch most vandalism with almost no false positives, set it
at 0.492. Tools can read these fields automatically so that they don't
need to be manually updated every time we deploy a new model.
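To make that concrete, a tool could pick its threshold from these published statistics rather than hard-coding one. This is a sketch: the function name and the `aggressive` flag are invented, and only the statistic names and threshold values come from the response above.

```python
# Thresholds taken from the test_stats response shown above.
test_stats = {
    "filter_rate_at_recall(min_recall=0.75)": {"filter_rate": 0.869,
                                               "recall": 0.752,
                                               "threshold": 0.492},
    "filter_rate_at_recall(min_recall=0.9)": {"filter_rate": 0.753,
                                              "recall": 0.902,
                                              "threshold": 0.173},
}

def needs_review(damaging_prob, stats, aggressive=False):
    """Flag an edit for human review.

    aggressive=True uses the high-recall threshold (catch nearly all
    vandalism, more false positives); aggressive=False uses the
    high-precision one (most vandalism, few false positives).
    """
    key = ("filter_rate_at_recall(min_recall=0.9)" if aggressive
           else "filter_rate_at_recall(min_recall=0.75)")
    return damaging_prob >= stats[key]["threshold"]

needs_review(0.30, test_stats)                   # False: 0.30 < 0.492
needs_review(0.30, test_stats, aggressive=True)  # True: 0.30 >= 0.173
```

Because the thresholds are read from the model's own metadata, redeploying a retrained model with new test statistics adjusts the tool's behavior without a code change.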
Let me know if you have any questions and happy hacking!
-Aaron
Hey all,
We just deployed a change that switches the default sensitivity of the
ORES review tool from "hard" to "soft" (meaning recall drops from 0.9 to
0.75, but the percentage of false positives drops too). You can still
change it back in your preferences (Recent changes tab).
Please come to us for any issues or questions.
Best
Hey,
This is the 18th weekly update from the Revision Scoring team to this
mailing list.
*Communications:*
- Aaron presented on how user-feedback has been helping us address some
sneaky biases in ORES' models. [1, 2, 3]
*New development:*
- We included 'autoreview' and 'patroller' groups in Turkish wiki models
to get a fitness boost. [4]
- We added some basic uwsgi metrics to grafana[5] and added a response
timing metric from Change Propagation so that we can track any performance
issues. [6]
*Maintenance and robustness:*
- We increased the number of workers per node in production, for a 66%
increase in total capacity for ORES [7]
- We updated all of our edit quality models with the new version of
revscoring [8] and sent an email out to wikitech-l and ai-l about the
implications for tool developers. [9]
- We decided not to make specialized models for ORES in beta labs. [10]
Instead, we'll use the production models so that issues with them will be
caught in beta.
1. https://phabricator.wikimedia.org/T143275 -- Present on user-feedback
stories at Research Showcase
2. https://www.youtube.com/watch?v=rsFmqYxtt9w#t=29m00s -- Video of ORES
user-feedback talk
3.
https://www.mediawiki.org/wiki/File:Deploying_and_maintaining_AI_in_a_socio…
4. https://phabricator.wikimedia.org/T140474 -- Include specific user
groups in the trwiki edit quality model
5. https://phabricator.wikimedia.org/T143081 -- Add uwsgi-related metrics
to grafana
6. https://phabricator.wikimedia.org/T143568 -- Add median, 75% and 95%
response time to ORES dashboard
7. https://phabricator.wikimedia.org/T143105 -- Increase celery workers to
40 per scb node
8. https://phabricator.wikimedia.org/T143125 -- Update editquality models
with new version of revscoring
9. https://lists.wikimedia.org/pipermail/ai/2016-August/000068.html --
"[AI] New models coming to ORES & notes"
10. https://phabricator.wikimedia.org/T141980 -- Should we make a model for
ores in beta?
Sincerely,
Aaron from the Revision Scoring team
Forwarding, since the subjects may be of interest to people on the
Wikitech, AI, and Research lists.
I'm unqualified to evaluate Damon's comments and the FB exec's comments
about AI, so please refrain from shooting the messenger if these aren't
helpful or interesting to those of you who do know enough about AI to make
well-educated assessments.
Regards,
Pine
---------- Forwarded message ----------
From: "Damon Sicore" <damon(a)sicore.com>
Date: Aug 18, 2016 21:35
Subject: [Wikimedia-l] Facebook CTO on strategy, Internet access, Wikipedia
To: "Wikimedia Mailing List" <wikimedia-l(a)lists.wikimedia.org>
Cc:
Hi,
I usually don't recommend these things, but this interview with Schrep [1]
[2] is interesting and insightful. I recommend listening to it instead of
reading. He discusses FB's ten year plan, AI, VR, Internet access for
all, mentions Wikipedia several times, confirms their insatiable hunger for
structured data, and reveals several details on their innovation approach.
Trigger Warning: Corporate Speak
Make no mistake, I've nothing but contempt and spite for Facebook, but
having worked with Mike I also know he demonstrates formidable intellect
and is a decent person. He's incredibly capable in building amazing teams
and predicting (more like sniffing out) the future of tech. I watch his
moves closely to stay sharp.
He's right that papers are constantly coming out that augment current AI
tech in interesting new ways. I believe we're living in interesting times
for computer science and mathematics, computational linguistics and
probabilistic search in particular. A person can't read the CS and math
papers fast enough to keep up with the innovation. A lot of it is trivial,
sure, but some of it is startling in impact: a few smaller things that
previously seemed innocuous, combined, solve key problems.
When looking into tech and strategy for WMF and the engineers it supports,
I'd be very interested in the direction Facebook is going and the
technologies they plan to invest in, so I'm passing it along.
Yours faithfully,
Damon
[1] http://www.metisstrategy.com/interview/mike-schroepfer/
[2] https://en.wikipedia.org/wiki/Mike_Schroepfer
Damon Sicore
512 963 5126
https://damon.sicore.com
6E98 FBFB D192 D325 B85D D4FF FD2A 20ED DC1D 3975
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/
wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>