Hello,
Today, the Wikilabels database will get a kernel update, which means the
database will be down for about three minutes. During that time you can
still access the wikilabels server [1] or any of its test servers, such as
labels-staging or labels-experiment, but you will be unable to make
changes in the database, get stats, or do anything else that requires
database access.
Thank you for your patience.
[1]: labels.wmflabs.org
Best
Labs had a DNS blip that caused ORES to be down for a few minutes (between
14:59 and 15:04 UTC) today. Everything seems to be back to normal now.
From #wikimedia-labs <irc://irc.freenode.net/wikimedia-labs> (connect:
<https://webchat.freenode.net/?channels=#wikimedia-labs>):
[10:32:25] <YuviPanda> halfak: temp dns blip
[10:32:36] <halfak> Gotcha. Thanks YuviPanda
[10:32:57] <halfak> was it big enough to warrant a write-up?
[10:33:13] <halfak> If not, I'll just post "temp DNS blib" to my ORES
users and call it good.
[10:33:39] <YuviPanda> halfak: probably not, since we're doing a bunch
of DNS stuff in the next few days to shore up DNS
[10:34:20] <halfak> kk
-Aaron
Hello,
TLDR: Vandalism detection model for Wikidata just got much more accurate.
Longer version:
ORES is designed to handle different types of classification. For example,
one of the classification types under development is "wikiclass", which
determines the type of an edit: whether it adds content, fixes a mistake,
etc.
The most mature classification type in ORES is edit quality: whether an
edit is vandalism or not. We usually have three models. The first is the
"reverted" model, whose training data is obtained automatically: we sample
around 20K edits (for Wikidata it was different) and consider an edit
vandalism if it was reverted within a certain time period after the edit
(7 days for Wikidata).
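The auto-labeling rule described above can be sketched as follows. This is an illustration only; the function and argument names are hypothetical and not part of the actual ORES training pipeline:

```python
from datetime import datetime, timedelta

# Window after which a revert no longer counts, as described for Wikidata.
REVERT_WINDOW = timedelta(days=7)

def label_reverted(edit_time, revert_time, window=REVERT_WINDOW):
    """Return True if the edit was reverted within `window` of being made.

    An edit that was never reverted (revert_time is None) is labeled good.
    """
    if revert_time is None:
        return False
    return revert_time - edit_time <= window

# An edit made Jan 1 and reverted Jan 3 falls inside the 7-day window:
made = datetime(2016, 1, 1)
print(label_reverted(made, datetime(2016, 1, 3)))   # True
print(label_reverted(made, datetime(2016, 1, 20)))  # False
print(label_reverted(made, None))                   # False
```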
On the other hand, the "damaging" and "goodfaith" models are more accurate
because of how their training data is built: we sample about 20K edits,
pre-label edits made by trusted users such as admins and bots as not
harmful to Wikidata/Wikipedia, and then ask users to label the rest (for
Wikidata it was around 4K edits). Since most edits in Wikidata are made by
bots and trusted users, we altered this method a bit for Wikidata, but the
overall process is the same. Because they are based on human judgement,
these models are more accurate and more useful for damage detection. The
ORES extension uses the "damaging" model rather than the "reverted" model,
so having the "damaging" model online is a requirement for the extension
deployment.
People label each edit according to whether it is damaging to Wikidata and
whether it was made with good intentions. So we have three cases: 1) an
edit that is harmful to Wikidata but made with good intentions, i.e. an
honest/newbie mistake; 2) an edit that is harmful and made with bad
intentions, i.e. vandalism; 3) an edit made with good intentions that is
productive, i.e. a "good" edit.
The biggest reason to distinguish between honest mistakes and vandalism is
that anti-vandalism bots have been shown to reduce new-user retention in
wikis [1]. Future anti-vandalism bots should therefore not revert
good-faith mistakes, but report them for human review.
One good thing about the Wikidata damage-detection labeling process is
that so many people were involved (we had 38 labelers for Wikidata [2]).
Another is that the model's fitness is very high in AI terms [3]. But
since the numbers of damaging and non-damaging edits are not the same, the
scores it gives to edits are not intuitive. Let me give you an example: in
our damaging model, if an edit is scored below 80%, it's probably not
vandalism. In fact, in a very large sample of human edits we had for the
reverted model, we couldn't find a bad edit with a score lower than 93%;
i.e., if an edit is scored 92% in the reverted model, you can be pretty
sure it's not vandalism. Please reach out to us if you have any questions
on using these scores, or any questions in general ;)
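As a rough illustration of how a consumer might apply such a cutoff (the 0.80 value is just the rule of thumb mentioned above, not an official threshold, and the function is hypothetical):

```python
# Illustrative cutoff from the rule of thumb above: edits scored below
# 80% by the damaging model are probably not vandalism.
DAMAGING_CUTOFF = 0.80

def triage(score):
    """Map a damaging-model probability to a review decision."""
    if score >= DAMAGING_CUTOFF:
        return "needs human review"
    return "probably fine"

print(triage(0.93))  # needs human review
print(triage(0.42))  # probably fine
```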
In terms of needed changes, the ScoredRevision gadget is automatically set
to prefer the "damaging" model. I just changed my bot in the
#wikidata-vandalism channel to use "damaging" instead of "reverted". If
you want to use these models, check out our docs [4].
Sincerely,
Revision scoring team [5]
[1]: Halfaker, A.; Geiger, R. S.; Morgan, J. T.; Riedl, J. (28 December
2012). "The Rise and Decline of an Open Collaboration System: How
Wikipedia's Reaction to Popularity Is Causing Its Decline". *American
Behavioral Scientist* *57* (5): 664–688.
[2]: https://labels.wmflabs.org/campaigns/wikidatawiki/?campaigns=stats
[3]: https://ores.wmflabs.org/scores/wikidatawiki/?model_info
[4]: https://ores.wmflabs.org/v2/
[5]:
https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service#Team
Hi Moritz,
There are two types of stability you should be aware of: API behavior and
model scores.
You should expect the versioned API behavior to remain stable. If we
choose to make a change to the request or response style, it will appear
under the path "v3/", and so forth. So if you write code against the v2/
API (you shouldn't be writing new code against the v1/ API, but you *can*
expect it to be stable), you should expect that it will continue to work
as expected. You can see the Swagger specs for the APIs at these
endpoints: https://ores.wmflabs.org/v1/spec/ and
https://ores.wmflabs.org/v2/spec/. You should expect that the API behavior
described there will not change.
But we may still need to update the models in the future, and that would
likely change the range of scores slightly. We include the versions of the
models in the basic API response so that you can cache and invalidate
scores that you get from the API. We're still working out the right way to
report evaluation metrics to you so that you'll be able to dynamically
adjust any thresholds you set in your own application. FWIW, I do not
foresee us changing our modeling strategy substantially in the short or
mid term. It took us ~3 months of work to prepare for the breaking change
that was announced in this thread.
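The model version in the response suggests a simple caching pattern: key cached scores on the model version, so a model upgrade naturally invalidates old entries. The response shape below is a simplified assumption for illustration, not the exact ORES payload:

```python
# Cache keyed on (wiki, model, model_version, rev_id): when a model is
# upgraded, its version string changes, so stale entries are never hit.
cache = {}

def cached_score(wiki, model, rev_id, response):
    """Store/retrieve a score, using the model version as part of the key.

    `response` is a simplified stand-in for an ORES API response that
    reports the model version alongside the score.
    """
    version = response["version"]
    key = (wiki, model, version, rev_id)
    if key not in cache:
        cache[key] = response["score"]
    return cache[key]

old = {"version": "0.3.0", "score": 0.91}
new = {"version": "0.4.0", "score": 0.77}
print(cached_score("wikidatawiki", "damaging", 123, old))  # 0.91
# After a model upgrade the version changes, so the old entry is bypassed:
print(cached_score("wikidatawiki", "damaging", 123, new))  # 0.77
```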
In the end, we're interested in learning about your needs and concerns so
that we can adjust our process and make changes accordingly. So if you
have concerns with any of the above please let us know.
-Aaron
On Sat, Apr 30, 2016 at 5:50 PM, Moritz Schubotz <physik(a)physikerwelt.de>
wrote:
> Hi Aaron,
>
> can you say a few words about the stability of the API.
> We are working on a scoring model for user contributions, rather than
> revisions using Apache Flink.
> http://imwa.gehaxelt.in:9090/pdfs/expose.pdf
> However, it would be nice to have a somehow compatible API in the end.
>
> Best
> Moritz
>
> On Thu, Apr 7, 2016 at 10:55 AM, Aaron Halfaker <aaron.halfaker(a)gmail.com>
> wrote:
>
> > FYI, the new models (BREAKING CHANGE) are now deployed.
> >
> > On Sun, Apr 3, 2016 at 5:38 AM, Aaron Halfaker <aaron.halfaker(a)gmail.com>
> > wrote:
> > > Hey folks, we have a couple of announcements for you today. [...]
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
>
> --
> Mit freundlichen Grüßen
> Moritz Schubotz
>
> Telefon (Büro): +49 30 314 22784
> Telefon (Privat):+49 30 488 27330
> E-Mail: schubotz(a)itp.physik.tu-berlin.de
> Web: http://www.physikerwelt.de
> Skype: Schubi87
> ICQ: 200302764
> Msn: Moritz(a)Schubotz.de
Hello, this is our first weekly update posted to this mailing list.
New Developments
- Now you can abandon tasks you don't want to review in Wikilabels
(T105521)
- We collect user-agents in ORES requests (T113754)
- Precaching in ORES will be a daemon and more selective (T106638)
Progress in supporting new languages
- Russian reverted, damaging, and goodfaith models are built. They look
good and will be deployed this week.
   - The Hungarian reverted model is built and will be deployed this week.
   A campaign for goodfaith and damaging labeling is loaded in Wikilabels.
   - The Japanese reverted model is built, but there are still some issues
   to work out. (T133405)
Active Labeling campaigns
- Edit quality (damaging and good faith)
- Wikipedias: Arabic, Azerbaijani, Dutch, German, French, Hebrew,
Hungarian, Indonesian, Italian, Japanese, Norwegian, Persian (v2),
Polish, Spanish, Ukrainian, Urdu, Vietnamese
- Wikidata
- Edit type
- English Wikipedia
Sincerely,
The Revision Scoring team.
<https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service#Team>
Hey folks, we have a couple of announcements for you today. First is that
ORES has a large set of new functionality that you might like to take
advantage of. We'll also want to talk about a *BREAKING CHANGE on April
7th.*
Don't know what ORES is? See
http://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/
*New functionality*
*Scoring UI*
Sometimes you just want to score a few revisions in ORES, and remembering
the URL structure is hard. So we've built a simple scoring user interface
<https://ores.wmflabs.org/ui/> that will allow you to more easily score a
set of edits.
*New API version*
We've been consistently getting requests to include more information in
ORES' responses. In order to make space for this new information, we needed
to change the structure of responses. But we wanted to do this without
breaking the tools that are already using ORES. So, we've developed a
versioning scheme that will allow you to take advantage of new
functionality when you are ready. The same old API will continue to be
available at https://ores.wmflabs.org/scores/, but we've added two
additional paths on top of this.
- https://ores.wmflabs.org/v1/scores/ is a mirror of the old scoring API
which will henceforth be referred to as "v1"
- https://ores.wmflabs.org/v2/scores/ implements a new response format
that is consistent between all sub-paths and adds some new functionality
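A client that pins itself to one API version only needs to vary the path prefix. A minimal sketch of building such request URLs, following the paths quoted above (the endpoint shape past the version prefix is an assumption for illustration):

```python
BASE = "https://ores.wmflabs.org"

def scores_url(version, wiki, model, rev_id):
    """Build a scores URL for the given API version ("v1" or "v2").

    Only the version prefix varies, so moving to a newer API version is a
    one-argument change.
    """
    return f"{BASE}/{version}/scores/{wiki}/{model}/{rev_id}/"

print(scores_url("v2", "enwiki", "wp10", 34567892))
# https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892/
```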
*Swagger documentation*
Curious about the new functionality available in "v2", or maybe what
changed from "v1"? We've implemented a structured description of both
versions of the scoring API using Swagger -- which is becoming a de facto
standard for this sort of thing. Visit https://ores.wmflabs.org/v1/ or
https://ores.wmflabs.org/v2/ to see the Swagger user interface. Visit
https://ores.wmflabs.org/v1/spec/ or https://ores.wmflabs.org/v2/spec/
to get the specification in a machine-readable format.
*Feature values & injection*
Have you wondered what ORES uses to make its predictions? You can now ask
ORES to show you the list of "feature" statistics it uses to score
revisions. For example,
https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892/?features will
return the score with a mapping of feature values used by the "wp10"
article quality model in English Wikipedia to score oldid=34567892
<https://en.wikipedia.org/wiki/Special:Diff/34567892>. You can also
"inject" features into the scoring process to see how that affects the
prediction. E.g.,
https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892?features&feature.wi…
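A sketch of building such feature-injection requests programmatically, following the `?features` and `feature.<name>=<value>` pattern shown above. The feature name in the example is hypothetical; real feature names come from the model itself:

```python
from urllib.parse import urlencode

def features_url(wiki, model, rev_id, overrides=None):
    """Build a v2 scores URL that returns feature values.

    `overrides` optionally maps feature names to injected values, encoded
    as feature.<name>=<value> query parameters.
    """
    url = f"https://ores.wmflabs.org/v2/scores/{wiki}/{model}/{rev_id}?features"
    if overrides:
        url += "&" + urlencode({f"feature.{k}": v for k, v in overrides.items()})
    return url

print(features_url("enwiki", "wp10", 34567892))
# Inject a (hypothetical) feature value to see how it changes the prediction:
print(features_url("enwiki", "wp10", 34567892, {"wikitext.revision.words": 100}))
```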
*Breaking change -- new models*
We've been experimenting with new learning algorithms to make ORES work
better and we've found that we get better results with gradient boosting
<https://en.wikipedia.org/wiki/Gradient_boosting> and random forest
<https://en.wikipedia.org/wiki/Random_forest> strategies than we do with
the current linear svc
<https://en.wikipedia.org/wiki/Support_vector_machine> models. We'd like to
get these new, better models deployed as soon as possible, but with the new
algorithm comes a change in the range of probabilities returned by the
model. So, when we deploy this change, any tools that use hard-coded
thresholds on ORES' prediction probabilities will suddenly start behaving
strangely. Regretfully, we haven't found a way around this problem, so
we're announcing the change now and we plan to deploy this *BREAKING CHANGE
on April 7th*. Please subscribe to the AI mailing list
<https://lists.wikimedia.org/mailman/listinfo/ai> or watch our project page
[[:m:ORES <https://meta.wikimedia.org/wiki/ORES>]] to catch announcements
of future changes and new functionality.
In order to make sure we don't end up in the same situation the next time
we want to change an algorithm, we've included a suite of evaluation
statistics with each model. The filter_rate_at_recall(0.9),
filter_rate_at_recall(0.75), and recall_at_fpr(0.1) thresholds represent
three critical thresholds (should review, needs review, and definitely
damaging -- respectively) that can be used to automatically configure your
wiki tool. You can find out these thresholds for your model of choice by
adding the ?model_info parameter to requests. So, come breaking change, we
strongly recommend basing your thresholds on these statistics in the
future. We'll be working to submit patches to tools that use ORES in the
next week to implement this flexibility. Hopefully, all you'll need to do
is work with us on those.
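A sketch of what consuming those evaluation statistics could look like. The three statistic names come from the text above, but the structure of the model_info response here is a simplified assumption, not the exact payload:

```python
def pick_thresholds(stats):
    """Derive the three review thresholds from a model's evaluation stats.

    `stats` is assumed to map each statistic name to a dict containing a
    "threshold" value (a simplified stand-in for the ?model_info output).
    """
    return {
        "should_review": stats["filter_rate_at_recall(0.9)"]["threshold"],
        "needs_review": stats["filter_rate_at_recall(0.75)"]["threshold"],
        "definitely_damaging": stats["recall_at_fpr(0.1)"]["threshold"],
    }

example_stats = {  # made-up numbers, for illustration only
    "filter_rate_at_recall(0.9)": {"threshold": 0.15},
    "filter_rate_at_recall(0.75)": {"threshold": 0.45},
    "recall_at_fpr(0.1)": {"threshold": 0.92},
}
print(pick_thresholds(example_stats))
```

Reading the thresholds dynamically like this, instead of hard-coding them, is what keeps a tool working across model upgrades.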
-halfak & The Revision Scoring team
<https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service>
Fun story, the logs of #wikimedia-dev suggest that I performed 218 actions
during the grooming of this backlog. :)
On Wed, Mar 30, 2016 at 2:01 PM, Grace Gellerman <ggellerman(a)wikimedia.org>
wrote:
> Your micro-victory looks great- well done!
>
> On Wed, Mar 30, 2016 at 10:46 AM, Aaron Halfaker <ahalfaker(a)wikimedia.org>
> wrote:
>
>> Hey folks,
>>
>> I had a micro victory today, so I thought I'd share. The revision
>> scoring project has a huge backlog. I just spent a long time resolving old
>> tasks and organizing the *live* tasks by the projects they affect and what
>> they need me and my team to do.
>>
>> See
>> https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service-backlog/
>>
>> Why you're looking (assuming you care to look at another project's
>> backlog), feel free to file new tasks in the "Backlog" or the appropriate
>> column.
>>
>> Note that the columns are not sorted in a nice way. Next time I have
>> time for this, I'll be doing prioritization.
>>
>> -Aaron
>>
>> _______________________________________________
>> Research-Internal mailing list
>> Research-Internal(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/research-internal
Hey folks,
I had a micro victory today, so I thought I'd share. The revision scoring
project has a huge backlog. I just spent a long time resolving old tasks
and organizing the *live* tasks by the projects they affect and what they
need me and my team to do.
See
https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service-backlog/
While you're looking (assuming you care to look at another project's
backlog), feel free to file new tasks in the "Backlog" or the appropriate
column.
Note that the columns are not sorted in a nice way. Next time I have time
for this, I'll be doing prioritization.
-Aaron