Hey folks, we have a couple of announcements for you today. First is that ORES has a large set of new functionality that you might like to take advantage of. We also want to talk about a *BREAKING CHANGE on April 7th.*
Don't know what ORES is? See http://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/
*New functionality*
*Scoring UI* Sometimes you just want to score a few revisions in ORES, and remembering the URL structure is hard. So, we've built a simple scoring user-interface https://ores.wmflabs.org/ui/ that will allow you to more easily score a set of edits.
*New API version* We've been consistently getting requests to include more information in ORES' responses. In order to make space for this new information, we needed to change the structure of responses. But we wanted to do this without breaking the tools that are already using ORES. So, we've developed a versioning scheme that will allow you to take advantage of new functionality when you are ready. The same old API will continue to be available at https://ores.wmflabs.org/scores/, but we've added two additional paths on top of this.
- https://ores.wmflabs.org/v1/scores/ is a mirror of the old scoring API, which will henceforth be referred to as "v1"
- https://ores.wmflabs.org/v2/scores/ implements a new response format that is consistent between all sub-paths and adds some new functionality
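For example, here's a minimal sketch of hitting the v2 endpoint with Python's requests library, reusing the wp10/34567892 example from below. The response nesting shown in the comments is our reading of the v2 format; the Swagger docs (next section) are authoritative.

    import requests

    # Score revision 34567892 with the "wp10" article quality model on
    # English Wikipedia via the new v2 path. (The v1 path works the same
    # way, minus the "/v2" and with the older response layout.)
    response = requests.get(
        "https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892")
    response.raise_for_status()
    data = response.json()

    # In v2, results are nested consistently:
    # scores -> wiki -> model -> scores -> rev_id
    score = data["scores"]["enwiki"]["wp10"]["scores"]["34567892"]
    print(score["prediction"], score["probability"])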
*Swagger documentation* Curious about the new functionality available in "v2" or maybe what changed from "v1"? We've implemented a structured description of both versions of the scoring API using Swagger -- which is becoming a de facto standard for this sort of thing. Visit https://ores.wmflabs.org/v1/ or https://ores.wmflabs.org/v2/ to see the Swagger user-interface. Visit https://ores.wmflabs.org/v1/spec/ or https://ores.wmflabs.org/v2/spec/ to get the specification in a machine-readable format.
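If you'd rather consume the spec programmatically, something like this sketch should work (we assume the /spec/ endpoints return the spec as JSON):

    import requests

    # Fetch the machine-readable Swagger specification for each API version
    # and list the paths it exposes. ("paths" is a standard top-level key
    # in a Swagger spec.)
    for version in ("v1", "v2"):
        spec = requests.get(
            "https://ores.wmflabs.org/%s/spec/" % version).json()
        print(version, sorted(spec.get("paths", {})))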
*Feature values & injection* Have you wondered what ORES uses to make its predictions? You can now ask ORES to show you the list of "feature" statistics it uses to score revisions. For example, https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892/?features will return the score with a mapping of the feature values used by the "wp10" article quality model in English Wikipedia to score oldid=34567892 https://en.wikipedia.org/wiki/Special:Diff/34567892. You can also "inject" features into the scoring process to see how that affects the prediction. E.g., https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892?features&feature...
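A rough sketch of both calls in Python follows. The injected feature name below is purely illustrative -- take real names from the ?features response -- and we're assuming the bare ?features flag tolerates an empty value when built this way:

    import requests

    base = "https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892"

    # Ask for the feature values alongside the score.
    with_features = requests.get(base, params={"features": ""}).json()
    print(with_features)

    # "Inject" an overridden feature value and see how the prediction
    # shifts. The feature name here is hypothetical; use one returned by
    # the ?features request above.
    injected = requests.get(
        base,
        params={"features": "", "feature.wikitext.revision.chars": "10000"},
    ).json()
    print(injected)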
*Breaking change -- new models* We've been experimenting with new learning algorithms to make ORES work better, and we've found that we get better results with gradient boosting https://en.wikipedia.org/wiki/Gradient_boosting and random forest https://en.wikipedia.org/wiki/Random_forest strategies than we do with the current linear SVC https://en.wikipedia.org/wiki/Support_vector_machine models. We'd like to get these new, better models deployed as soon as possible, but with the new algorithms comes a change in the range of probabilities returned by the models. So, when we deploy this change, any tool that uses hard-coded thresholds on ORES' prediction probabilities will suddenly start behaving strangely. Regrettably, we haven't found a way around this problem, so we're announcing the change now and we plan to deploy this *BREAKING CHANGE on April 7th*. Please subscribe to the AI mailing list https://lists.wikimedia.org/mailman/listinfo/ai or watch our project page [[:m:ORES https://meta.wikimedia.org/wiki/ORES]] to catch announcements of future changes and new functionality.
In order to make sure we don't end up in the same situation the next time we want to change an algorithm, we've included a suite of evaluation statistics with each model. The filter_rate_at_recall(0.9), filter_rate_at_recall(0.75), and recall_at_fpr(0.1) statistics represent three critical thresholds (should review, needs review, and definitely damaging -- respectively) that can be used to automatically configure your wiki tool. You can look up these thresholds for your model of choice by adding the ?model_info parameter to requests. So, once the breaking change lands, we strongly recommend basing your thresholds on these statistics rather than on hard-coded probabilities. We'll be working to submit patches to tools that use ORES in the next week to implement this flexibility. Hopefully, all you'll need to do is work with us on those.
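As a sketch of what that configuration might look like, here's one way to pull the statistics for the "damaging" model. The key layout under the model entry is our guess at the response shape -- inspect a live ?model_info response before relying on these names:

    import requests

    # Pull the model's evaluation statistics rather than hard-coding
    # probability thresholds, so threshold changes ride along with
    # model changes.
    info = requests.get(
        "https://ores.wmflabs.org/v2/scores/enwiki/damaging/",
        params={"model_info": ""},
    ).json()

    # Assumed nesting ("info" -> "statistics"); verify against the
    # actual response.
    stats = info["scores"]["enwiki"]["damaging"]["info"]["statistics"]
    print("should review:", stats["filter_rate_at_recall(0.9)"])
    print("needs review:", stats["filter_rate_at_recall(0.75)"])
    print("definitely damaging:", stats["recall_at_fpr(0.1)"])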
-halfak & The Revision Scoring team https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service
FYI, the new models (BREAKING CHANGE) are now deployed.