Hey,
This is the 30th and 31st weekly update from the revision scoring team that
we have sent to this mailing list. We accidentally skipped a week again.
*New development:*
- We added a new "lowest" sensitivity level to the ORES review tool. This
new sensitivity level will only flag edits that ORES is very confident are
actually damaging[1].
- We applied the MediaWiki standard color palette to Wikilabels[2]
- We generated a manually censored public dataset of
spam/vandalism/attack pages[3]. This will help others to develop spam,
vandalism and attack page detection models. See the publication of the
dataset[4].
- We've implemented color-based confidence reporting for ORES damage
detection[5]
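As a rough illustration of how a sensitivity level like the one above could work, the sketch below flags an edit when the model's damaging probability reaches a per-level threshold. The threshold values, the level names other than "lowest", and the names THRESHOLDS and is_flagged are all made up for illustration; the real thresholds are derived from model statistics (see [1]).

```python
# Hypothetical thresholds on P(damaging). The real values are derived from
# model statistics and are not given in this update.
THRESHOLDS = {
    "high": 0.30,    # flags many edits, accepts more false positives
    "medium": 0.60,
    "low": 0.80,
    "lowest": 0.95,  # flags only edits the model is very confident about
}


def is_flagged(damaging_probability: float, sensitivity: str) -> bool:
    """Return True if an edit should be flagged at the given sensitivity."""
    return damaging_probability >= THRESHOLDS[sensitivity]


if __name__ == "__main__":
    # An edit scored at 0.85 is flagged at every level except "lowest".
    for level in THRESHOLDS:
        print(level, is_flagged(0.85, level))
```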
*Maintenance and robustness:*
- We updated the version of OOjs-UI that gets bundled with Wiki
Labels[6] and moved the static assets to a new repository[7]
- We fixed an issue in the revscoring library[8] that caused ORES to
return invalid JSON and rendered the UI useless[9].
*Communications:*
- We gave a 3 minute presentation on the state of ORES to Victoria
Coleman, the WMF's new CTO[10].
- We performed a basic analysis of Wikipedia article quality trends
using the dataset we released a few weeks ago[11]. We'll have a more
substantial analysis soon.
- We made a post on the ORES review tool talk page[12,13] detailing how
we plan to incorporate a new filtering strategy into the ORES review tool.
Please join the discussion there.
1. https://phabricator.wikimedia.org/T150224 -- Add "Lowest" ORES
sensitivity for fpr=0.1
2. https://phabricator.wikimedia.org/T151119 -- Apply ui standardization
color palette to Wikilabels
3. https://phabricator.wikimedia.org/T150307 -- Create manually vetted
dataset of spam/vandalism/attack pages
4. https://dx.doi.org/10.6084/m9.figshare.4245035
5. https://phabricator.wikimedia.org/T144922 -- Visually report damaging
confidence
6. https://phabricator.wikimedia.org/T151222 -- Update bundled OOJS-ui with
Wikilabels
7. https://github.com/wiki-ai/flask-oojsui
8. https://phabricator.wikimedia.org/T150961 -- ORES ui is broken (text
field disabled)
9. https://github.com/wiki-ai/ores/issues/177
10. https://phabricator.wikimedia.org/T150544 -- ORES (a 2-3 minute
presentation)
11. https://phabricator.wikimedia.org/T151214 -- Basic analysis of
Wikipedia quality using monthly predictions
12. https://phabricator.wikimedia.org/T150858 -- Post about ORES review
tool including ERI filters
13. https://www.mediawiki.org/wiki/Topic:Tflhjj5x1numzg67
Sincerely,
Aaron from the Revision Scoring team
Hey,
With the merge of 320328 [1] and 320341 [2], two major changes will come to
the ORES review tool:
1- You will see one more option in ORES sensitivity called "Lowest". If
you choose it, ORES only flags edits that are very likely to be
vandalism.
2- Coloring of rows will be completely different. You will see several
colors instead of one, and as the confidence of ORES grows, the colors
become more noticeable. You can, of course, change these colors in your
own CSS. I put a screenshot in [3] and you can test it at
https://en.wikipedia.beta.wmflabs.org or https://mw-revscoring.wmflabs.org
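To make the idea of confidence-dependent coloring concrete, here is a minimal sketch that maps a damaging probability to a highlight class. It is not the extension's code: the band boundaries, the class names, and the names COLOR_BANDS and row_highlight are hypothetical.

```python
from typing import Optional

# Hypothetical bands: (minimum P(damaging), highlight class). Ordered from
# most to least confident so that the first match wins.
COLOR_BANDS = [
    (0.90, "highlight-strong"),
    (0.70, "highlight-medium"),
    (0.50, "highlight-faint"),
]


def row_highlight(damaging_probability: float) -> Optional[str]:
    """Pick a highlight class for a row, or None if it is below every band."""
    for minimum, css_class in COLOR_BANDS:
        if damaging_probability >= minimum:
            return css_class
    return None
```

With class names like these, a user stylesheet could override the color of each band separately.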
Feedback is always welcome.
[1]: https://gerrit.wikimedia.org/r/#/c/320328/
[2]: https://gerrit.wikimedia.org/r/#/c/320341/
[3]: https://phabricator.wikimedia.org/T144922#2824696
Best
--
Amir Sarabadani Tafreshi
Software Engineer (contractor)
-------------------------------------
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit by
the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Hey folks,
I'm your friendly facilitator who forgot that today was the last day to
gather discussion on a set of topics for the Dev Summit. I might be a bit
biased, but I think they are all pretty interesting, so I'm reaching out
with a quick overview to see if I can spur some interest from y'all. Check
'em out:
- https://phabricator.wikimedia.org/T149373 -- Evaluating the user
experience of AI systems
- https://phabricator.wikimedia.org/T147710 -- Building an AI wishlist &
working groups for Wikimedia Projects
- https://phabricator.wikimedia.org/T148690 -- Where to surface AI in
Wikimedia Projects
- https://phabricator.wikimedia.org/T147929 -- Algorithmic dangers and
transparency -- Best practices
- https://phabricator.wikimedia.org/T149666 -- Next steps for machine
translation
If you're interested, please drop a note or a token in the task. BTW, you
don't have to physically attend the dev summit in order to participate.
I'll make sure that IRC and Etherpad are shared with all remote attendees
who want to attend the sessions I'm helping to organize. I've heard that
there will be additional facilities for remote attendees (maybe a YouTube
stream!?) this year, but I can't confirm yet.
-Aaron
Hey,
This is the 29th weekly update from the revision scoring team that we have
sent to this mailing list.
Deployments:
- We deployed logging changes to ORES that will reduce the verbosity[1]
- We also deployed revscoring 1.3.0 and new models built with it to WMF
labs[2]. This won't change anything important from a user perspective, but
it paves the way for developing new modeling strategies.
Maintenance and robustness:
- We fixed puppet so that log file directories are also created on the
celery worker nodes (affects wmflabs)[3]
- We fixed an issue with our recall_at_fpr metric, which was incorrectly
defined, and implemented a recall_at_precision metric to take its place[4]
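For readers unfamiliar with these metrics, the sketch below shows one common way to define them: the best recall achievable at any score threshold that still meets a precision floor or a false-positive-rate ceiling. This is not revscoring's implementation, and the helper name _rates and the function signatures are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def _rates(scores: Sequence[float], labels: Sequence[bool],
           threshold: float) -> Tuple[float, float, float]:
    """Return (precision, recall, fpr) when flagging scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and not y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr


def recall_at_precision(scores: Sequence[float], labels: Sequence[bool],
                        min_precision: float) -> Optional[float]:
    """Highest recall over thresholds whose precision is >= min_precision."""
    best = None
    for threshold in sorted(set(scores)):
        precision, recall, _ = _rates(scores, labels, threshold)
        if precision >= min_precision and (best is None or recall > best):
            best = recall
    return best


def recall_at_fpr(scores: Sequence[float], labels: Sequence[bool],
                  max_fpr: float) -> Optional[float]:
    """Highest recall over thresholds whose false-positive rate is <= max_fpr."""
    best = None
    for threshold in sorted(set(scores)):
        _, recall, fpr = _rates(scores, labels, threshold)
        if fpr <= max_fpr and (best is None or recall > best):
            best = recall
    return best
```

Both functions return None when no threshold satisfies the constraint, which keeps "not achievable" distinct from a recall of zero.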
New development:
- We've made a lot of progress on modeling sentences and have just
started experimenting with a sentence model from featured articles[5]
- We're reviewing a dataset of spam/vandalism/attack new page creations
for public release[6]. This dataset will help our collaborators work with
us on modeling the quality of drafts and supporting new page triage.
1. https://phabricator.wikimedia.org/T149730 -- Deploy logging changes to
ORES
2. https://phabricator.wikimedia.org/T150447 -- Deploy revscoring 1.3.0 and
updated editquality and wikiclass to wmflabs
3. https://phabricator.wikimedia.org/T149925 -- /srv/log/ores/ not created
on worker nodes
4. https://phabricator.wikimedia.org/T149825 -- Implement recall at
precision (and fix FPR metrics)
5. https://phabricator.wikimedia.org/T148867 -- Implement sentences
datascources & experiment with normalization.
6. https://phabricator.wikimedia.org/T150307 -- Create manually vetted
dataset of spam/vandalism/attack pages
Sincerely,
Aaron from the Revision Scoring team
Hey,
This is the 26th and 27th weekly update from the revision scoring team that
we have sent to this mailing list. We forgot to send the update for last week!
Last week, we were featured in Research's quarterly review. In the last 3
months, we achieved our goals to expand the ORES extension to 6 wikis (we
made it to 8!) and to release datasets of article quality predictions. The
minutes from the quarterly review are not yet online, but once they are,
you'll be able to see them at [1].
Maintenance and robustness:
- We discussed and decided on a set of strategies for handling
goodfaith/naive DOS attacks on ORES[2]
- We fixed an i18n issue in Wiki Labels[3]
- We updated the article quality models (wikiclass/wp10) to use
revscoring 1.3.0[4]
- We investigated and solved a memory leak in our pre-caching utility[5]
- We configured celery to send its logs to a place where we can read
them for easier debugging[6]
- We deployed a set of schema changes to constrain the ORES Review Tools
database appropriately[7]
- Also worth noting is that the services cluster (SCB) has been
expanded[8]. ORES has now doubled in capacity
Datasets
- We discussed how to make the historical article quality dataset
available via quarry[9]. Regretfully, it seems that we'll not be able to do
that for at least a couple of months.
New development
- We've implemented embedding of machine-readable scores in a JS
variable on-wiki[10]. This will make it easier for tool developers to
experiment with new ways of displaying Special:RecentChanges.
It's also a necessary precondition for adding color-based signaling of
ORES' confidence about an edit.
1. https://meta.wikimedia.org/wiki/Wikimedia_Foundation_metrics_and_activities…
2. https://phabricator.wikimedia.org/T148347 -- [Discuss] DOS attacks on
ORES. What to do?
3. https://phabricator.wikimedia.org/T139587 -- Revision not found error
unformatted and not localized
4. https://phabricator.wikimedia.org/T147201 -- Update wikiclass for
revscoring 1.3.0
5. https://phabricator.wikimedia.org/T146500 -- Investigate memory leak in
precached
6. https://phabricator.wikimedia.org/T147898 -- Send celery logs to
/srv/log/ores instead of /var/lib/daemon.log
7. https://phabricator.wikimedia.org/T147734 -- Review and deploy 309825
8. https://phabricator.wikimedia.org/T147903 -- Expand SCB cluster
9. https://phabricator.wikimedia.org/T146718 -- [Discuss] Hosting the
monthly article quality dataset on labsDB
10. https://phabricator.wikimedia.org/T143611 -- Embed machine readable
ores scores as data on pages where ORES scores things
Sincerely,
Aaron from the Revision Scoring team
Hello!
The Wikimedia Developer Summit
<https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit> is the annual
meeting to push the evolution of MediaWiki and other technologies
supporting the Wikimedia movement. The next edition will be held in San
Francisco on January 9-11, 2017.
We welcome all Wikimedia technical contributors, third party developers,
and users of MediaWiki and the Wikimedia APIs. We specifically want to
increase the participation of volunteer developers and other contributors
dealing with extensions, apps, tools, bots, gadgets, and templates.
Important deadlines:
- Monday, October 24: This is the last day to request travel
sponsorship. Applying takes less than five minutes.
- Monday, October 31: This is the last day to propose an activity. Bring
the topics you care about!
Subscribe to weekly updates:
https://www.mediawiki.org/wiki/Topic:Td5wfd70vptn8eu4
Please feel free to forward this email to anyone who might be interested in
attending!
Thanks,
Srishti
--
Srishti Sethi
ssethi(a)wikimedia.org
Hey,
It seems there is some sort of back pressure on the ORES service right now,
causing it to send out timeout and overload errors, which made icinga scream
in #wikimedia-ai several times today. If you're running the requests, please
slow down a little.
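For anyone scripting requests, one way to slow down is to pause between calls and back off exponentially when the service reports overload. The sketch below is only an illustration: OverloadedError, backoff_delays, run_politely, and the caller-supplied request function are hypothetical names, not part of any ORES client.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class OverloadedError(Exception):
    """Hypothetical error a request function raises on timeout or overload."""


def backoff_delays(base: float, retries: int, cap: float) -> List[float]:
    """Delays for successive retries: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def run_politely(items: Iterable[T], request: Callable[[T], R],
                 pause: float = 1.0, retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Call request(item) for each item, pausing between calls.

    On OverloadedError, wait with exponential backoff before retrying.
    """
    results = []
    for item in items:
        # First attempt has no delay; later attempts wait progressively longer.
        for delay in [0.0] + backoff_delays(pause, retries, cap=60.0):
            if delay:
                sleep(delay)
            try:
                results.append(request(item))
                break
            except OverloadedError:
                continue
        else:
            raise RuntimeError(f"giving up on {item!r} after {retries} retries")
        sleep(pause)  # fixed pause between consecutive items
    return results
```

The sleep parameter is injectable so the pacing logic can be exercised without actually waiting.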
We increased the capacity for now [1] (thanks to Andrew Bogott). That
brought everything back to the normal state. Sorry for any inconvenience.
[1] https://gerrit.wikimedia.org/r/#/c/316271/
Best