Hey folks!
This is the 32nd through 41st weekly update from the revision scoring team that we
have sent to this mailing list. We've been busy, but our reporting fell
behind. So here I am getting us caught up! This is going to be a long
one. Bear with me.
One major thing we've done in the past few weeks is draft and present a
proposal to increase the resourcing for the ORES project in the 2017 Fiscal
Year. Currently, we're just one fully funded staff member (halfak) and one
partially funded contractor (Amir1) working with a bunch of volunteers.
We're proposing to staff the team with full-time engineers, a liaison, and a
tech writer. See a full draft of our proposal and pitch deck here:
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Scoring_Platform_team
*New development:*
We've expanded support for our "editquality" models to more wikis and
improved the performance of some of the models.
- We scaled up the number of observations for Indonesian Wikipedia to
100k[1]
- We added language support for Romanian[2] and built the basic
"reverted" model[3]
- We trained and tested "damaging" and "goodfaith" models for Czech
Wikipedia[4]
- We implemented some params in our training utilities to control memory
usage[5]
- We deployed all of the above to Wikimedia Labs[6]. A production
deployment is coming soon.
Prompted by the 2016 community wishlist[7], we've implemented a
"draftquality" model for evaluating new page creations.
- We researched deletion reasons on English Wikipedia[8] and created a
labeled dataset using the deletion log.
- We engineered a set of features to predict the quality of new
articles[9] and built a model[10]
- We generated a set of datasets[11,12,13] to make it easier for
volunteers and external researchers to help us audit the performance of the
model.
- We deployed the model on WMFLabs[14] and announced its presence to a
few interested patrollers in English Wikipedia
- We've started the process of deploying the model in production[15,16]
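A minimal sketch of how deletion-log entries might be turned into labels, assuming (this is not confirmed above) that labels are derived from English Wikipedia speedy-deletion criteria in the log comment: G3 for vandalism, G10 for attack pages, and G11 for advertising. The function name and the mapping are illustrative, and pages that were never deleted would be labeled "OK" by the caller.

```python
import re

# Assumed mapping from speedy-deletion criteria to draft quality labels.
# Illustrative only; the team's actual labeling rules may differ.
CRITERIA_LABELS = [
    (re.compile(r"\bG10\b", re.IGNORECASE), "attack"),
    (re.compile(r"\bG11\b", re.IGNORECASE), "spam"),
    (re.compile(r"\bG3\b", re.IGNORECASE), "vandalism"),
]


def label_from_deletion_comment(comment):
    """Return a label for a deletion-log comment, or None if no criterion matches."""
    for pattern, label in CRITERIA_LABELS:
        if pattern.search(comment or ""):
            return label
    return None


if __name__ == "__main__":
    print(label_from_deletion_comment("[[WP:CSD#G11|G11]]: Unambiguous advertising"))
    print(label_from_deletion_comment("A7: No indication of importance"))
```

Comments that match none of the criteria return None, so they can be left out of the dataset instead of being forced into a class.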
We completed a project exploring the use of advanced natural-language
processing strategies to extract new signal about vandalism, article
quality and problematic new articles. Regretfully, memory issues prevent
us from trivially putting this into production[17], so we're looking into
alternative strategies[18].
- We implemented a strategy for extracting sentences from Wikitext[19]
- We built sentence banks for personal attacks[20], vandalism[21],
spam[22], and Featured Articles[23].
- We built PCFG-based models[24] and analyzed their ability to
differentiate[25]
We've been working with the Collaboration Team[26] on their Edit Review
Improvements project[27]
- We defined and implemented a set of new precision-based test
statistics that will inform thresholds used in their new user interface[28]
- But we also decided to continue to report recall-based test statistics
as well[29]
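To illustrate what a precision-based threshold statistic can look like, here is a small sketch. It assumes a list of model scores and true/false labels, and it picks the lowest score threshold whose precision meets a target, reporting recall at that threshold too. The function names are hypothetical and this is not the editquality implementation.

```python
def stats_at_threshold(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    flagged = [label for score, label in zip(scores, labels) if score >= threshold]
    true_positives = sum(flagged)
    total_positives = sum(labels)
    precision = true_positives / len(flagged) if flagged else None
    recall = true_positives / total_positives if total_positives else None
    return precision, recall


def lowest_threshold_at_precision(scores, labels, min_precision):
    """Lowest observed score whose precision meets the target, or None."""
    for threshold in sorted(set(scores)):
        precision, recall = stats_at_threshold(scores, labels, threshold)
        if precision is not None and precision >= min_precision:
            return threshold, precision, recall
    return None


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.6, 0.8, 0.9]
    labels = [0, 0, 1, 0, 1]
    print(lowest_threshold_at_precision(scores, labels, 0.9))
```

Reporting recall next to the chosen threshold shows how much is missed in exchange for the precision target, which is one reason to keep both kinds of statistics.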
Based on advice from engineers on the Collaboration Team, we've begun the
process of converting Wiki labels[30] to a stand-alone tool in labs.
- We generalized the gadget interface so that it can handle all
languages/wikis[31]
- We implemented a means to auto-configure wikis based on the
dbname[32,33] and that allowed us to simplify configuration[34]
- We also implemented some performance improvements through minification
and bundling[35]
*Labeling:*
In the past few weeks, we've set up labeling campaigns for a few wikis.
- We deployed an edit types campaign for Catalan Wikipedia[36]
- We deployed an edit quality campaign for Chinese[37] and Romanian[38]
Wikipedias
- We deployed a new type of campaign for English Wikipedia --
"discussion quality" asks editors to label talk posts as "toxic" or not[39]
*Maintenance and robustness:*
We've solved a large set of problems with logging and Wikibase
compatibility, and we've made minor improvements to performance.
- We addressed a few bugs in the ORES Review Tool[40,44]
- We quieted some errors from our logging in ORES[41,45]
- We updated our code to work with a Wikibase schema change[42]
- We fixed a language fallback pattern in Wiki labels[43]
- We set up monitoring on ORES database disk sizes[46]
- We fixed some issues with scap, Phabricator's Diffusion and other
supporting systems so that we can continue deploying to beta labs[47]
- We split our assets repo so that we can let our WMFLabs deploy get
ahead of the Production deployment[48]
- ORES can now minify its JSON responses[49]
- We identified a bug in flask-assets and worked around it in our local
installation of Wiki labels[50]
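As an illustration of what minifying a JSON response means in Python, the sketch below serializes a document without the spaces that json.dumps inserts after separators by default. The helper name and the sample document are made up for the example, and this is not necessarily how ORES implements it.

```python
import json


def minify_json(document):
    """Serialize a document without optional whitespace between tokens."""
    return json.dumps(document, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical response shape, for illustration only.
    sample = {"scores": {"damaging": {"prediction": False, "probability": 0.03}}}
    print(minify_json(sample))
    print(len(json.dumps(sample)), "bytes by default vs", len(minify_json(sample)))
```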
*Communications and outreach:*
We had a big presence at the Wikimedia Developer Summit, we've drafted a
resourcing proposal, and we've made some announcements about upcoming plans
for the ORES Review tool.
- We facilitated the "Artificial Intelligence to build and navigate
content" track[51]
- We ran a session for building an AI wishlist[52] and captured notes
about more than 20 new AI proposals on a new tag in phabricator[53]
- We also ran a session discussing the ethics and dangers of advanced
algorithms mediating our processes[54]
- We helped facilitate a session about where to surface current AIs in
Wikimedia Projects[55]
- We held a discussion with Legal about licensing labeled data that
comes out of Wiki labels[56] and updated the interface to state the CC0
license clearly[57]
- We worked with the Reading Infrastructure team to analyze the
consumption of "oresscores" through the MediaWiki API[58]
- We drafted a pitch for increasing the resources for our team[59]
- We worked with the Collaboration team to announce that they'll be
experimenting with a new RecentChanges filtering strategy in the ORES
Review Tool[60,61]
1. https://phabricator.wikimedia.org/T147107 -- Scale up the number of
observations for idwiki to 100k
2. https://phabricator.wikimedia.org/T152482 -- Add language support for
Romanian
3. https://phabricator.wikimedia.org/T156504 -- Build reverted model for
Romanian Wikipedia
4. https://phabricator.wikimedia.org/T156492 -- Train and test
damaging/goodfaith models for Czech Wikipedia
5. https://phabricator.wikimedia.org/T156645 -- Add '--workers' param to
cv_train utility
6. https://phabricator.wikimedia.org/T154856 -- Clean up dependencies and
deploy newest ORES & Models in labs
7.
https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/M…
8.
https://meta.wikimedia.org/wiki/Research:Automated_classification_of_draft_…
9. https://phabricator.wikimedia.org/T148580 -- Build feature set for draft
quality model
10. https://phabricator.wikimedia.org/T148038 -- [Epic] Build draft quality
model (spam, vandalism, attack, or OK)
11. https://phabricator.wikimedia.org/T148581 -- Extract features for
deleted page (draft quality model)
12. https://phabricator.wikimedia.org/T156642 -- Generate scored dataset
for 2016-08 - 2017-01
13. https://phabricator.wikimedia.org/T156643 -- Generate extracted
features for 2016-08 - 2017-01
14. https://phabricator.wikimedia.org/T155576 -- Deploy draftquality models
to WMFLabs
15. https://phabricator.wikimedia.org/T156835 -- Create package stuff for
draftquality
16. https://phabricator.wikimedia.org/T157049 -- Create new repo:
research-ores-draftquality
17. https://phabricator.wikimedia.org/T148867#2816566 -- Memory footprint
is enormous!
18. https://phabricator.wikimedia.org/T155111 -- [Spike] Investigate use of
Apertium LTtoolbox API in labs/production
19. https://phabricator.wikimedia.org/T148867 -- Implement sentences
datascources
20. https://phabricator.wikimedia.org/T148035 -- Sentence bank for personal
attacks
21. https://phabricator.wikimedia.org/T148034 -- Sentence bank for vandalism
22. https://phabricator.wikimedia.org/T148032 -- Sentence bank for spam
23. https://phabricator.wikimedia.org/T148033 -- Sentence bank for Featured
Articles
24. https://phabricator.wikimedia.org/T148037 -- Generate PCFG sentence
models
25. https://phabricator.wikimedia.org/T151819 -- Analyze differentiation of
FA, Spam, Vandalism, and Attack models/sentences.
26. https://www.mediawiki.org/wiki/Collaboration
27. https://www.mediawiki.org/wiki/Edit_Review_Improvements
28. https://phabricator.wikimedia.org/T151970 -- Implement new
precision-based test stats for editquality models
29. https://phabricator.wikimedia.org/T156644 -- Restore
recall-threshold-based metrics for editquality models.
30. https://meta.wikimedia.org/wiki/Wiki_labels
31. https://phabricator.wikimedia.org/T151120 -- Generalize standalone
gadget interface
32. https://phabricator.wikimedia.org/T154433 -- Auto config wikilabels
using dbnames
33. https://phabricator.wikimedia.org/T155439 -- Use module loader to load
JS/CSS from wikis
34. https://phabricator.wikimedia.org/T154693 -- Remove host from
wikilabels config -- infer from request
35. https://phabricator.wikimedia.org/T154122 -- Minification and bundling
for wikilabels assets
36. https://phabricator.wikimedia.org/T152965 -- Deploy cawiki edit types
campaign
37. https://phabricator.wikimedia.org/T152561 -- Deploy zhwiki edit quality
campaign
38. https://phabricator.wikimedia.org/T156357 -- Deploy edit quality
campaign for Romanian Wikipedia
39. https://phabricator.wikimedia.org/T156303 -- Deploy "Discussion
quality" campaign in wikilabels
40. https://phabricator.wikimedia.org/T152542 -- Undefined method
ORES\Hooks::getDamagingThreshold()
41. https://phabricator.wikimedia.org/T146681 -- Quiet TimeoutError in
celery logging
42. https://phabricator.wikimedia.org/T154168 -- Quantity changes broke ORES
43. https://phabricator.wikimedia.org/T154897 -- Chinese translations are
not being loaded
44. https://phabricator.wikimedia.org/T155500 -- Fatal exception of type
"DBQueryError" on sorting ORES contributions
45. https://phabricator.wikimedia.org/T157078 -- ores logspam: Model
contains an error
46. https://phabricator.wikimedia.org/T155482 -- Set up monitoring for ORES
redis database
47. https://phabricator.wikimedia.org/T157135 -- Fix broken beta-labs deploy
48. https://phabricator.wikimedia.org/T154436 -- Split wheels repo into
Prod/WMFLabs branches and maintain independence
49. https://phabricator.wikimedia.org/T155931 -- Minify json responses
50. https://phabricator.wikimedia.org/T154865 -- assets url return empty
string
51. https://phabricator.wikimedia.org/T147708 -- Artificial Intelligence to
build and navigate content
52. https://phabricator.wikimedia.org/T147710 -- What should an AI do you
for you? Building an AI Wishlist.
53. https://phabricator.wikimedia.org/tag/artificial-intelligence/
54. https://phabricator.wikimedia.org/T147929 -- Algorithmic dangers and
transparency -- Best practices
55. https://phabricator.wikimedia.org/T148690 -- Where to surface AI in
Wikimedia Projects
56. https://phabricator.wikimedia.org/T145024 -- Licensing of labeled data
57. https://phabricator.wikimedia.org/T156052 -- Add notice of CC0 status
of Wikilabels data to UI & Docs
58. https://phabricator.wikimedia.org/T156273 -- Identify baseline api.php
Action API consumption
59. https://phabricator.wikimedia.org/T157470 -- Draft proposal/pitch for
ORES resourcing
60. https://phabricator.wikimedia.org/T150855 -- Gather assets for post
about ORES review tool including ERI filters
61. https://phabricator.wikimedia.org/T150858 -- Post about ORES review
tool including ERI filters
Sincerely,
Aaron from the Revision Scoring Scoring Platform team
Dear Sir/Ma,
I am Shadrach Promise Owhorji, from Ikwerre Local Government Area of
Rivers State in the Federal Republic of Nigeria.
I am an adult male citizen, 29 years old.
I am a human rights activist, educationist, and political scientist
by profession.
Having seen the good work of the AI, and having glanced through its past
and present records, I have decided to be part of the community that
affects people's lives positively. I really wish to be given the
opportunity and audience to push and defend the cause of Amnesty
International in my region.
Having stated the above, my request is for your approval to start up the
AI campaign here, and perhaps to engage in training and retraining
Amnesty supporters around the world.
I also call on your office to provide me the required documents in order to
have issues with representation here in Nigeria.
I will be glad if my request will be granted.
I anticipate your cooperation.
Sincerely, Comrade, Shadrach Promise Owhorji......
I just found this talk by the Wikimedia Legal interns from July of 2016.
See https://www.youtube.com/watch?v=2Y_4HCtxvtw
> The Wikimedia legal interns will be hosting a panel titled "Artificial
> Intelligence and the Law." The discussion (lasting about an hour) will
> focus on the intersection of emerging technologies, like driverless cars,
> web crawlers, and lethal autonomous weapons, and the legal issues they
> raise, including in the areas of employment and liability. The panel
> speakers will explore the legal challenges presented by emerging
> technologies and artificial intelligence and how the legal industry has
> responded to those challenges.
-Aaron
Good evening sir,
Please I don't understand this information,
Could you please explain to me more about this message.
Thank you,
Sincerely, Shadrach....
On Feb 10, 2017 1:00 PM, <ai-request(a)lists.wikimedia.org> wrote:
> Today's Topics:
>
> 1. personal AI (Toby Negrin)
> 2. Re: personal AI (Aaron Halfaker)
> 3. Re: personal AI (Toby Negrin)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 9 Feb 2017 12:25:19 -0800
> From: Toby Negrin <tnegrin(a)wikimedia.org>
> To: Application of Artificial Intelligence and other advanced
> computing strategies to Wikimedia Projects <ai(a)lists.wikimedia.org
> >
> Subject: [AI] personal AI
> Message-ID:
> <CAAjh0ExupEFz32Ham_fmv=sPW9EHDfQGSFc+-d91FSGD1p9p7g@mail.
> gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hey ORES team -- we need 1,000,000,000 new models :)
>
> https://qz.com/906751/tech-companies-are-building-tiny-personal-ais-to-keep-your-messages-private/
>
> -Toby
>
Just saw this fly by. Thought someone on this list might be interested.
---------- Forwarded message ----------
From: Léa Lacroix <lea.lacroix(a)wikimedia.de>
Date: Tue, Feb 7, 2017 at 2:45 AM
Subject: [Wikidata] WMDE looking for a data analyst
To: "Discussion list for the Wikidata project." <
wikidata(a)lists.wikimedia.org>
Hello all,
Our development team is looking for a freelance data analyst to work
remotely, mostly on Wikidata.
The person will:
- Work closely with product managers and UX researchers to maintain and
improve detailed on-going analysis of the department’s products, their
usage patterns and performance.
- Write database queries and supporting code to analyze usage volume,
user behaviour and performance data to identify opportunities and areas for
improvement.
- Collaborate with other analysts in the department to maintain our
department’s dashboards, ensuring they are up-to-date, accurate, fair and
focussed on representations of our product efficiency.
- Support product managers through rapidly surfacing positive and
adverse data trends, and complete ad hoc analysis support as needed.
- Communicate your findings clearly and responsively to a range of
departmental, organisational, volunteer and public stakeholders in order to
inform and educate them.
If you want to know more and apply:
https://software.wikimedia.de/jobs/data-analyst
See also our other job offers: https://software.wikimedia.de/jobs
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
Hey everyone,
This means Wiki labels will be down during that time.
Best
---------- Forwarded message ---------
From: Yuvi Panda <yuvipanda(a)gmail.com>
Date: Mon, Feb 6, 2017 at 9:28 AM
Subject: [Labs-l] [Labs-announce] Tools DB / Labs Postgres DB maintenance
on 15 Feb 2017
To: <labs-announce(a)lists.wikimedia.org>
Hello!
Tools DB and Labs Postgres DB will be undergoing maintenance on 15 Feb
2017 for about 6 hours starting at 1700 UTC and will be unreachable
for some of the duration. Most users shouldn't experience issues if
their code reconnects properly when the server stops accepting
connections (we'll failover to slaves when doing maintenance). Some
tables will not be available for a short period of time, but the tool
owners of those tables have already been notified (see
https://phabricator.wikimedia.org/T127164 ). We'll try to minimize
downtime as much as possible.
We will be upgrading the operating system from Ubuntu Precise to
Debian Jessie in preparation for EOL of Ubuntu Precise (in April
2017). We'll also take this opportunity to upgrade Tools DB to MariaDB
10.
All data should be preserved in this migration. Follow
https://phabricator.wikimedia.org/T123731 for more information!
Thanks!
--
Yuvi Panda T
http://yuvi.in/blog
_______________________________________________
Labs-announce mailing list
Labs-announce(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/labs-announce
_______________________________________________
Labs-l mailing list
Labs-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/labs-l
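On the maintenance notice above: it says most tools should be fine if their code reconnects properly when the server stops accepting connections. A rough sketch of that pattern follows, assuming a generic connect() callable and retrying with exponential backoff. The helper and its parameters are hypothetical; a real database driver raises its own error classes, which would be passed in through retry_on.

```python
import time


def call_with_reconnect(connect, operation, retries=5, base_delay=1.0,
                        retry_on=(ConnectionError, OSError), sleep=time.sleep):
    """Run operation(connection), opening a fresh connection on each attempt."""
    for attempt in range(retries + 1):
        try:
            connection = connect()
            return operation(connection)
        except retry_on:
            if attempt == retries:
                raise  # Out of retries: surface the last error to the caller.
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    calls = []

    def flaky_connect():
        # Simulates a server that refuses the first two connection attempts.
        calls.append(1)
        if len(calls) < 3:
            raise ConnectionError("server not accepting connections")
        return "connection"

    print(call_with_reconnect(flaky_connect, lambda conn: conn + " ok",
                              base_delay=0.01))
```

Opening a new connection on every attempt, instead of reusing a cached one, is what lets the tool pick up the failover server once it is available.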