Hey folks!
This is the combined 32nd-41st weekly update from the revision scoring team
to this mailing list. We've been busy, but our reporting fell
behind. So here I am getting us caught up! This is going to be a long
one. Bear with me.
One major thing we've done in the past few weeks is draft and present a
proposal to increase the resourcing for the ORES project in the 2017 Fiscal
Year. Currently, we're just one fully funded staff member (halfak) and one
partially funded contractor (Amir1) working with a bunch of volunteers.
We're proposing to staff the team with fulltime engineers, a liaison and a
tech writer. See a full draft of our proposal and pitch deck here:
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Scoring_Platform_team
*New development:*
We've expanded support for our "editquality" models to more wikis and
improved the performance of some of the models.
- We scaled up the number of observations for Indonesian Wikipedia to
100k[1]
- We added language support for Romanian[2] and built the basic
"reverted" model[3]
- We trained and tested "damaging" and "goodfaith" models for Czech
Wikipedia[4]
- We implemented some params in our training utilities to control memory
usage[5]
- We deployed all of the above to Wikimedia Labs[6]. A production
deployment is coming soon.
Prompted by the 2016 community wishlist[7], we've implemented a
"draftquality" model for evaluating new page creations.
- We researched deletion reasons on English Wikipedia[8] and created a
labeled dataset using the deletion log.
- We engineered a set of features to predict the quality of new
articles[9] and built a model[10]
- We generated a set of datasets[11,12,13] to make it easier for
volunteers and external researchers to help us audit the performance of the
model.
- We deployed the model on WMFLabs[14] and announced its presence to a
few interested patrollers in English Wikipedia
- We've started the process of deploying the model in production[15,16]
We completed a project exploring the use of advanced natural-language
processing strategies to extract new signal about vandalism, article
quality and problematic new articles. Regretfully, memory issues prevent
us from trivially putting this into production[17], so we're looking into
alternative strategies[18].
- We implemented a strategy for extracting sentences from Wikitext[19]
- We built sentence banks for personal attacks[20], vandalism[21],
spam[22], and Featured Articles[23].
- We built PCFG-based models[24] and analyzed their ability to
differentiate[25]
We've been working with the Collaboration Team[26] on their Edit Review
Improvements project[27]
- We defined and implemented a set of new precision-based test
statistics that will inform thresholds used in their new user interface[28]
- But we also decided to continue to report recall-based test statistics
as well[29]
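
To make the precision-based idea concrete, here is a minimal sketch of picking a score threshold that meets a target precision and reporting the recall at that point. It is only an illustration under stated assumptions: the function name, the toy scores and the labels are hypothetical, and this is not the code used in the editquality models.

```python
def threshold_at_precision(scores, labels, min_precision):
    """Return (threshold, precision, recall) for the lowest threshold whose
    precision is at least min_precision, or None if no threshold qualifies.

    scores: predicted probabilities of the positive class (hypothetical data)
    labels: 1 for a true positive-class item, 0 otherwise
    """
    total_positive = sum(labels)
    best = None
    # Candidate thresholds are the distinct scores, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        # Labels of every item that would be flagged at this threshold.
        flagged = [label for score, label in zip(scores, labels)
                   if score >= threshold]
        true_positive = sum(flagged)
        precision = true_positive / len(flagged)
        recall = true_positive / total_positive if total_positive else 0.0
        if precision >= min_precision:
            # Keep overwriting so we end up with the lowest qualifying value.
            best = (threshold, precision, recall)
    return best


# Toy data, made up for illustration only.
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
result = threshold_at_precision(scores, labels, 0.75)
print(result)
```

A recall-based statistic would do the reverse: fix a minimum recall and report the precision at the highest threshold that still reaches it.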
Based on advice from engineers on the Collaboration Team, we've begun the
process of converting Wiki labels[30] to a stand-alone tool in labs.
- We generalized the gadget interface so that it can handle all
languages/wikis[31]
- We implemented a means to auto-configure wikis based on the
dbname[32,33] and that allowed us to simplify configuration[34]
- We also implemented some performance improvements with minification
and bundling[35]
*Labeling:*
In the past few weeks, we've set up labeling campaigns for a few wikis.
- We deployed an edit types campaign for Catalan Wikipedia[36]
- We deployed an edit quality campaign for Chinese[37] and Romanian[38]
Wikipedias
- We deployed a new type of campaign for English Wikipedia --
"discussion quality" asks editors to label talk posts as "toxic" or not[39]
*Maintenance and robustness:*
We've solved a large set of problems with logging issues, compatibility
with wikibase, and we've made minor improvements to performance.
- We addressed a few bugs in the ORES Review Tool[40,44]
- We quieted some errors from our logging in ORES[41,45]
- We updated our code to work with a wikibase schema change[42]
- We fixed a language fallback pattern in Wiki labels[43]
- We set up monitoring on ORES database disk sizes[46]
- We fixed some issues with scap, phabricator's diffusion and other
supporting systems so that we can continue deploying to beta labs[47]
- We split our assets repo so that we can let our WMFLabs deploy get
ahead of the Production deployment[48]
- ORES can now minify its JSON responses[49]
- We identified a bug in flask-assets and worked around it in our local
installation of Wiki labels[50]
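
As a rough illustration of what minifying a JSON response means, the sketch below serializes the same document once with indentation and once without any optional whitespace, using Python's standard `json` module. The response shape is a made-up stand-in, not the actual ORES output format, and this is not the ORES implementation.

```python
import json

# Hypothetical response document, for illustration only.
response = {
    "scores": {
        "123456": {
            "damaging": {
                "prediction": False,
                "probability": {"false": 0.9, "true": 0.1},
            }
        }
    }
}

# Human-readable form: indentation and a space after each separator.
pretty = json.dumps(response, indent=2, sort_keys=True)

# Minified form: no indentation and no whitespace around separators.
minified = json.dumps(response, separators=(",", ":"), sort_keys=True)

print(len(pretty), len(minified))
```

Both strings decode to the same data; only the number of bytes sent over the wire differs.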
*Communications and outreach:*
We had a big presence at the Wikimedia Developer summit, we've drafted a
resourcing proposal, and we've made some announcements about upcoming plans
for the ORES Review tool.
- We facilitated the "Artificial Intelligence to build and navigate
content" track[51]
- We ran a session for building an AI wishlist[52] and captured notes
about more than 20 new AI proposals on a new tag in phabricator[53]
- We also ran a session discussing the ethics and dangers of advanced
algorithms mediating our processes[54]
- We helped facilitate a session about where to surface current AIs in
Wikimedia Projects[55]
- We held a discussion with Legal about licensing labeled data that
comes out of Wiki labels[56] and updated the interface to state the CC0
license clearly[57]
- We worked with the Reading Infrastructure team to analyze the
consumption of "oresscores" through the MediaWiki API[58]
- We drafted a pitch for increasing the resources for our team[59]
- We worked with the Collaboration team to announce that they'll be
experimenting with a new RecentChanges filtering strategy in the ORES
Review Tool[60,61]
1. https://phabricator.wikimedia.org/T147107 -- Scale up the number of
observations for idwiki to 100k
2. https://phabricator.wikimedia.org/T152482 -- Add language support for
Romanian
3. https://phabricator.wikimedia.org/T156504 -- Build reverted model for
Romanian Wikipedia
4. https://phabricator.wikimedia.org/T156492 -- Train and test
damaging/goodfaith models for Czech Wikipedia
5. https://phabricator.wikimedia.org/T156645 -- Add '--workers' param to
cv_train utility
6. https://phabricator.wikimedia.org/T154856 -- Clean up dependencies and
deploy newest ORES & Models in labs
7.
https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/M…
8.
https://meta.wikimedia.org/wiki/Research:Automated_classification_of_draft_…
9. https://phabricator.wikimedia.org/T148580 -- Build feature set for draft
quality model
10. https://phabricator.wikimedia.org/T148038 -- [Epic] Build draft quality
model (spam, vandalism, attack, or OK)
11. https://phabricator.wikimedia.org/T148581 -- Extract features for
deleted page (draft quality model)
12. https://phabricator.wikimedia.org/T156642 -- Generate scored dataset
for 2016-08 - 2017-01
13. https://phabricator.wikimedia.org/T156643 -- Generate extracted
features for 2016-08 - 2017-01
14. https://phabricator.wikimedia.org/T155576 -- Deploy draftquality models
to WMFLabs
15. https://phabricator.wikimedia.org/T156835 -- Create package stuff for
draftquality
16. https://phabricator.wikimedia.org/T157049 -- Create new repo:
research-ores-draftquality
17. https://phabricator.wikimedia.org/T148867#2816566 -- Memory footprint
is enormous!
18. https://phabricator.wikimedia.org/T155111 -- [Spike] Investigate use of
Apertium LTtoolbox API in labs/production
19. https://phabricator.wikimedia.org/T148867 -- Implement sentences
datascources
20. https://phabricator.wikimedia.org/T148035 -- Sentence bank for personal
attacks
21. https://phabricator.wikimedia.org/T148034 -- Sentence bank for vandalism
22. https://phabricator.wikimedia.org/T148032 -- Sentence bank for spam
23. https://phabricator.wikimedia.org/T148033 -- Sentence bank for Featured
Articles
24. https://phabricator.wikimedia.org/T148037 -- Generate PCFG sentence
models
25. https://phabricator.wikimedia.org/T151819 -- Analyze differentiation of
FA, Spam, Vandalism, and Attack models/sentences.
26. https://www.mediawiki.org/wiki/Collaboration
27. https://www.mediawiki.org/wiki/Edit_Review_Improvements
28. https://phabricator.wikimedia.org/T151970 -- Implement new
precision-based test stats for editquality models
29. https://phabricator.wikimedia.org/T156644 -- Restore
recall-threshold-based metrics for editquality models.
30. https://meta.wikimedia.org/wiki/Wiki_labels
31. https://phabricator.wikimedia.org/T151120 -- Generalize standalone
gadget interface
32. https://phabricator.wikimedia.org/T154433 -- Auto config wikilabels
using dbnames
33. https://phabricator.wikimedia.org/T155439 -- Use module loader to load
JS/CSS from wikis
34. https://phabricator.wikimedia.org/T154693 -- Remove host from
wikilabels config -- infer from request
35. https://phabricator.wikimedia.org/T154122 -- Minification and bundling
for wikilabels assets
36. https://phabricator.wikimedia.org/T152965 -- Deploy cawiki edit types
campaign
37. https://phabricator.wikimedia.org/T152561 -- Deploy zhwiki edit quality
campaign
38. https://phabricator.wikimedia.org/T156357 -- Deploy edit quality
campaign for Romanian Wikipedia
39. https://phabricator.wikimedia.org/T156303 -- Deploy "Discussion
quality" campaign in wikilabels
40. https://phabricator.wikimedia.org/T152542 -- Undefined method
ORES\Hooks::getDamagingThreshold()
41. https://phabricator.wikimedia.org/T146681 -- Quiet TimeoutError in
celery logging
42. https://phabricator.wikimedia.org/T154168 -- Quantity changes broke ORES
43. https://phabricator.wikimedia.org/T154897 -- Chinese translations are
not being loaded
44. https://phabricator.wikimedia.org/T155500 -- Fatal exception of type
"DBQueryError" on sorting ORES contributions
45. https://phabricator.wikimedia.org/T157078 -- ores logspam: Model
contains an error
46. https://phabricator.wikimedia.org/T155482 -- Set up monitoring for ORES
redis database
47. https://phabricator.wikimedia.org/T157135 -- Fix broken beta-labs deploy
48. https://phabricator.wikimedia.org/T154436 -- Split wheels repo into
Prod/WMFLabs branches and maintain independence
49. https://phabricator.wikimedia.org/T155931 -- Minify json responses
50. https://phabricator.wikimedia.org/T154865 -- assets url return empty
string
51. https://phabricator.wikimedia.org/T147708 -- Artificial Intelligence to
build and navigate content
52. https://phabricator.wikimedia.org/T147710 -- What should an AI do you
for you? Building an AI Wishlist.
53. https://phabricator.wikimedia.org/tag/artificial-intelligence/
54. https://phabricator.wikimedia.org/T147929 -- Algorithmic dangers and
transparency -- Best practices
55. https://phabricator.wikimedia.org/T148690 -- Where to surface AI in
Wikimedia Projects
56. https://phabricator.wikimedia.org/T145024 -- Licensing of labeled data
57. https://phabricator.wikimedia.org/T156052 -- Add notice of CC0 status
of Wikilabels data to UI & Docs
58. https://phabricator.wikimedia.org/T156273 -- Identify baseline api.php
Action API consumption
59. https://phabricator.wikimedia.org/T157470 -- Draft proposal/pitch for
ORES resourcing
60. https://phabricator.wikimedia.org/T150855 -- Gather assets for post
about ORES review tool including ERI filters
61. https://phabricator.wikimedia.org/T150858 -- Post about ORES review
tool including ERI filters
Sincerely,
Aaron from the Revision Scoring Scoring Platform team
I've got an early draft of some notes
<https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_ta…>
for a restructuring of the revision table, to support the following:
* making the revision table itself smaller by breaking large things out
* reducing duplicate string storage for content model/format, username/IP
address, and edit comments
* multi-content revisions ("MCR") - multiple Content blobs of different
types on a page, revisioned consistently
There are also some ideas going around about using denormalized summary
tables more aggressively, perhaps changing where the indexes used for
specific uses live. For instance, a 'contribs' table with just the bits
needed for the index lookups for user-contribs, then joined to the other
tables.
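
To illustrate the 'contribs' idea, here is a small sketch using an in-memory SQLite database: a narrow summary table carries only the columns needed for the user-contributions index lookup and is then joined to the wider revision table. All table and column names are hypothetical and heavily simplified; this is not the proposed MediaWiki schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-in for the wide revision table.
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER NOT NULL,
        rev_comment TEXT
    );
    -- Narrow summary table holding only what the lookup needs.
    CREATE TABLE contribs (
        contrib_user INTEGER NOT NULL,
        contrib_timestamp TEXT NOT NULL,
        contrib_rev INTEGER NOT NULL REFERENCES revision (rev_id)
    );
    CREATE INDEX contribs_user_timestamp
        ON contribs (contrib_user, contrib_timestamp);
""")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [(1, 10, "create page"), (2, 10, "fix typo"), (3, 11, "add reference")],
)
conn.executemany(
    "INSERT INTO contribs VALUES (?, ?, ?)",
    [(7, "20170201000000", 1), (8, "20170202000000", 2),
     (7, "20170203000000", 3)],
)


def user_contribs(conn, user_id):
    """List a user's revisions, newest first, via the summary table."""
    return conn.execute(
        """
        SELECT r.rev_id, r.rev_page, r.rev_comment
        FROM contribs AS c
        JOIN revision AS r ON r.rev_id = c.contrib_rev
        WHERE c.contrib_user = ?
        ORDER BY c.contrib_timestamp DESC
        """,
        (user_id,),
    ).fetchall()


print(user_contribs(conn, 7))
```

The trade-off is the usual one for denormalized tables: the lookup index lives on a small table, at the cost of keeping that table in sync on every write.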
Initial notes at
https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_ta…
-- I'll be cleaning this up a bit more in response to feedback and concerns.
If we go through with this sort of change, we'll need to carefully consider
the upgrade transition. We'll also need to make sure that all relevant
queries are updated, and that folks using the databases indirectly (via
tool labs, etc) are all able to cleanly handle the new fun stuff. Feedback
will be crucial here. :)
Potentially we might split this into a couple transitions instead, or
otherwise make major changes to the plan. Nothing's set in stone yet!
-- brion
Wikimedia got accepted among the 201 organizations in the Google Summer of
Code (GSOC) 2017 <https://summerofcode.withgoogle.com/>!
We are trying to make it easier for prospective students to choose a
project idea and get started, so we are considering showcasing a
bunch of project ideas on the MediaWiki GSOC page itself:
https://www.mediawiki.org/wiki/Google_Summer_of_Code_2017.
Help us by mentoring a project from here:
- Check out the tasks in the '*Missing Mentors*' and '*Almost Ready to
be Mentored*' columns on the Possible-Tech-Projects
<https://phabricator.wikimedia.org/tag/possible-tech-projects/>
workboard.
- Check out the "*Wishlist 11-30 (needs owner)*" and "*Wishlist 31-50
(needs owner)*" columns on the Community-Wishlist-Survey-2016
<https://phabricator.wikimedia.org/project/board/2420/> workboard.
- Any portion of your own project that needs some support could be a 2-3
month long project for a beginner and overall a good learning experience.
If you are interested in mentoring a project, add the "
*Outreach-Programs-Projects*" and "*Google-Summer-of-Code (2017)*" tags to
the corresponding task on Phabricator. We will follow up with you from
there.
If you are looking for design- and documentation-related projects to
mentor, we are also participating in the Outreachy
<https://www.mediawiki.org/wiki/Outreachy/Round_14> program in
parallel to GSOC. Add the "*Outreach-Programs-Projects*" and "*Outreachy
(Round-14)*" tags to a task you are interested in mentoring.
Email me if you have any questions; happy to help!
Cheers,
Srishti
--
Srishti Sethi
Developer Advocate
Technical Collaboration team
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:SSethi_(WMF)
Hi!
I'd like to welcome you to join us at the CREDIT showcase next week,
Wednesday, 1-March-2017 at 1900 UTC / 1100 Pacific Time. We'd like to see
your demos, whether they're rough works in progress or polished production
material, or even just a telling of something you've been studying
recently. For more information on the upcoming event, as well as recordings
of previous events, please visit the following page:
https://www.mediawiki.org/wiki/CREDIT_showcase
And if you'd like to share the news about the upcoming CREDIT showcase,
here's some suggested verbiage. Thanks! -Adam
*Hi <FNAME>*
*I hope all is well with you! I wanted to let you know about CREDIT, a
monthly demo series that we’re running to showcase open source tech
projects from Wikimedia’s Community, Reading, Editing, Discovery,
Infrastructure and Technology teams. *
*CREDIT is open to the public, and we welcome questions and discussion. The
next CREDIT will be held on March 1st at 11am PT / 2pm ET / 19:00 UTC. *
*There’s more info on MediaWiki
<https://www.mediawiki.org/wiki/CREDIT_showcase>, and on Etherpad
<https://etherpad.wikimedia.org/p/CREDIT>, which is where we take notes and
ask questions. You can also ask questions on IRC in the Freenode chatroom
#wikimedia-office (web-based access here
<https://webchat.freenode.net/?channels=%23wikimedia-office>). Links to
video will become available at these locations shortly before the event.*
*Please feel free to pass this information along to any interested folks.
Our projects tend to focus on areas that might be of interest to folks
working across the open source tech community: language detection,
numerical sort, large data visualizations, maps, and all sorts of other
things.*
*If you have any questions, please let me know! Thanks, and I hope to see
you at CREDIT.*
*YOURNAME*
Howdy,
A few updates this week from across the Discovery department.
== Highlights ==
* Annual Plan "Collab Jam" took place in the SF offices this week, where
lots of conversations were had on how teams within the Foundation can work
together in the next fiscal year to do cool things.
* ICU Folding is now effective on all English, French, Hebrew and Greek
wikis. [1]
** Note: please consider asking for this feature if you would like to
enable it on a particular language.
== Discussions ==
=== Search ===
* The new contentmodel search keyword is now operational on commons. [0]
* ICU Folding is now effective on all English, French, Hebrew and Greek
wikis. [1]
** Note: please consider asking for this feature if you would like to
enable it on a particular language.
=== Portal ===
* We had an issue with bad caching of an error message, which resulted in
text not being displayed on the wikipedia.org page for a very short time.
This will be fixed with a patch. [2]
* Article statistics were updated for wikipedia.org, wikiquote.org and
wikiversity.org [3]
=== Wikidata Query Service ===
* Upgraded to Blazegraph 2.1.5 RC, several bugs fixed.
* POST is now enabled for WDQS queries.
* Started nomination process for federation endpoints. [4]
[0] https://phabricator.wikimedia.org/T156371
[1] https://phabricator.wikimedia.org/T155515
[2] https://phabricator.wikimedia.org/T158782
[3] https://phabricator.wikimedia.org/T128546
[4] https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input
----
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or "Volunteer
needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation
Hi everybody!
I'm working on some OOJS-UI components for an extension and I've stumbled across something:
The "OO.ui.ComboBoxInputWidget" allows setting an array of "OO.ui.MenuOptionWidget" objects in its "menu.items" config field.
Such an item can have a "label" and a "data" field. The "data" field can be of type "object" [1].
Now, if I use a "data" field of type "object", the value of the "OO.ui.ComboBoxInputWidget" will be "[object Object]", as it tries to cast the "data" value to a string when a user selects an option item.
So it looks like "OO.ui.ComboBoxInputWidget" allows only "data" of type "string" in its options. Is that correct?
That would also mean that there is no "label/data" mechanism of the input field itself. If I've got the following options
[
{ label: "Rot", data: "red" },
{ label: "Gelb", data: "yellow" },
{ label: "Grün", data: "green" }
]
and the user selects the option with label "Gelb" the input field shows "yellow", not "Gelb". Did I miss something? Is it possible to show a "label" to the user but retrieve the "data" (object) when calling "getValue" on such a field?
[1] https://doc.wikimedia.org/oojs-ui/master/js/#!/api/OO.ui.MenuOptionWidget-c…
--
Robert
Hi everyone,
At the Dev Summit a few weeks ago, there were a number of discussions
about the Community Wishlist Survey and how to make it available to
Wikimedia editors. In order to make it easier to give us constructive
criticism or copy our process, I've tried to write down what we did,
why, solutions and problems we're aware of and what we plan to do next
time:
https://meta.wikimedia.org/wiki/Community_Tech/Wishlist_Survey_outreach
//Johan Jönsson
--
Google security have announced that they have a working collision attack
against the SHA-1 hash:
https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
It's highly recommended to move to SHA-256 where doable.
Note that MediaWiki uses SHA-1 in a number of places; in some such as
revision hashes it's advisory for tools only, but in other places like
deleted files (filearchive table) we use it for addressing, and should
consider steps to mitigate attacks swapping in alternate files during
deletion/undeletion.
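
For anyone weighing the migration, a minimal sketch of computing both digests for the same bytes with Python's standard `hashlib` is shown below. The helper name is hypothetical, and the sketch says nothing about how MediaWiki itself stores or encodes its hashes.

```python
import hashlib


def file_digests(data: bytes) -> dict:
    """Return hex digests of the given bytes under SHA-1 and SHA-256."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),      # 160-bit, collision found
        "sha256": hashlib.sha256(data).hexdigest(),  # 256-bit
    }


digests = file_digests(b"")
print(digests["sha1"])
print(digests["sha256"])
```

Content addressed by SHA-256 instead of SHA-1 would need wider keys (64 hex characters rather than 40), which is part of what makes such a migration non-trivial.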
-- brion
A reminder that applications to attend WikiCite 2017
<https://meta.wikimedia.org/wiki/WikiCite_2017> close on *February 27, 2017*.
Please consider applying
<https://docs.google.com/forms/d/e/1FAIpQLScWnCLfAt88cUWKSu_E-lU8m3te_r4P3ng…>
if you work on sources and citations (or related tools) in Wikipedia,
Wikidata, Wikisource or other Wikimedia projects. If there are other people
in your network we should consider inviting to the event, please let us
know. You can contact the organizing committee at: wikicite(a)wikimedia.org.
Best,
Dario
-- on behalf of the organizers
On Thu, Feb 9, 2017 at 3:44 PM, Dario Taraborelli <
dtaraborelli(a)wikimedia.org> wrote:
> Dear all,
>
> I am happy to announce that applications to attend WikiCite ‘17 officially open
> today <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2>.
>
> About the event
>
> WikiCite 2017 <https://meta.wikimedia.org/wiki/WikiCite_2017> is a 3-day
> conference, summit and hack day to be hosted in Vienna, Austria, on May
> 23-25, 2017. It expands on efforts started last year at WikiCite 2016
> <https://meta.wikimedia.org/wiki/WikiCite_2016/Report> to design a
> central bibliographic repository, as well as tools and strategies to
> improve information quality and verifiability in Wikimedia projects.
>
> Our goal is to bring together Wikimedia contributors, data modelers,
> information and library science experts, software engineers, designers and
> academic researchers who have experience working with Wikipedia's citations
> and bibliographic data.
>
> WikiCite 2017 will be a venue to:
>
> - Day 1. (Conference) – present progress on existing work and
>   initiatives for citations and bibliographic data across Wikimedia projects
> - Day 2. (Summit) – discuss technical, social, outreach and policy
>   directions
> - Day 3. (Hack) – get together to build, based on new ideas and
>   applications
>
>
>
> More information on the event can be found here
> <https://meta.wikimedia.org/wiki/WikiCite_2017>.
>
> How to apply
>
> Participation for this year's event is limited to 100 individuals. In
> order to be considered for participation, please fill out the following
> form <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2> and provide us with some
> information about yourself, your interests, and expected contribution.
> PLEASE NOTE THIS IS NOT THE FINAL REGISTRATION FORM. Your application will
> be reviewed and the organizing committee will extend an invitation by March
> 10, 2017. This application form is to determine the best mix of
> attendees. Not everyone who applies will receive an invitation, but there
> will be a waitlist.
>
> Important dates
>
>
> - February 9, 2017: applications open
> - February 27, 2017: applications close, waitlist opens
> - March 10, 2017: all final notifications of acceptance are issued,
>   waitlist processing begins
> - March 31, 2017: attendee list is finalized
>
>
> Travel support
>
>
> Like last year, limited funding to cover travel costs of prospective
> participants will be available. Requests for travel support should be
> submitted via the application form
> <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2>. We will confirm by March 10, if
> we can provide you with travel support.
>
> Contact
>
> For any question, you can contact the organizing committee via:
> wikicite(a)wikimedia.org
>
> We look forward to seeing you in Vienna!
>
> The WikiCite 2017 organizing committee
>
> Dario Taraborelli
>
> Jonathan Dugan
>
> Lydia Pintscher
>
> Daniel Mietchen
>
> Cameron Neylon
>
>
>
> *Dario Taraborelli *Director, Head of Research, Wikimedia Foundation
> wikimediafoundation.org • nitens.org • @readermeter
> <http://twitter.com/readermeter>
>
--
*Dario Taraborelli *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>