Hi everybody!
As a reminder, the CREDIT Showcase is next week on Wednesday,
1-February-2017 (see https://www.mediawiki.org/wiki/CREDIT_showcase for
details). Also, as I mentioned previously, we're conducting a survey about
CREDIT, and we'd appreciate your feedback! Below are a link to the survey
(which is hosted on a third-party service) and, for information about
privacy and data handling, the survey privacy statement.
Survey: https://docs.google.com/a/wikimedia.org/forms/d/e/1FAIpQLSedAtyPfcEhT6OVd26…
Privacy statement: https://wikimediafoundation.org/wiki/CREDIT_Feedback_Survey_Privacy_Stateme…
This email is being sent to several mailing lists in order to reach
multiple audiences. As always, please follow the list link at the very
bottom of this email if you want to manage your list subscription
options such as digest, unsubscribe, and so on.
And, as usual, if you'd like to share the news about the upcoming CREDIT,
here's some suggested verbiage.
*Hi <FNAME>*
*I hope all is well with you! I wanted to let you know about CREDIT, a
monthly demo series that we’re running to showcase open source tech
projects from Wikimedia’s Community, Reading, Editing, Discovery,
Infrastructure and Technology teams. *
*CREDIT is open to the public, and we welcome questions and discussion. The
next CREDIT will be held on February 1st at 11am PT / 2pm ET / 19:00 UTC. *
*There’s more info on MediaWiki
<https://www.mediawiki.org/wiki/CREDIT_showcase>, and on Etherpad
<https://etherpad.wikimedia.org/p/CREDIT>, which is where we take notes and
ask questions. You can also ask questions on IRC in the Freenode chatroom
#wikimedia-office (web-based access here
<https://webchat.freenode.net/?channels=%23wikimedia-office>). Links to
video will become available at these locations shortly before the event.*
*Please feel free to pass this information along to any interested folks.
Our projects tend to focus on areas that might be of interest to folks
working across the open source tech community: language detection,
numerical sort, large data visualizations, maps, and all sorts of other
things.*
*If you have any questions, please let me know! Thanks, and I hope to see
you at CREDIT.*
*YOURNAME*
Thanks!
Adam Baso
Director of Engineering, Reading
Wikimedia Foundation
abaso(a)wikimedia.org
The Parsing team at the Wikimedia Foundation, which develops the Parsoid
service, is deprecating support for node 0.1x. Parsoid is the service
that powers VisualEditor, Content Translation, and Flow. If you don't
run a MediaWiki install that uses VisualEditor, then this announcement
does not affect you.
Node 0.10 reached end of life on October 31st, 2016 [1], and node
0.12 is scheduled to reach end of life on December 31st, 2016 [1].
Yesterday, we released a 0.6.1 debian package [2] and a 0.6.1 npm
version of Parsoid [3]. This will be the last release with node 0.1x
support. We'll continue to provide any necessary critical bug
fixes and security fixes for the 0.6.1 release until March 31st, 2017,
and will completely drop support for all node versions before node
v4.x starting April 2017.
If you are running a Parsoid service on your wiki and are still using
node 0.1x, please upgrade your node version by April 2017. The Wikimedia
cluster runs node v4.6 right now and will soon be upgraded to node v6.x
[4]. Parsoid has been tested with node 0.1x, node v4.x and node v6.x and
works with all of these versions. However, we are dropping node 0.1x
support from the master branch of Parsoid right away. Going forward, the
Parsoid codebase will adopt ES6 features that are available in node v4.x
and higher but aren't supported in node 0.1x, which will constitute a
breaking change.
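To illustrate (a hypothetical snippet, not taken from the Parsoid
codebase): syntax like the following parses on node v4.x and later but
throws a SyntaxError on node 0.1x.

    'use strict';

    // ES6 syntax available in node v4.x+ but not in node 0.1x:
    const greet = (name) => `Hello, ${name}!`; // const, arrow fn, template literal

    class Parser {                             // class declaration
        constructor(options) {
            this.options = options;
        }
    }

    console.log(greet('Parsoid'), new Parser({}).options);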
Subramanya Sastry (Subbu),
Technical Lead and Manager,
Parsing Team,
Wikimedia Foundation.
[1] Node.js Long Term Support schedule @ https://github.com/nodejs/LTS
[2] https://www.mediawiki.org/wiki/Parsoid/Releases
[3] https://www.npmjs.com/package/parsoid
[4] https://phabricator.wikimedia.org/T149331
Hey folks!
This covers the 32nd through 41st weekly updates from the revision
scoring team to this mailing list. We've been busy, but our reporting
fell behind, so here I am getting us caught up! This is going to be a
long one; bear with me.
One major thing we've done in the past few weeks is draft and present a
proposal to increase the resourcing for the ORES project in the 2017 fiscal
year. Currently, we're just one fully funded staff member (halfak) and a
partially funded contractor (Amir1) working with a bunch of volunteers.
We're proposing to staff the team with full-time engineers, a liaison and a
tech writer. See a full draft of our proposal and pitch deck here:
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Scoring_Platform_team
*New development:*
We've expanded support for our "editquality" models to more wikis and
improved the performance of some of the models.
- We scaled up the number of observations for Indonesian Wikipedia to
100k[1]
- We added language support for Romanian[2] and built the basic
"reverted" model[3]
- We trained and tested "damaging" and "goodfaith" models for Czech
Wikipedia[4]
- We implemented some params in our training utilities to control memory
usage[5]
- We deployed all of the above to Wikimedia Labs[6]; a production
deployment is coming soon (a query sketch follows this list).
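For anyone who wants to poke at these models, here's a minimal sketch of
querying the Labs instance from node. The exact URL shape and "v2" path
are my assumptions here -- check https://ores.wmflabs.org/ for the live
API documentation.

    var https = require('https');

    // Assumed endpoint shape: /v2/scores/<wiki>/<model>/<revid>/
    var url = 'https://ores.wmflabs.org/v2/scores/rowiki/reverted/12345/';

    https.get(url, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
            // The response includes a probability per class, e.g.
            // { ... "probability": { "true": 0.03, "false": 0.97 } ... }
            console.log(JSON.parse(body));
        });
    });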
Prompted by the 2016 community wishlist[7], we've implemented a
"draftquality" model for evaluating new page creations.
- We researched deletion reasons on English Wikipedia[8] and created a
labeled dataset using the deletion log (see the toy sketch after this list).
- We engineered a set of features to predict the quality of new
articles[9] and built a model[10]
- We generated a set of datasets[11,12,13] to make it easier for
volunteers and external researchers to help us audit the performance of the
model.
- We deployed the model on WMFLabs[14] and announced its presence to a
few interested patrollers on English Wikipedia
- We've started the process of deploying the model in production[15,16]
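As a toy illustration of the labeling step (the real reason-to-class
mapping lives in the research page[8]; the regexes below are just
examples built on well-known enwiki speedy-deletion codes):

    // Map an English Wikipedia deletion-log comment to a draftquality class.
    function labelFromDeletionReason(comment) {
        if (/G11|advertising|promotion/i.test(comment)) { return 'spam'; }
        if (/G3|vandalism|hoax/i.test(comment)) { return 'vandalism'; }
        if (/G10|attack/i.test(comment)) { return 'attack'; }
        // Other deletion reasons don't imply a quality class; the "OK"
        // class comes from pages that were not deleted.
        return null;
    }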
We completed a project exploring the use of advanced natural-language
processing strategies to extract new signal about vandalism, article
quality and problematic new articles. Regrettably, memory issues prevent
us from trivially putting this into production[17], so we're looking into
alternative strategies[18].
- We implemented a strategy for extracting sentences from wikitext[19]
- We built sentence banks for personal attacks[20], vandalism[21],
spam[22], and Featured Articles[23].
- We built PCFG-based models[24] and analyzed their ability to
differentiate[25] (a simplified illustration is below)
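To give a flavor of how the sentence banks feed the models, here's a
deliberately simplified stand-in: unigram log-likelihoods instead of
real PCFG parse probabilities. The training sentences are made up.

    // Train a tiny unigram model from a bank of sentences.
    function train(sentences) {
        var counts = {}, total = 0;
        sentences.forEach(function (s) {
            s.toLowerCase().split(/\s+/).forEach(function (w) {
                counts[w] = (counts[w] || 0) + 1;
                total += 1;
            });
        });
        return { counts: counts, total: total };
    }

    // Score a sentence under a model (add-one smoothing; 1000 is an
    // assumed vocabulary-size constant).
    function logLikelihood(model, sentence) {
        return sentence.toLowerCase().split(/\s+/).reduce(function (sum, w) {
            var p = ((model.counts[w] || 0) + 1) / (model.total + 1000);
            return sum + Math.log(p);
        }, 0);
    }

    var spam = train(['buy cheap pills now', 'click here to win big']);
    var featured = train(['the treaty was signed in 1648',
                          'she studied the physics of sound']);
    var s = 'click here for cheap pills';
    console.log(logLikelihood(spam, s) > logLikelihood(featured, s)
        ? 'closer to the spam bank' : 'closer to the FA bank');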
We've been working with the Collaboration Team[26] on their Edit Review
Improvements project[27].
- We defined and implemented a set of new precision-based test
statistics that will inform thresholds used in their new user
interface[28] (the idea is sketched after this list)
- We decided to continue reporting recall-based test statistics as
well[29]
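Roughly, the idea behind the precision-based statistics (a sketch, not
the actual implementation behind [28]): pick the lowest score threshold
that still meets a target precision on a labeled test set.

    // observations: [{score: 0.92, label: true}, ...] from a test set;
    // "score >= t" means "predict damaging".
    function thresholdAtPrecision(observations, targetPrecision) {
        var thresholds = observations
            .map(function (o) { return o.score; })
            .sort(function (a, b) { return b - a; }); // most confident first
        var best = null;
        thresholds.forEach(function (t) {
            var predicted = observations.filter(function (o) { return o.score >= t; });
            var truePositives = predicted.filter(function (o) { return o.label; }).length;
            if (predicted.length > 0 &&
                    truePositives / predicted.length >= targetPrecision) {
                best = t; // ends up as the lowest threshold that qualifies
            }
        });
        return best;
    }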
Based on advice from engineers on the Collaboration Team, we've begun the
process of converting Wiki labels[30] to a stand-alone tool in labs.
- We generalized the gadget interface so that it can handle all
languages/wikis[31]
- We implemented a means to auto-configure wikis based on the
dbname[32,33] (sketched after this list), and that allowed us to
simplify configuration[34]
- We also implemented some performance improvements via minification and
asset bundling[35]
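The auto-configuration idea, in a rough sketch (illustrative only; the
real logic in [32,33] handles more projects and edge cases than plain
Wikipedias):

    // Derive a wiki's basics from its dbname, e.g. "cawiki" -> the
    // Catalan Wikipedia API, so hand-written per-wiki config isn't needed.
    function configFromDbname(dbname) {
        var m = /^([a-z_]+)wiki$/.exec(dbname);
        if (!m) {
            throw new Error('Unhandled dbname: ' + dbname);
        }
        var lang = m[1].replace(/_/g, '-'); // dbnames use "_" where codes use "-"
        return {
            lang: lang,
            api: 'https://' + lang + '.wikipedia.org/w/api.php'
        };
    }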
*Labeling:*
In the past few weeks, we've set up labeling campaigns for a few wikis.
- We deployed an edit types campaign for Catalan Wikipedia[36]
- We deployed an edit quality campaign for Chinese[37] and Romanian[38]
Wikipedias
- We deployed a new type of campaign for English Wikipedia --
"discussion quality", which asks editors to label talk posts as "toxic"
or not[39]
*Maintenance and robustness:*
We've solved a number of logging issues and a Wikibase compatibility
problem, and we've made minor improvements to performance.
- We addressed a few bugs in the ORES Review Tool[40,44]
- We quieted some errors from our logging in ORES[41,45]
- We updated our code to work with a wikibase schema change[42]
- We fixed a language fallback pattern in Wiki labels[43]
- We set up monitoring on ORES database disk sizes[46]
- We fixed some issues with scap, phabricator's diffusion and other
supporting systems so that we can continue deploying to beta labs[47]
- We split our assets repo so that we can let our WMFLabs deploy get
ahead of the Production deployment[48]
- ORES can now minify its JSON responses[49]
- We identified a bug in flask-assets and worked around it in our local
installation of Wiki labels[50]
*Communications and outreach:*
We had a big presence at the Wikimedia Developer summit, we've drafted a
resourcing proposal, and we've made some announcements about upcoming plans
for the ORES Review tool.
- We facilitated the "Artificial Intelligence to build and navigate
content" track[51]
- We ran a session for building an AI wishlist[52] and captured notes
about more than 20 new AI proposals on a new tag in phabricator[53]
- We also ran a session discussing the ethics and dangers of advanced
algorithms mediating our processes[54]
- We helped facilitate a session about where to surface current AIs in
Wikimedia Projects[55]
- We held a discussion with Legal about licensing labeled data that
comes out of Wiki labels[56] and updated the interface to state the CC0
license clearly[57]
- We worked with the Reading Infrastructure team to analyze the
consumption of "oresscores" through the MediaWiki API[58]
- We drafted a pitch for increasing the resources for our team[59]
- We worked with the Collaboration team to announce that they'll be
experimenting with a new RecentChanges filtering strategy in the ORES
Review Tool[60,61]
1. https://phabricator.wikimedia.org/T147107 -- Scale up the number of
observations for idwiki to 100k
2. https://phabricator.wikimedia.org/T152482 -- Add language support for
Romanian
3. https://phabricator.wikimedia.org/T156504 -- Build reverted model for
Romanian Wikipedia
4. https://phabricator.wikimedia.org/T156492 -- Train and test
damaging/goodfaith models for Czech Wikipedia
5. https://phabricator.wikimedia.org/T156645 -- Add '--workers' param to
cv_train utility
6. https://phabricator.wikimedia.org/T154856 -- Clean up dependencies and
deploy newest ORES & Models in labs
7.
https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/M…
8.
https://meta.wikimedia.org/wiki/Research:Automated_classification_of_draft_…
9. https://phabricator.wikimedia.org/T148580 -- Build feature set for draft
quality model
10. https://phabricator.wikimedia.org/T148038 -- [Epic] Build draft quality
model (spam, vandalism, attack, or OK)
11. https://phabricator.wikimedia.org/T148581 -- Extract features for
deleted page (draft quality model)
12. https://phabricator.wikimedia.org/T156642 -- Generate scored dataset
for 2016-08 - 2017-01
13. https://phabricator.wikimedia.org/T156643 -- Generate extracted
features for 2016-08 - 2017-01
14. https://phabricator.wikimedia.org/T155576 -- Deploy draftquality models
to WMFLabs
15. https://phabricator.wikimedia.org/T156835 -- Create package stuff for
draftquality
16. https://phabricator.wikimedia.org/T157049 -- Create new repo:
research-ores-draftquality
17. https://phabricator.wikimedia.org/T148867#2816566 -- Memory footprint
is enormous!
18. https://phabricator.wikimedia.org/T155111 -- [Spike] Investigate use of
Apertium LTtoolbox API in labs/production
19. https://phabricator.wikimedia.org/T148867 -- Implement sentences
datasources
20. https://phabricator.wikimedia.org/T148035 -- Sentence bank for personal
attacks
21. https://phabricator.wikimedia.org/T148034 -- Sentence bank for vandalism
22. https://phabricator.wikimedia.org/T148032 -- Sentence bank for spam
23. https://phabricator.wikimedia.org/T148033 -- Sentence bank for Featured
Articles
24. https://phabricator.wikimedia.org/T148037 -- Generate PCFG sentence
models
25. https://phabricator.wikimedia.org/T151819 -- Analyze differentiation of
FA, Spam, Vandalism, and Attack models/sentences.
26. https://www.mediawiki.org/wiki/Collaboration
27. https://www.mediawiki.org/wiki/Edit_Review_Improvements
28. https://phabricator.wikimedia.org/T151970 -- Implement new
precision-based test stats for editquality models
29. https://phabricator.wikimedia.org/T156644 -- Restore
recall-threshold-based metrics for editquality models.
30. https://meta.wikimedia.org/wiki/Wiki_labels
31. https://phabricator.wikimedia.org/T151120 -- Generalize standalone
gadget interface
32. https://phabricator.wikimedia.org/T154433 -- Auto config wikilabels
using dbnames
33. https://phabricator.wikimedia.org/T155439 -- Use module loader to load
JS/CSS from wikis
34. https://phabricator.wikimedia.org/T154693 -- Remove host from
wikilabels config -- infer from request
35. https://phabricator.wikimedia.org/T154122 -- Minification and bundling
for wikilabels assets
36. https://phabricator.wikimedia.org/T152965 -- Deploy cawiki edit types
campaign
37. https://phabricator.wikimedia.org/T152561 -- Deploy zhwiki edit quality
campaign
38. https://phabricator.wikimedia.org/T156357 -- Deploy edit quality
campaign for Romanian Wikipedia
39. https://phabricator.wikimedia.org/T156303 -- Deploy "Discussion
quality" campaign in wikilabels
40. https://phabricator.wikimedia.org/T152542 -- Undefined method
ORES\Hooks::getDamagingThreshold()
41. https://phabricator.wikimedia.org/T146681 -- Quiet TimeoutError in
celery logging
42. https://phabricator.wikimedia.org/T154168 -- Quantity changes broke ORES
43. https://phabricator.wikimedia.org/T154897 -- Chinese translations are
not being loaded
44. https://phabricator.wikimedia.org/T155500 -- Fatal exception of type
"DBQueryError" on sorting ORES contributions
45. https://phabricator.wikimedia.org/T157078 -- ores logspam: Model
contains an error
46. https://phabricator.wikimedia.org/T155482 -- Set up monitoring for ORES
redis database
47. https://phabricator.wikimedia.org/T157135 -- Fix broken beta-labs deploy
48. https://phabricator.wikimedia.org/T154436 -- Split wheels repo into
Prod/WMFLabs branches and maintain independence
49. https://phabricator.wikimedia.org/T155931 -- Minify json responses
50. https://phabricator.wikimedia.org/T154865 -- assets url return empty
string
51. https://phabricator.wikimedia.org/T147708 -- Artificial Intelligence to
build and navigate content
52. https://phabricator.wikimedia.org/T147710 -- What should an AI do
for you? Building an AI Wishlist.
53. https://phabricator.wikimedia.org/tag/artificial-intelligence/
54. https://phabricator.wikimedia.org/T147929 -- Algorithmic dangers and
transparency -- Best practices
55. https://phabricator.wikimedia.org/T148690 -- Where to surface AI in
Wikimedia Projects
56. https://phabricator.wikimedia.org/T145024 -- Licensing of labeled data
57. https://phabricator.wikimedia.org/T156052 -- Add notice of CC0 status
of Wikilabels data to UI & Docs
58. https://phabricator.wikimedia.org/T156273 -- Identify baseline api.php
Action API consumption
59. https://phabricator.wikimedia.org/T157470 -- Draft proposal/pitch for
ORES resourcing
60. https://phabricator.wikimedia.org/T150855 -- Gather assets for post
about ORES review tool including ERI filters
61. https://phabricator.wikimedia.org/T150858 -- Post about ORES review
tool including ERI filters
Sincerely,
Aaron from the Revision Scoring / Scoring Platform team
I've got an early draft of some notes
<https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_ta…>
for a restructuring of the revision table, to support the following:
* making the revision table itself smaller by breaking large things out
* reducing duplicate string storage for content model/format, username/IP
address, and edit comments (illustrated after this list)
* multi-content revisions ("MCR") - multiple Content blobs of different
types on a page, revisioned consistently
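As a rough illustration of the duplicate-string point (illustrative
only, not the actual proposed schema -- see the notes page for that):

    // Before: every revision row repeats full strings.
    var revisionBefore = {
        rev_id: 1234,
        rev_comment: 'fix typo',      // duplicated across many rows
        rev_user_text: 'ExampleUser'  // likewise
    };

    // After: strings are stored once in side tables, and the revision
    // row keeps compact integer references to them.
    var comment = { comment_id: 55, comment_text: 'fix typo' };
    var actor = { actor_id: 9, actor_name: 'ExampleUser' };
    var revisionAfter = { rev_id: 1234, rev_comment_id: 55, rev_actor_id: 9 };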
There are also some ideas going around about using denormalized summary
tables more aggressively, perhaps changing where the indexes used for
specific purposes live. For instance, a 'contribs' table with just the
bits needed for the index lookups for user contributions, which would
then be joined to the other tables.
Initial notes at
https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_ta…
-- I'll be cleaning this up a bit more in response to feedback and concerns.
If we go through with this sort of change, we'll need to carefully consider
the upgrade transition. We'll also need to make sure that all relevant
queries are updated, and that folks using the databases indirectly (via
tool labs, etc) are all able to cleanly handle the new fun stuff. Feedback
will be crucial here. :)
We might also split this into a couple of transitions instead, or
otherwise make major changes to the plan. Nothing's set in stone yet!
-- brion
Wikimedia has been accepted as one of the 201 organizations in Google
Summer of Code (GSoC) 2017 <https://summerofcode.withgoogle.com/>!
We are trying to make it easier for prospective students to choose a
project idea and get started, so we are considering showcasing a
bunch of project ideas on the MediaWiki GSoC page itself:
https://www.mediawiki.org/wiki/Google_Summer_of_Code_2017.
Help us by mentoring a project from here:
   - Check out the tasks in the '*Missing Mentors*' and '*Almost Ready to
   be Mentored*' columns on the Possible-Tech-Projects
   <https://phabricator.wikimedia.org/tag/possible-tech-projects/>
   workboard.
   - Check out the '*Wishlist 11-30 (needs owner)*' and '*Wishlist 31-50
   (needs owner)*' columns on the Community-Wishlist-Survey-2016
   <https://phabricator.wikimedia.org/project/board/2420/> workboard.
   - Consider whether a portion of one of your own projects that needs some
   support could become a 2-3 month project for a beginner and, overall, a
   good learning experience.
If you are interested in mentoring a project, add the
"*Outreach-Programs-Projects*" and "*Google-Summer-of-Code (2017)*" tags to
the corresponding task on Phabricator. We will follow up with you from
there.
If you are looking for design- and documentation-related projects to
mentor, we are participating in the Outreachy
<https://www.mediawiki.org/wiki/Outreachy/Round_14> program as well, in
parallel to GSoC. Add the "*Outreach-Programs-Projects*" and "*Outreachy
(Round-14)*" tags to a task you are interested in mentoring.
Email me if you have any questions; I'm happy to help!
Cheers,
Srishti
--
Srishti Sethi
Developer Advocate
Technical Collaboration team
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:SSethi_(WMF)
Hi!
I'd like to invite you to join us at the CREDIT showcase next week, on
Wednesday, 1-March-2017 at 1900 UTC / 1100 Pacific Time. We'd like to see
your demos, whether they're rough works in progress, polished production
material, or even just an account of something you've been studying
recently. For more information on the upcoming event, as well as recordings
of previous events, please visit the following page:
https://www.mediawiki.org/wiki/CREDIT_showcase
And if you'd like to share the news about the upcoming CREDIT showcase,
here's some suggested verbiage. Thanks! -Adam
*Hi <FNAME>*
*I hope all is well with you! I wanted to let you know about CREDIT, a
monthly demo series that we’re running to showcase open source tech
projects from Wikimedia’s Community, Reading, Editing, Discovery,
Infrastructure and Technology teams. *
*CREDIT is open to the public, and we welcome questions and discussion. The
next CREDIT will be held on March 1st at 11am PT / 2pm ET / 19:00 UTC. *
*There’s more info on MediaWiki
<https://www.mediawiki.org/wiki/CREDIT_showcase>, and on Etherpad
<https://etherpad.wikimedia.org/p/CREDIT>, which is where we take notes and
ask questions. You can also ask questions on IRC in the Freenode chatroom
#wikimedia-office (web-based access here
<https://webchat.freenode.net/?channels=%23wikimedia-office>). Links to
video will become available at these locations shortly before the event.*
*Please feel free to pass this information along to any interested folks.
Our projects tend to focus on areas that might be of interest to folks
working across the open source tech community: language detection,
numerical sort, large data visualizations, maps, and all sorts of other
things.*
*If you have any questions, please let me know! Thanks, and I hope to see
you at CREDIT.*
*YOURNAME*
Howdy,
A few updates this week from across the Discovery department.
== Highlights ==
* The Annual Plan "Collab Jam" took place in the SF offices this week,
with lots of conversations about how teams within the Foundation can work
together in the next fiscal year to do cool things.
* ICU folding (which normalizes accents and similar character variants so
that searches match across them) is now in effect on all English, French,
Hebrew and Greek wikis. [1]
** Note: please ask for this feature if you would like it enabled for a
particular language.
== Discussions ==
=== Search ===
* The new contentmodel search keyword is now operational on Commons
(e.g., something like contentmodel:wikitext should restrict results to
pages with that content model). [0]
* ICU folding (which normalizes accents and similar character variants so
that searches match across them) is now in effect on all English, French,
Hebrew and Greek wikis. [1]
** Note: please ask for this feature if you would like it enabled for a
particular language.
=== Portal ===
* There was an issue where an error message was cached, which resulted in
text not being displayed on the wikipedia.org page for a very short time.
This will be fixed with a patch. [2]
* Article statistics were updated for wikipedia.org, wikiquote.org and
wikiversity.org. [3]
=== Wikidata Query Service ===
* Upgraded to the Blazegraph 2.1.5 RC, which fixes several bugs.
* POST is now enabled for WDQS queries (see the sketch below).
* Started the nomination process for federation endpoints. [4]
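A minimal sketch of using the new POST support (assuming the standard
https://query.wikidata.org/sparql endpoint; POST is handy for queries
too long to fit in a GET URL):

    var https = require('https');

    // Example query: five items that are instances of "house cat" (Q146).
    var query = 'SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5';
    var body = 'query=' + encodeURIComponent(query);

    var req = https.request({
        hostname: 'query.wikidata.org',
        path: '/sparql',
        method: 'POST',
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Content-Length': Buffer.byteLength(body),
            'Accept': 'application/sparql-results+json'
        }
    }, function (res) {
        var out = '';
        res.on('data', function (c) { out += c; });
        res.on('end', function () { console.log(JSON.parse(out)); });
    });
    req.end(body);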
[0] https://phabricator.wikimedia.org/T156371
[1] https://phabricator.wikimedia.org/T155515
[2] https://phabricator.wikimedia.org/T158782
[3] https://phabricator.wikimedia.org/T128546
[4] https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input
----
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" [5] or
"Volunteer needed" [6] in Phabricator.
[5] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[6] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation
Hi everybody!
I'm working on some OOjs-UI components for an extension and I've stumbled across something:
The "OO.ui.ComboBoxInputWidget" allows setting an array of "OO.ui.MenuOptionWidget" objects in its "menu.items" config field.
Such an item can have a "label" and a "data" field, and the "data" field can be of type "object" [1].
Now, if I use a "data" field of type "object", the value of the "OO.ui.ComboBoxInputWidget" will be "[object Object]", as it casts the "data" value to a string when a user selects an option item.
So it looks like "OO.ui.ComboBoxInputWidget" allows only "data" of type "string" in its options. Is that correct?
That would also mean that there is no "label/data" mechanism on the input field itself. If I've got the following options
[
{ label: "Rot", data: "red" },
{ label: "Gelb", data: "yellow" },
{ label: "Grün", data: "green" }
]
and the user selects the option with the label "Gelb", the input field shows "yellow", not "Gelb". Did I miss something? Is it possible to show a "label" to the user but retrieve the "data" (object) when calling "getValue" on such a field?
[1] https://doc.wikimedia.org/oojs-ui/master/js/#!/api/OO.ui.MenuOptionWidget-c…
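For what it's worth, the workaround I'm considering looks roughly like
this (assuming "getMenu()" and the menu's "choose" event behave as
documented -- corrections welcome):

    // Track the chosen item's data separately, and show the label in
    // the input. The "data" values here could just as well be objects.
    var combo = new OO.ui.ComboBoxInputWidget( {
        options: [
            { data: 'red', label: 'Rot' },
            { data: 'yellow', label: 'Gelb' },
            { data: 'green', label: 'Grün' }
        ]
    } );

    var selectedData = null;
    combo.getMenu().on( 'choose', function ( item ) {
        selectedData = item.getData();     // the value I actually want
        combo.setValue( item.getLabel() ); // show "Gelb", not "yellow"
    } );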
--
Robert
Hi everyone,
At the Dev Summit a few weeks ago, there were a number of discussions
about the Community Wishlist Survey and how to make it available to
Wikimedia editors. In order to make it easier to give us constructive
criticism or to copy our process, I've tried to write down what we did
and why, the solutions and problems we're aware of, and what we plan to
do next time:
https://meta.wikimedia.org/wiki/Community_Tech/Wishlist_Survey_outreach
//Johan Jönsson
--