As of 950cf6016c, the mediawiki/core repo was updated to use DB_REPLICA
instead of DB_SLAVE, with the old constant left as an alias. This is part
of a string of commits that cleaned up the mixed use of "replica" and
"slave" by sticking to the former. Extensions have not been mass
converted. Please use the new constant in any new code.
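For new code, that looks something like this (a minimal sketch of a read
query; the table, fields and conditions are just an illustration):

    // Reads go to a replica connection. DB_SLAVE still works as an
    // alias, but new code should use DB_REPLICA.
    $dbr = wfGetDB( DB_REPLICA );
    $row = $dbr->selectRow(
        'page',                          // table
        [ 'page_id', 'page_title' ],     // fields
        [ 'page_namespace' => NS_MAIN ], // conditions
        __METHOD__
    );

    // Writes still go through the master connection.
    $dbw = wfGetDB( DB_MASTER );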
The word "replica" is more indicative of the broader range of DB setups in
use*, is used by a range of large companies**, and carries more neutral
connotations.
Drupal and Django made similar updates (even replacing the word "master"):
* https://www.drupal.org/node/2275877
* https://github.com/django/django/pull/2692/files &
https://github.com/django/django/commit/beec05686ccc3bee8461f9a5a02c607a023…
I don't plan on doing anything to DB_MASTER, since it seems fine by itself,
like "master copy", "master tape" or "master key". This is analogous to a
master RDBMS database. Even multi-master RDBMS systems tend to have
stronger consistency than classic RDBMS slave servers, and present
themselves as one logical "master" or "authoritative" copy. Even in its
personified form, a "master" database can readily be thought of as
analogous to "controller", "governor", "ruler", lead "officer", or such.***
* clusters using two-phase commit, Galera using certification-based
replication, multi-master circular replication, etc.
**
https://en.wikipedia.org/wiki/Master/slave_(technology)#Appropriateness_of_…
***
http://www.merriam-webster.com/dictionary/master?utm_campaign=sd&utm_medium…
--
-Aaron
Please join for the following talk:
*Tech Talk:* A Gentle Introduction to Wikidata for Absolute Beginners
[including non-techies!]
*Presenter:* Asaf Bartov
*Date:* February 09, 2017
*Time:* 19:00 UTC
<https://www.timeanddate.com/worldclock/fixedtime.html?msg=Tech+Talk%3A+A+Ge…>
Link to live YouTube stream <https://www.youtube.com/watch?v=eVrAx3AmUvA>
*IRC channel for questions/discussion:* #wikimedia-office
*Summary:* This talk will introduce you to the Wikimedia Movement's latest
major wiki project: Wikidata. We will cover what Wikidata is, how to
contribute, how to embed Wikidata into articles on other wikis, tools like
the Wikidata Game, and how to query Wikidata (including SPARQL examples).
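If you'd like to try a query ahead of the talk, here is a minimal sketch
of calling the Wikidata Query Service from PHP (the query, which lists a
few instances of "house cat" (Q146), is just an illustration, not an
example from the talk):

    // Ask the Wikidata Query Service for a few items that are
    // instances of (P31) "house cat" (Q146). Error handling omitted.
    $query = '
        SELECT ?item ?itemLabel WHERE {
            ?item wdt:P31 wd:Q146.
            SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        } LIMIT 5';
    $url = 'https://query.wikidata.org/sparql?format=json&query=' .
        urlencode( $query );
    $context = stream_context_create( [
        'http' => [ 'header' => "User-Agent: ExampleScript/0.1\r\n" ],
    ] );
    $result = json_decode( file_get_contents( $url, false, $context ), true );
    foreach ( $result['results']['bindings'] as $binding ) {
        echo $binding['itemLabel']['value'], "\n";
    }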
O'Reilly just published some of their popular books for free, either as
part of the open access movement or as some kind of marketing (or both). I
find them useful to Wikimedia developers. The books come in several e-book
formats, so you can read them on your Kindle, etc.:
* Performance, Operations, Release engineering:
http://www.oreilly.com/webops-perf/free/
* Data, AI, Analytics: http://www.oreilly.com/data/free/
* Programming, architecture, Open source culture:
http://www.oreilly.com/programming/free/
* Security: http://www.oreilly.com/security/free/
* Web platform, design: http://www.oreilly.com/web-platform/free/
This is a rather unusual type of email, and I wasn't sure sending it was
the right thing to do, so I only sent it to wikitech-l. Please spread the
word if you think it's okay, or tell me if you think it's not. Thanks.
Best
Hi everybody!
As a reminder the CREDIT Showcase is next week on Wednesday,
1-February-2017 (see https://www.mediawiki.org/wiki/CREDIT_showcase for
details). Also, as I mentioned previously, we're conducting a survey about
CREDIT. We'd appreciate your feedback! Here is a link to the survey (which
is hosted on a third-party service), and, for information about privacy and
data handling, the survey privacy statement.
https://docs.google.com/a/wikimedia.org/forms/d/e/1FAIpQLSedAtyPfcEhT6OVd26…
https://wikimediafoundation.org/wiki/CREDIT_Feedback_Survey_Privacy_Stateme…
This email is being sent to several mailing lists in order to reach
multiple audiences. As always, please follow the list link at the very
bottom of this email in case you want to manage your list subscription
options such as digest, unsubscribe, and so on.
And, as usual, if you'd like to share the news about the upcoming CREDIT,
here's some suggested verbiage.
*Hi <FNAME>*
*I hope all is well with you! I wanted to let you know about CREDIT, a
monthly demo series that we’re running to showcase open source tech
projects from Wikimedia’s Community, Reading, Editing, Discovery,
Infrastructure and Technology teams.*
*CREDIT is open to the public, and we welcome questions and discussion. The
next CREDIT will be held on February 1st at 11am PT / 2pm ET / 19:00 UTC.*
*There’s more info on MediaWiki
<https://www.mediawiki.org/wiki/CREDIT_showcase>, and on Etherpad
<https://etherpad.wikimedia.org/p/CREDIT>, which is where we take notes and
ask questions. You can also ask questions on IRC in the Freenode chatroom
#wikimedia-office (web-based access here
<https://webchat.freenode.net/?channels=%23wikimedia-office>). Links to
video will become available at these locations shortly before the event.*
*Please feel free to pass this information along to anyone interested.
Our projects tend to focus on areas that might be of interest to folks
working across the open source tech community: language detection,
numerical sort, large data visualizations, maps, and all sorts of other
things.*
*If you have any questions, please let me know! Thanks, and I hope to see
you at CREDIT.*
*YOURNAME*
Thanks!
Adam Baso
Director of Engineering, Reading
Wikimedia Foundation
abaso(a)wikimedia.org
The Parsing team at the Wikimedia Foundation, which develops the Parsoid
service, is deprecating support for node 0.1x. Parsoid is the service
that powers VisualEditor, Content Translation, and Flow. If you don't
run a MediaWiki install that uses VisualEditor, then this announcement
does not affect you.
Node 0.10 reached end of life on October 31st, 2016 [1], and node 0.12 is
scheduled to reach end of life on December 31st, 2016 [1].
Yesterday, we released a 0.6.1 Debian package [2] and a 0.6.1 npm
version of Parsoid [3]. This will be the last release with node 0.1x
support. We'll continue to provide any necessary critical bug fixes and
security fixes for the 0.6.1 release until March 31st, 2017, and will
completely drop support for all node versions before node v4.x starting
April 2017.
If you are running a Parsoid service on your wiki and are still using
node 0.1x, please upgrade your node version by April 2017. The Wikimedia
cluster runs node v4.6 right now and will soon be upgraded to node v6.x
[4]. Parsoid has been tested with node 0.1x, node v4.x and node v6.x and
works with all these versions. However, we are dropping support for node
0.1x from the master branch of Parsoid right away. Going forward, the
Parsoid codebase will adopt ES6 features available in node v4.x and
higher which aren't supported in node 0.1x; this will constitute a
breaking change.
Subramanya Sastry (Subbu),
Technical Lead and Manager,
Parsing Team,
Wikimedia Foundation.
[1] Node.js Long Term Support schedule @ https://github.com/nodejs/LTS
[2] https://www.mediawiki.org/wiki/Parsoid/Releases
[3] https://www.npmjs.com/package/parsoid
[4] https://phabricator.wikimedia.org/T149331
Hey folks!
This covers the 32nd through 41st weekly updates from the revision scoring
team to this mailing list. We've been busy, but our reporting fell behind,
so here I am getting us caught up! This is going to be a long one. Bear
with me.
One major thing we've done in the past few weeks is draft and present a
proposal to increase the resourcing for the ORES project in the 2017 fiscal
year. Currently, we're just one fully funded staff member (halfak) and one
partially funded contractor (Amir1) working with a bunch of volunteers.
We're proposing to staff the team with full-time engineers, a liaison and a
tech writer. See a full draft of our proposal and pitch deck here:
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Scoring_Platform_team
*New development:*
We've expanded support for our "editquality" models to more wikis and
improved the performance of some of the models.
- We scaled up the number of observations for Indonesian Wikipedia to
100k[1]
- We added language support for Romanian[2] and built the basic
"reverted" model[3]
- We trained and tested "damaging" and "goodfaith" models for Czech
Wikipedia[4]
- We implemented some params in our training utilities to control memory
usage[5]
- We deployed all of the above to Wikimedia Labs[6]. A production
deployment is coming soon. (See the sketch just after this list.)
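If you want to poke at these models yourself, scores can be fetched from
the service over HTTP. Here is a minimal sketch in PHP; the exact URL
shape and the revision ID are assumptions, so check the ORES documentation
for the current API:

    // Request a "damaging" score for one enwiki revision. The URL shape
    // and revision ID are assumptions -- consult the ORES docs.
    $url = 'https://ores.wikimedia.org/v3/scores/enwiki/' .
        '?models=damaging&revids=34854345';
    $response = json_decode( file_get_contents( $url ), true );
    $score = $response['enwiki']['scores']['34854345']['damaging']['score'];
    echo $score['prediction'] ? "probably damaging\n" : "probably fine\n";
    echo 'P(damaging) = ', $score['probability']['true'], "\n";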
Prompted by the 2016 community wishlist[7], we've implemented a
"draftquality" model for evaluating new page creations.
- We researched deletion reasons on English Wikipedia[8] and created a
labeled dataset using the deletion log.
- We engineered a set of features to predict the quality of new
articles[9] and built a model[10]
- We generated a set of datasets[11,12,13] to make it easier for
volunteers and external researchers to help us audit the performance of the
model.
- We deployed the model on WMFLabs[14] and announced its presence to a
few interested patrollers on English Wikipedia
- We've started the process of deploying the model in production[15,16]
We completed a project exploring the use of advanced natural-language
processing strategies to extract new signal about vandalism, article
quality and problematic new articles. Regrettably, memory issues prevent
us from trivially putting this into production[17], so we're looking into
alternative strategies[18].
- We implemented a strategy for extracting sentences from wikitext[19]
- We built sentence banks for personal attacks[20], vandalism[21],
spam[22], and Featured Articles[23].
- We built PCFG-based models[24] and analyzed their ability to
differentiate[25]
We've been working with the Collaboration Team[26] on their Edit Review
Improvements project[27]
- We defined and implemented a set of new precision-based test
statistics that will inform thresholds used in their new user interface[28]
- We also decided to continue reporting recall-based test statistics[29]
Based on advice from engineers on the Collaboration Team, we've begun the
process of converting Wiki labels[30] to a stand-alone tool in labs.
- We generalized the gadget interface so that it can handle all
languages/wikis[31]
- We implemented a means to auto-configure wikis based on the
dbname[32,33], which allowed us to simplify configuration[34]
- We also implemented some performance improvements via minification and
bundling[35]
*Labeling:*
In the past few weeks, we've set up labeling campaigns for a few wikis.
- We deployed an edit types campaign for Catalan Wikipedia[36]
- We deployed an edit quality campaign for Chinese[37] and Romanian[38]
Wikipedias
- We deployed a new type of campaign for English Wikipedia --
"discussion quality" -- that asks editors to label talk posts as "toxic" or
not[39]
*Maintenance and robustness:*
We've solved a large set of problems with logging issues and compatibility
with Wikibase, and we've made minor improvements to performance.
- We addressed a few bugs in the ORES Review Tool[40,44]
- We quieted some errors from our logging in ORES[41,45]
- We updated our code to work with a Wikibase schema change[42]
- We fixed a language fallback pattern in Wiki labels[43]
- We set up monitoring on ORES database disk sizes[46]
- We fixed some issues with scap, Phabricator's Diffusion and other
supporting systems so that we can continue deploying to beta labs[47]
- We split our assets repo so that we can let our WMFLabs deploy get
ahead of the Production deployment[48]
- ORES can now minify its JSON responses[49]
- We identified a bug in flask-assets and worked around it in our local
installation of Wiki labels[50]
*Communications and outreach:*
We had a big presence at the Wikimedia Developer Summit, we've drafted a
resourcing proposal, and we've made some announcements about upcoming plans
for the ORES Review Tool.
- We facilitated the "Artificial Intelligence to build and navigate
content" track[51]
- We ran a session for building an AI wishlist[52] and captured notes
about more than 20 new AI proposals on a new tag in Phabricator[53]
- We also ran a session discussing the ethics and dangers of advanced
algorithms mediating our processes[54]
- We helped facilitate a session about where to surface current AIs in
Wikimedia Projects[55]
- We held a discussion with Legal about licensing labeled data that
comes out of Wiki labels[56] and updated the interface to state the CC0
license clearly[57]
- We worked with the Reading Infrastructure team to analyze the
consumption of "oresscores" through the MediaWiki API[58]
- We drafted a pitch for increasing the resources for our team[59]
- We worked with the Collaboration team to announce that they'll be
experimenting with a new RecentChanges filtering strategy in the ORES
Review Tool[60,61]
1. https://phabricator.wikimedia.org/T147107 -- Scale up the number of
observations for idwiki to 100k
2. https://phabricator.wikimedia.org/T152482 -- Add language support for
Romanian
3. https://phabricator.wikimedia.org/T156504 -- Build reverted model for
Romanian Wikipedia
4. https://phabricator.wikimedia.org/T156492 -- Train and test
damaging/goodfaith models for Czech Wikipedia
5. https://phabricator.wikimedia.org/T156645 -- Add '--workers' param to
cv_train utility
6. https://phabricator.wikimedia.org/T154856 -- Clean up dependencies and
deploy newest ORES & Models in labs
7.
https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/M…
8.
https://meta.wikimedia.org/wiki/Research:Automated_classification_of_draft_…
9. https://phabricator.wikimedia.org/T148580 -- Build feature set for draft
quality model
10. https://phabricator.wikimedia.org/T148038 -- [Epic] Build draft quality
model (spam, vandalism, attack, or OK)
11. https://phabricator.wikimedia.org/T148581 -- Extract features for
deleted page (draft quality model)
12. https://phabricator.wikimedia.org/T156642 -- Generate scored dataset
for 2016-08 - 2017-01
13. https://phabricator.wikimedia.org/T156643 -- Generate extracted
features for 2016-08 - 2017-01
14. https://phabricator.wikimedia.org/T155576 -- Deploy draftquality models
to WMFLabs
15. https://phabricator.wikimedia.org/T156835 -- Create package stuff for
draftquality
16. https://phabricator.wikimedia.org/T157049 -- Create new repo:
research-ores-draftquality
17. https://phabricator.wikimedia.org/T148867#2816566 -- Memory footprint
is enormous!
18. https://phabricator.wikimedia.org/T155111 -- [Spike] Investigate use of
Apertium LTtoolbox API in labs/production
19. https://phabricator.wikimedia.org/T148867 -- Implement sentences
datascources
20. https://phabricator.wikimedia.org/T148035 -- Sentence bank for personal
attacks
21. https://phabricator.wikimedia.org/T148034 -- Sentence bank for vandalism
22. https://phabricator.wikimedia.org/T148032 -- Sentence bank for spam
23. https://phabricator.wikimedia.org/T148033 -- Sentence bank for Featured
Articles
24. https://phabricator.wikimedia.org/T148037 -- Generate PCFG sentence
models
25. https://phabricator.wikimedia.org/T151819 -- Analyze differentiation of
FA, Spam, Vandalism, and Attack models/sentences.
26. https://www.mediawiki.org/wiki/Collaboration
27. https://www.mediawiki.org/wiki/Edit_Review_Improvements
28. https://phabricator.wikimedia.org/T151970 -- Implement new
precision-based test stats for editquality models
29. https://phabricator.wikimedia.org/T156644 -- Restore
recall-threshold-based metrics for editquality models.
30. https://meta.wikimedia.org/wiki/Wiki_labels
31. https://phabricator.wikimedia.org/T151120 -- Generalize standalone
gadget interface
32. https://phabricator.wikimedia.org/T154433 -- Auto config wikilabels
using dbnames
33. https://phabricator.wikimedia.org/T155439 -- Use module loader to load
JS/CSS from wikis
34. https://phabricator.wikimedia.org/T154693 -- Remove host from
wikilabels config -- infer from request
35. https://phabricator.wikimedia.org/T154122 -- Minification and bundling
for wikilabels assets
36. https://phabricator.wikimedia.org/T152965 -- Deploy cawiki edit types
campaign
37. https://phabricator.wikimedia.org/T152561 -- Deploy zhwiki edit quality
campaign
38. https://phabricator.wikimedia.org/T156357 -- Deploy edit quality
campaign for Romanian Wikipedia
39. https://phabricator.wikimedia.org/T156303 -- Deploy "Discussion
quality" campaign in wikilabels
40. https://phabricator.wikimedia.org/T152542 -- Undefined method
ORES\Hooks::getDamagingThreshold()
41. https://phabricator.wikimedia.org/T146681 -- Quiet TimeoutError in
celery logging
42. https://phabricator.wikimedia.org/T154168 -- Quantity changes broke ORES
43. https://phabricator.wikimedia.org/T154897 -- Chinese translations are
not being loaded
44. https://phabricator.wikimedia.org/T155500 -- Fatal exception of type
"DBQueryError" on sorting ORES contributions
45. https://phabricator.wikimedia.org/T157078 -- ores logspam: Model
contains an error
46. https://phabricator.wikimedia.org/T155482 -- Set up monitoring for ORES
redis database
47. https://phabricator.wikimedia.org/T157135 -- Fix broken beta-labs deploy
48. https://phabricator.wikimedia.org/T154436 -- Split wheels repo into
Prod/WMFLabs branches and maintain independence
49. https://phabricator.wikimedia.org/T155931 -- Minify json responses
50. https://phabricator.wikimedia.org/T154865 -- assets url return empty
string
51. https://phabricator.wikimedia.org/T147708 -- Artificial Intelligence to
build and navigate content
52. https://phabricator.wikimedia.org/T147710 -- What should an AI do for
you? Building an AI Wishlist.
53. https://phabricator.wikimedia.org/tag/artificial-intelligence/
54. https://phabricator.wikimedia.org/T147929 -- Algorithmic dangers and
transparency -- Best practices
55. https://phabricator.wikimedia.org/T148690 -- Where to surface AI in
Wikimedia Projects
56. https://phabricator.wikimedia.org/T145024 -- Licensing of labeled data
57. https://phabricator.wikimedia.org/T156052 -- Add notice of CC0 status
of Wikilabels data to UI & Docs
58. https://phabricator.wikimedia.org/T156273 -- Identify baseline api.php
Action API consumption
59. https://phabricator.wikimedia.org/T157470 -- Draft proposal/pitch for
ORES resourcing
60. https://phabricator.wikimedia.org/T150855 -- Gather assets for post
about ORES review tool including ERI filters
61. https://phabricator.wikimedia.org/T150858 -- Post about ORES review
tool including ERI filters
Sincerely,
Aaron from the Scoring Platform team