Wikitech-l February 2019

wikitech-l@lists.wikimedia.org

56 participants
63 discussions

Wednesday Feb 27: Technical Advice IRC Meeting
by Johanna Strodt 26 Feb '19

Reminder: Technical Advice IRC meeting this week **(Wednesday) 4-5 pm UTC** on #wikimedia-tech. Questions can be asked in English, Uzbek, Korean & German.

The Technical Advice IRC Meeting is a weekly support event for volunteer developers. Every Wednesday, two full-time developers are available to help you with all your questions about MediaWiki, gadgets, tools and more! This can be anything from "how to get started" and "who would be the best contact for X" to specific questions about your project.

If you already know what you would like to discuss or ask, please add your topic to the next meeting: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting

Hope to see you there!

Johanna (for the Technical Advice IRC meeting crew)

Discovery Weekly Update for the week starting 2019-02-18
by Chris Koerner 25 Feb '19

Hello,

This is the weekly update from the Search Platform team for the week starting 2019-02-18. As always, feedback and questions are welcome.

== Discussions ==

=== Search ===

* A new Korean language analyzer has been configured for Korean-language wikis,[0] but it won't be activated until we finish the ongoing upgrade to Elasticsearch 6.
* SDC [Structured Data on Commons] wanted to know if we could add an 'inlabel search keyword'. After lots of discussion, it was merged into the new WikibaseCirrusSearch extension, which has yet to be merged into the beta cluster [1]
* Erik and the team worked on how to measure mutation latency across the newly split Elasticsearch clusters and decided that the default timeout of 30 seconds was good [2]
* Mathew and Gehel worked on testing the spicerack elasticsearch module, with quite a few patches that are linked in the ticket [3]
* Gehel worked on setting up CI for search/glent (a Maven project) with the same options that we use for search/extra [4]
* A link-breaking typo was found in the automatic API documentation for action=query&prop=cirrusbuilddoc, and Erik fixed it by correcting the API docs for cirrusbuilddoc [5]
* As we now have different APT components to differentiate the Elasticsearch versions, we needed a new component for the new version, and Gehel set it up [6]
* David worked on preparing a Debian package with search plugins compatible with Elasticsearch 5.6.14, which Gehel merged [7]
* David also did quite a bit of work to fix and add integration tests for several language analyzers [8]
* Erik worked on updating the ttmserver for Elasticsearch 6 and removed Elasticsearch 2.x compatibility [9]

== Did you know? ==

Grammatical gender [10] often confuses speakers of English and other languages without a similar system. “Why is a bridge feminine in German (Brücke [11]) and masculine in Spanish and French (puente [12] & pont [13])?” they ask, though usually without links to Wiktionary. Grammatical gender is really just a system of noun classes [14] where there are two or three classes, and most things classified as male or female end up in different classes. Other languages have noun classes based on whether or not the nouns are animate, whether they are human or animal, on shape, and sometimes on just arbitrary groupings; languages can have nearly two dozen noun classes, like some of the Niger–Congo languages![15]

Now hold on while we veer off on a brief tangent: diminutives are words that convey a smaller, lesser, or more intimate sense of their root form.[16] They are common in American nicknames, often showing up as a -y or -ie ending (Billy vs. Bill, Peggy vs. Peg, Bobbie vs. Roberta). Sometimes diminutives, especially when applied to small cute things, can become the main or only form of a word. For example, English baby [17] from babe, or kitty from kit.

Diminutives and grammatical gender collide in German Mädchen [18] (“girl”), which is historically from Magd (cognate with English “maid”) plus the diminutive suffix -chen; all diminutives formed with -chen have neuter gender in German. Over time, Mädchen became the predominant term for a girl, despite the fact that the word is grammatically “neuter”.

[0] https://phabricator.wikimedia.org/T206874
[1] https://phabricator.wikimedia.org/T215967
[2] https://phabricator.wikimedia.org/T215969
[3] https://phabricator.wikimedia.org/T207920
[4] https://phabricator.wikimedia.org/T216599
[5] https://phabricator.wikimedia.org/T216256
[6] https://phabricator.wikimedia.org/T216047
[7] https://phabricator.wikimedia.org/T215932
[8] https://phabricator.wikimedia.org/T215594
[9] https://phabricator.wikimedia.org/T192680
[10] https://en.wikipedia.org/wiki/Grammatical_gender
[11] https://en.wiktionary.org/wiki/Br%C3%BCcke#German
[12] https://en.wiktionary.org/wiki/puente#Spanish
[13] https://en.wiktionary.org/wiki/pont#French
[14] https://en.wikipedia.org/wiki/Noun_class
[15] https://en.wikipedia.org/wiki/Noun_class#Niger%E2%80%93Congo_languages
[16] https://en.wikipedia.org/wiki/Diminutive
[17] https://en.wiktionary.org/wiki/baby#Etymology
[18] https://en.wiktionary.org/wiki/M%C3%A4dchen#Etymology

----

Subscribe to receive on-wiki (or opt-in email) notifications of the Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly

The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates

Interested in getting involved? See tasks marked as "Easy" or "Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R

Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation

TechCom Radar 2019-02-20
by Kate Chapman 25 Feb '19

Hi All,

Here are the minutes from last week's TechCom meeting:

* Approved: https://phabricator.wikimedia.org/T190379 RFC: Re-establish the development policies
* Approved: https://phabricator.wikimedia.org/T213318 Wikibase Front-End Architecture
* On Last Call ending March 6 1pm PST (21:00 UTC, 22:00 CET): RfC: Standards for external services in the Wikimedia infrastructure https://phabricator.wikimedia.org/T208524
* On Last Call ending 11pm PST (27 February 2019, 07:00 UTC, 08:00 CET): RFC: Update to Gerrit privilege policy https://phabricator.wikimedia.org/T216295
* No IRC meeting the week of February 25
* There was no IRC meeting the week of February 18

You can also find our meeting minutes at <https://www.mediawiki.org/wiki/Wikimedia_Technical_Committee/Minutes>

See also the TechCom RFC board <https://phabricator.wikimedia.org/tag/mediawiki-rfcs/>.

If you prefer, you can subscribe to our newsletter here <https://www.mediawiki.org/wiki/Newsletter:TechCom_Radar>

Thanks,
Kate

--
Kate Chapman
Senior Program Manager, Core Platform
Wikimedia Foundation
kchapman(a)wikimedia.org

Sunsetting mwSnapshots
by Krinkle 24 Feb '19

TL;DR: I've decided to sunset my "snapshots" tool.

The Snapshots tool created TAR archives of MediaWiki core branches, fresh from Gerrit, every hour. I created it in 2012 on the Toolserver,[1] to make it easier for site admins to try out the alpha version of MediaWiki (or a WMF branch), using the same format as our official stable releases.

The snapshots were generated using a PHP script and git-cli commands, scheduled with a cronjob on the Toolforge Grid. [2] [3]

### *Lessons* learned

Maintaining this tool has mostly been an exercise in learning how hard it is to keep a Git repository functional over a long period of time.

I learned about the numerous ways that a Git repository can become corrupted or unusable when commands are terminated in unforeseen ways. For example, what happens to the state of a Git repository when it's a clone of MediaWiki core, put on NFS, and you try to switch from current master to a release branch from a decade ago, while lots of users are also working on that same NFS mount, and do this for every branch, every hour? [4]

I learned how poor Git can be at keeping track of which files are part of a branch and which aren't. Even when there aren't any errors, if you switch to an old branch with directories or submodules that a newer branch doesn't have and then switch back, old files can stay behind and be seen as untracked files. These then cause failed checkouts later on, due to conflicting changes, when another branch does have the file in question. (This was improved a lot around 2015 with later releases of Git 2.x.)

I learned that, apparently, when Git's garbage collector kicks in from time to time, it doesn't know how much memory it is allowed to use, and will "efficiently" use larger and larger blocks to speed up the process until it gets killed by the grid engine. At that point it will eventually start again and make the same mistake, until someone comes in and manually runs git-gc outside the grid. (This was initially worked around by disabling git-gc. I later found a way to re-enable it with more constrained settings, see [5].)

### *Sunsetting*

The tool hasn't seen much use (to my knowledge), apart from web crawlers of search engines. I haven't heard much complaining over the years (if at all) whenever it got stuck for long periods at a time.

Last week, I noticed it once again got stuck, and apparently had been for several months. Rather than fixing it, I decided to shut it down this time.

The source code is available on GitHub for anyone interested in picking it up again.[3]

My recommendation would be to *not* try to maintain a local Git repository like I did. Instead, have everything be ephemeral. That is, whenever you run the script, create, for each branch you're interested in, a temporary clone with limited depth and just that branch, then create an archive and get rid of the clone (also before beginning, in case something was left behind). This will make it a bit slower, and less elegant, but presumably much more stable. Actually, given how slow branch switches can be, it might even be faster!

There is also support in newer versions of Git to invoke git-archive directly on a remote URL, which would remove the need for local clones entirely. [6]

### *Recipe* for (mostly stable) creation of tarballs from Git

My final hourly recipe looked like this:

1. Blindly delete any ".git/index.lock" file.
2. Run "git clean -q -d -x -f", which deletes unknown files, with extra force.
3. Run "git reset -q --hard", which deletes any locally staged state, with extra force. (While nothing does any staging in this script, Git would sometimes magically think a file was staged. If I recall correctly, this was related to extension submodules.)
4. Get name of remote.
5. Run "git fetch origin".
6. Run "git remote prune origin".
7. Get list of branches (then filter by pattern).

Then, for each branch:

8. Get $head of tree for branch via "git rev-parse --verify $branch".
9. Check if you've already got an archive for that. If so, continue with the next branch instead. If not, go on:
10. Repeat steps 1-3 to reset the repo. [4]
11. Run "git checkout -q -f $branch", which checks out the branch, with extra force.
12. Run "git rev-parse --verify HEAD", and confirm it matches $head, because sometimes the checkout command succeeded, but not really.
13. Run "git archive HEAD --format='tar' | gzip > mediawiki-$branch-$head.tar.gz", which creates the actual archive.
14. If the branch is "master", update the Mediawiki-latest.tar.gz symlink.
15. (End of for-each branch). Delete older tar files for branches that we created a new one for just now.

It lived at https://tools.wmflabs.org/snapshots, which now redirects to https://www.mediawiki.org/wiki/Snapshots instead.

Best,
-- Krinkle

[1] Toolserver. – https://en.wikipedia.org/wiki/Wikipedia:Toolserver
[2] Toolforge Grid. – https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid
[3] The script. – https://github.com/Krinkle/mw-tool-snapshots/blob/b33d479cb9/scripts/update…
[4] The reset. – https://github.com/Krinkle/toollabs-base/blob/v1.0.2/src/GlobalFunctions.ph…
[5] More about Git GC and memory configuration management. – https://github.com/Krinkle/mw-tool-snapshots#git-memory
[6] Use git-archive on a remote repo, without a local clone. – https://git-scm.com/docs/git-archive/2.18.0#git-archive---remoteltrepogt
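The "ephemeral" approach recommended above could look roughly like the following. This is only a minimal sketch in Python, not the original PHP script: the function names, the ".part" temporary suffix, and the replacement of slashes in branch names are all assumptions made for illustration, and it uses Git's built-in "tar.gz" archive format instead of piping through gzip. It asks the remote for the branch head (so an existing archive can be skipped without cloning), makes a shallow single-branch clone in a temporary directory, verifies HEAD as in step 12, writes the archive, and always deletes the clone.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def archive_name(branch: str, head: str) -> str:
    """Archive file name for one branch at one commit."""
    # Assumption: slashes in branch names are flattened to dashes.
    return f"mediawiki-{branch.replace('/', '-')}-{head}.tar.gz"


def clone_command(repo_url: str, branch: str, clone_dir: str) -> list:
    """Shallow, single-branch clone: only the tip commit of `branch`."""
    return ["git", "clone", "--quiet", "--depth", "1", "--single-branch",
            "--branch", branch, repo_url, clone_dir]


def archive_command(clone_dir: str, out_file: str) -> list:
    """Write HEAD of the clone as a gzipped tarball to `out_file`."""
    return ["git", "-C", clone_dir, "archive", "--format=tar.gz",
            "--output", out_file, "HEAD"]


def git_output(args: list) -> str:
    """Run a git command and return its trimmed standard output."""
    done = subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True)
    return done.stdout.strip()


def make_snapshot(repo_url: str, branch: str, out_dir: Path) -> Path:
    """Create (or reuse) the archive for the current tip of `branch`."""
    listing = git_output(["ls-remote", repo_url, f"refs/heads/{branch}"])
    if not listing:
        raise ValueError(f"branch not found on remote: {branch}")
    head = listing.split()[0]

    out_dir.mkdir(parents=True, exist_ok=True)
    target = (out_dir / archive_name(branch, head)).resolve()
    if target.exists():
        return target  # this commit was already archived

    clone_dir = tempfile.mkdtemp(prefix="mw-snapshot-")
    partial = target.with_name(target.name + ".part")
    try:
        subprocess.run(clone_command(repo_url, branch, clone_dir), check=True)
        # Same idea as step 12: confirm the clone is at the expected commit.
        if git_output(["-C", clone_dir, "rev-parse", "--verify", "HEAD"]) != head:
            raise RuntimeError(f"{branch} moved while cloning; retry next run")
        subprocess.run(archive_command(clone_dir, str(partial)), check=True)
        partial.replace(target)  # publish only a fully written archive
    finally:
        # Always discard the clone so no Git state survives between runs.
        shutil.rmtree(clone_dir, ignore_errors=True)
    return target
```

A call such as `make_snapshot(repo_url, "master", Path("archives"))`, with `repo_url` set to whatever clone URL you use, would be run once per branch of interest. Because each run starts from an empty temporary directory, the lock-file, clean, and reset steps of the recipe have nothing left to repair.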

Upgrading phan
by Kunal Mehta 24 Feb '19

Hi,

It's time to work on upgrading phan, the PHP static analysis tool we use! The version of phan we were using was released over two years ago (0.8.0); we're moving to 1.2.4, which was released less than a week ago. And the CI infrastructure is now in place to facilitate easy upgrades in the future.

The new version comes with lots of upstream bug fixes and feature requests. We're also able to get rid of our custom wrapper scripts that hacked around limitations in phan.

I've filed bugs for every extension in Gerrit to upgrade phan[1], including links to migration steps. For some extensions the upgrade will be trivial, but for others it will be rather involved. I do expect that we will find some bugs or missing features in phan, but now we should be able to file upstream issues in those cases, since we're no longer running a super out-of-date version.

In the second extension I ran it against, it was able to find actual bugs (e.g. [2]). So I'm pretty hopeful that this will be an overall improvement.

The CI tutorial[3] has also been updated for the new-style phan setup. Let me know if you have any questions/comments/etc.

[1] https://phabricator.wikimedia.org/maniphest/query/KxjdNDM65iNM/
[2] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/492527
[3] https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Add_phan_to_a_MediaWiki_extension

-- Legoktm

2. Re: New Security Reviews Process (Pine W)
by Charlotte Portero 21 Feb '19

Pine, Thank you for the suggestion. I am proposing it to the team. Charlotte

Invitation to a "Wikimedia Café" casual online meetup
by Pine W 20 Feb '19

Hi folks,

Based on comments that I received on Wikimedia-l, I would like to invite people to a casual online meetup one hour before the monthly WMF Metrics and Activities Meeting. There will be no set agenda. You can come with questions or ideas that you would like to discuss. Please be willing to listen to questions and ideas from other Wikimedians.

I will host the meeting with the Zoom software. You can join with software or by using your phone. If you join by phone then your phone number will be visible to other participants.

The primary language of the meeting will be English, but if people would like to communicate in other languages then that is okay too. We can facilitate translation by text chat. Many Wikimedians, myself included, are multilingual to varying degrees, so we might try to have live interpretation also.

Here is information about how to connect:

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/136978210

Or iPhone one-tap:
Argentina: +543415122188,,136978210#

Or Telephone: Dial (for higher quality, dial a number based on your current location):
Argentina: +54 341 512 2188
Australia: +61 (0) 2 8015 2088 or +61 (0) 8 7150 1149
Canada: +1 647 558 0588
Hong Kong, China: +852 5808 6088
France: +33 (0) 1 8288 0188 or +33 (0) 7 5678 4048
Germany: +49 (0) 30 3080 6188 or +49 (0) 30 5679 5800
Israel: +972 (0) 3 978 6688
Italy: +39 069 480 6488
Japan: +81 (0) 3 4578 1488 or +81 524 564 439
Mexico: +52 229 910 0061 or +52 554 161 4288
Spain: +34 84 368 5025 or +34 91 198 0188
Sweden: +46 (0) 7 6692 0434 or +46 (0) 8 4468 2488
Russia: +7 495 283 9788
United Kingdom: +44 (0) 20 3051 2874 or +44 (0) 20 3695 0088
US: +1 408 638 0986 or +1 646 558 8665

Meeting ID: 136 978 210
International numbers available: https://zoom.us/u/ekaPibJIy

The first "Wikimedia Café" meetup will be on 30 August 2018, at 17:00 UTC / 10:00 Pacific.

Let me emphasize that the environment won't be like this <https://en.wikipedia.org/wiki/File:West_Hartford,_Connecticut_health_care_r…>, so please don't feel intimidated if you are nervous about public speaking. (If a conversation feels to me like it is becoming uncivil or intimidating, then I will ask the debaters to quiet themselves or to move somewhere else.) The meeting will generally have an environment that is more like this <https://en.wikipedia.org/wiki/File:Caf%C3%A9_M%C3%A9lange,_Wien.jpg> or this <https://en.wikipedia.org/wiki/File:Takamatsu-Castle-Building-Interior-M3488…>.

I anticipate that few people will come, which is okay. I hope that if you come then you will enjoy the environment and conversation.

Until next time,
Pine ( https://meta.wikimedia.org/wiki/User:Pine )

Discovery Weekly Update for the week starting 2019-02-11
by Chris Koerner 20 Feb '19

Hello again,

This is the weekly update from the Search Platform team for the week starting 2019-02-11. As always, feedback and questions are welcome.

== Discussions ==

=== Search ===

* Stas and Trey worked on creating a textcat package to deploy [0]
* Mathew and Gehel collaborated on creating an Icinga check for failed shard allocation [1]
* Search, SRE, and WMCS created a cloudelastic-root group that refines certain access to the search clusters [2]
* Erik ran a Wikidata entity autocomplete A/B test on the de, fr, and es wikis. The results were good, and the new wbsearchentities profiles have been deployed [3]
* Erik worked on creating the metastore when it is missing from indexNamespaces.php (installs were failing while running updateSearchIndexConfig.php) [4]
* David reworked how the source_regex timeout is done in Cirrus (to keep a source_regex query from consuming all the cluster resources) [5]
* David also confirmed that ApiFeatureUsage still works with Elasticsearch 6.5.4 [6]
* Erik noted that, as production search indices are now split across three clusters per datacenter, mwgrep needs to be able to query multiple Elasticsearch clusters [7]
* Erik ensured that the mjolnir daemons will work seamlessly with Elasticsearch 5 or 6 [8]
* Trey and David ensured that the Elastic language analysis components, our internal components, and third-party components are all working as expected in Elasticsearch 6 [9]

[0] https://phabricator.wikimedia.org/T213936
[1] https://phabricator.wikimedia.org/T212850
[2] https://phabricator.wikimedia.org/T214922
[3] https://phabricator.wikimedia.org/T214515
[4] https://phabricator.wikimedia.org/T215369
[5] https://phabricator.wikimedia.org/T198734
[6] https://phabricator.wikimedia.org/T215621
[7] https://phabricator.wikimedia.org/T215199
[8] https://phabricator.wikimedia.org/T215475
[9] https://phabricator.wikimedia.org/T194849

----

Subscribe to receive on-wiki (or opt-in email) notifications of the Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly

The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates

Interested in getting involved? See tasks marked as "Easy" or "Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R

Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation

New Security Reviews Process
by Charlotte Portero 20 Feb '19

I am pleased to announce that the Security Team has a new Security Reviews process. "Wikimedia Security Team/Standard Operating Procedure/Security Readiness Reviews" [0] replaces the former "Wikimedia Security Team - Security Reviews" [1].

Please note: one of the new requirements is that review requests be submitted *30 days* before the deployment date.

If you have any questions after reviewing the Standard Operating Procedure (SOP), please contact the Security Team at security-team(a)wikimedia.org.

Thank you,
Charlotte Portero
Project Manager, Security

[0] https://www.mediawiki.org/wiki/Wikimedia_Security_Team/Standard_Operating_P… Security_Readiness_Reviews
[1] https://www.mediawiki.org/w/index.php?title=Wikimedia_Security_Team/Securit…

[Train] 1.33.0-wmf.18 status update
by Tyler Cipriani 20 Feb '19

Hello all!

I have not yet started the 1.33.0-wmf.18 train; however, at the end of last week, I noticed some errors that (AFAICT) are regressions in 1.33.0-wmf.17.

There were two new errors that started showing up in 1.33.0-wmf.17:

1. ErrorException from includes/HeaderCallback.php: PHP Notice: Undefined offset: 1[0]
2. includes/specials/pagers/ActiveUsersPager.php: PHP Notice: Undefined index: dir[1]

Neither of these errors was happening at a high enough rate, or with enough user impact, to trigger a rollback of 1.33.0-wmf.17; however, I added them as blockers for wmf.18 in the hope that we could address the regressions introduced in wmf.17 before rolling out wmf.18.

If folks could take a look at these tasks and help me resolve these regressions before we start the rollout of a new version, that'd be great!

Thanks in advance for your help and attention!

-- Tyler

[0]. <https://phabricator.wikimedia.org/T216086>
[1]. <https://phabricator.wikimedia.org/T216200>
