New study (US only) by the Knight Foundation:
https://medium.com/mobile-first-news-how-people-use-smartphones-to ,
summarized here:
http://www.theatlantic.com/technology/archive/2016/05/people-love-wikipedia…
"People spent more time on Wikipedia’s mobile site than any other news
or information site in Knight’s analysis, about 13 minutes per month
for the average visitor. CNN wasn’t too far behind, at 9 minutes 45
seconds per month. BuzzFeed clocked in third at 9 minutes 21 seconds
per month. (BuzzFeed, however, slays both CNN and Wikipedia in time
spent with the sites’ apps, compared with mobile websites. BuzzFeed
users devote more than 2 hours per month to its apps, compared with
about 46 minutes among CNN app users and 31 minutes among Wikipedia
app loyalists.)
Another way to look at Wikipedia’s influence: Wikipedia reaches almost
one-third of the total mobile population each month, according to
Knight’s analysis, which used data from the audience-tracking firm
Nielsen."
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
Over the past few months the TCB team at WMDE has been working on
refactoring code in core surrounding watchlists.
You can find a full blog post about what we did, why we did it and how we
did it at the link below:
http://addshore.com/2016/05/refactoring-around-watcheditem-in-mediawiki
tl;dr This work makes it easier to introduce expiring watchlist entries.
Code was removed from various API modules, special pages and other
random places. The newly extracted code now has close to 100% unit &
integration test coverage.
--
Addshore
Hi, I had two changes deployed today that change how UploadWizard and
the upload dialog are configured on testwiki and test2wiki. I don't
think this affects anybody but us at Multimedia, but just in case:
* test2wiki no longer has UploadWizard [1] enabled. The configuration
has always been somewhat broken and hardly anyone used it. It's
still enabled on testwiki. [2]
* test2wiki can now upload files cross-wiki to testwiki using the
upload dialog [3]. Upload dialog uploads from testwiki now also
go to testwiki (locally). Previously they both uploaded to Commons,
like production wikis. [4]
You can think of these as testwiki acting more like Commons and
test2wiki acting more like a Wikipedia site. Have fun testing!
[1] https://www.mediawiki.org/wiki/UploadWizard
[2] https://gerrit.wikimedia.org/r/287944
[3] https://www.mediawiki.org/wiki/Upload_dialog
[4] https://gerrit.wikimedia.org/r/285708
--
Bartosz Dziewoński
2016-04-12 14:01 GMT+03:00 Adrian Heine <adrian.heine(a)wikimedia.de>:
> Hi everyone,
>
> as some of you might know, I'm a software developer at Wikimedia
> Deutschland, working on Wikidata. I'm currently focusing on improving
> Wikidata's support for languages we as a team are not using on a daily
> basis. As part of my work I stumbled over a shortcoming in MediaWiki's
> message system that – as far as I see it – prevents me from doing the right
> thing(tm). I'm asking you to verify that the issue I see indeed is an issue
> and that we want to fix it. Subsequently, I'm interested in hearing your
> plans or goals for MediaWiki's message system so that I can align my
> implementation with them. Finally, I am hoping to find someone who is
> willing to help me fix it.
First of all, thanks for working on this issue. It is a real issue,
but not often requested. I think that is because manually checking in
every place whether the language code is unexpected (different from
the one in the current context) would be cumbersome, and always
outputting language codes on every tag would bloat the markup. It
would be best if this checking were automated in a templating library,
but so far templating hasn't seen much adoption in MediaWiki core. But
of course this information needs to be exposed first, which is what I
understand you are doing.
> == The issue ==
>
> On Wikidata, we regularly have content in different languages on the same
> page. We use the HTML lang and dir attributes accordingly. For example, we
> have a table with terms for an entity in different languages. For missing
> terms, we would display a message in the UI language within this table. The
> corresponding HTML (simplified) might look like this:
>
> <div id="mw-content-text" lang="UILANG" dir="UILANG_DIR">
> <table class="entity-terms">
> <tr class="entity-terms-for-OTHERLANG1" lang="OTHERLANG1"
> dir="OTHERLANG1_DIR">
> <td class="entity-terms-for-OTHERLANG1-label">
> <div class="wb-empty" lang="UILANG" dir="UILANG_DIR">
> <!-- missing label message -->
> </div>
> </td>
> </tr>
> </table>
> </div>
>
> This works great as long as the missing label message is available in the UI
> language. If that is not the case, though, the message is translated
> according to the defined language fallbacks. In that case, we might end up
> with something like this:
>
> <div class="wb-empty" lang="arc" dir="rtl">No label defined</div>
>
> That's obviously wrong, and I'd like to fix it.
>
> == Fixing it ==
>
> For fixing this, I tried to make MessageCache provide the language a message
> was taken from [1]. That's not too straightforward to begin with, but while
> working on it I realized that MessageCache is only responsible for following
> the language fallback chain for database translations. For file-based
> translations, the fallbacks are directly merged in by LocalisationCache, so
> the information is not there anymore at the time of translating a message. I
> see some ways to fix this:
>
> * Don't merge messages in LocalisationCache, but perform the fallback on
> request (possibly caching the result)
> * Tag message strings in LocalisationCache with the language they are in
> (sounds expensive to me)
> * Tag message strings as being a fallback in LocalisationCache (that way we
> could follow the fallback until we find a language in which the message
> string is not tagged as being a fallback)
>
> What do you think?
The current localisation cache implementation quite obviously trades
space for speed. In this light I would suggest option two, to tag the
actual language the string is in.
However, this trade-off might not make sense anymore, as we have more
languages and more messages, resulting in caches approaching a
gigabyte in size.
See also for example https://phabricator.wikimedia.org/T99740. I added
wikitech-l to CC in hopes that people who have worked on localisation
cache more recently would comment on whether option one, to not merge
messages, would make more sense nowadays.
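To illustrate the shape of the API being discussed, here is a minimal,
hypothetical sketch (not MediaWiki's actual implementation) of a message
lookup that follows a fallback chain and reports which language the message
was actually found in, which is exactly the information Wikidata needs to
emit correct lang/dir attributes. All names and data are illustrative:

```javascript
// Hypothetical sketch: resolve a message through a language fallback
// chain and report which language it was actually found in. Fallback
// chains and message stores here are made-up examples.

// Example fallback chains, e.g. Aramaic (arc) falls back to English.
const fallbacks = {
  arc: [ 'en' ],
  de: [ 'en' ]
};

// Per-language message stores (think merged database + file translations).
const messages = {
  en: { 'wikibase-label-empty': 'No label defined' },
  de: {}
};

// Returns { text, lang } so callers can emit correct lang/dir
// attributes, or null if the message is missing in the whole chain.
function resolveMessage( key, lang ) {
  const chain = [ lang ].concat( fallbacks[ lang ] || [] );
  for ( const code of chain ) {
    if ( messages[ code ] && key in messages[ code ] ) {
      return { text: messages[ code ][ key ], lang: code };
    }
  }
  return null;
}
```

With this, resolveMessage( 'wikibase-label-empty', 'arc' ) yields
{ text: 'No label defined', lang: 'en' }, so the wrapping element can be
emitted with lang="en" dir="ltr" instead of wrongly inheriting arc/rtl.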
>
> [1] https://gerrit.wikimedia.org/r/282133
>
-Niklas
TL;DR: The current addModuleStyles() method no longer meets our
requirements. This mismatch causes bugs (e.g. user styles load twice). The
current logic also doesn't support dynamic modules depending on style
modules. I'm looking for feedback on how to best address these issues.
ResourceLoader is designed with two module types: Page style modules and
dynamic modules.
Page style modules are part of the critical rendering path and should work
without JavaScript being enabled or supported. Their URL must be referenced
directly in the HTML to allow browsers to discover them statically for
optimal performance. As such, this URL can't use dynamic information from
the startup module (no dependencies[1], no version hashes).
Dynamic modules have their names listed in the page HTML. Their
dependencies and version hashes are resolved at run-time by JavaScript via
the startup manifest. We then generate the URLs and request them from the
server. There is also a layer of object caching in-between (localStorage)
which often optimises the module request to not involve an HTTP request at
all.
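The run-time resolution step can be sketched roughly like this (module
names, version strings, and the URL/version format below are simplified
assumptions; ResourceLoader's real batch request format differs):

```javascript
// Simplified sketch of how a client-side loader might turn module
// names plus a startup manifest into a single batched load.php-style
// URL. The manifest data and the naive version concatenation are
// illustrative only.
const manifest = {
  'oojs-ui': { version: 'abc123', dependencies: [ 'oojs' ] },
  'oojs': { version: 'def456', dependencies: [] }
};

// Resolve a module and its dependencies into load order (naive DFS).
function resolve( name, seen = new Set() ) {
  if ( seen.has( name ) ) {
    return [];
  }
  seen.add( name );
  const deps = manifest[ name ].dependencies
    .flatMap( ( dep ) => resolve( dep, seen ) );
  return deps.concat( [ name ] );
}

function buildUrl( names ) {
  const modules = names.flatMap( ( n ) => resolve( n ) );
  const version = modules.map( ( m ) => manifest[ m ].version ).join( '' );
  return '/load.php?modules=' + modules.join( '|' ) +
    '&version=' + version;
}
```

Here buildUrl( [ 'oojs-ui' ] ) produces
'/load.php?modules=oojs|oojs-ui&version=def456abc123'. The real loader
combines version hashes rather than concatenating them, but the principle
(dependencies and versions resolved client-side from the manifest) is the
same.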
Below I explain the two issues, followed by a proposed solution.
## Dependency
Historically there was no overlap between these two kinds of modules.
Page style modules mostly came from the skin and apply to
skin-specific server-generated HTML. Dynamic modules (skin agnostic)
would make their own DOM and apply their own styles.
Now that we're reusing styles more, we're starting to see overlap.
In the past we used jQuery UI - its styles never applied to PHP-generated
output. Now, with OOUI, its styles do apply to both server-side and
client-side generated elements. Its styles are preloaded as a page style
module on pages that contain OOUI widgets.
The OOjs UI bundle (and its additional styles) shouldn't need to know
whether the current page already got some of the relevant styles. This is
what dependencies are for. For OOUI specifically, we currently have a
hardcoded workaround that adds a hidden marker to pages where OOUI is
preloaded. The OOUI style module has a skipFunction that forgoes loading if
the marker is present.
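That workaround can be modelled roughly like this (the marker selector is
made up for illustration; OOUI's real marker differs):

```javascript
// Rough model of the skipFunction workaround described above. A page
// that preloads the styles server-side emits a hidden marker element;
// the skip function returns true ("skip loading") when that marker is
// present. The selector name here is hypothetical.
function makeSkipFunction( markerSelector, doc ) {
  return function () {
    return doc.querySelector( markerSelector ) !== null;
  };
}

// Stand-in for a document whose HTML contains the marker element.
const pageWithMarker = {
  querySelector: ( sel ) => ( sel === '.mw-preloaded-styles' ? {} : null )
};

const skip = makeSkipFunction( '.mw-preloaded-styles', pageWithMarker );
// skip() → true: the style module is not loaded a second time.
```

The limitation, of course, is that each such module needs its own bespoke
marker, which is what proper dependency support would make unnecessary.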
## Implicit type
Aside from the need for dependencies between dynamic and page style
modules, there is another bug related to this. We don't currently require
modules to say whether they are a dynamic module or a page style module. In
most cases this doesn't matter since a developer would only load it one way
and the module type is implied. For example, if you only ever pass a module
to addModules() or mw.loader.load(), it is effectively a dynamic
module. If you only ever pass a module to addModuleStyles(), loaded
via a <link rel=stylesheet>, then it is a page style module.
A problem happens if one tries to load the same module both ways. This
might seem odd (and it is), but happens unintentionally right now with wiki
modules (specifically, user/site modules and gadgets).
For user/site modules, we don't know whether common.css relates to the page
content, or whether it relates to dynamic content produced by common.js.
The same for gadgets. A gadget may produce an AJAX interface and register
styles for it, or the gadget may be styles-only and intended as a skin
customisation. Right now the Gadgets extension works around this problem
with a compromise: load it both ways. First it loads all gadgets as
page style modules (ignoring the script portion), and then it loads
the same modules again as dynamic modules, thus loading the styles
twice.
## Proposed solution
In order to allow dependency relationships between a dynamic module and a
page style module, we need to inform mw.loader in JavaScript about which
modules have already been loaded by different means. We do this already
with the user modules (by setting mw.loader.state directly).
This would work, but it means that if you then load the same module
again as a dynamic module, it will assume the module is already
loaded and thus never
deliver the script portion (for the case where the gadget wasn't meant to
be a page style module). Similarly, it would mean that common.js wouldn't
get delivered if common.css exists.
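A toy model of the behaviour described above (the 'ready' state name
mirrors mw.loader, but the registry itself is a simplification of the real
internals):

```javascript
// Toy registry showing the problem: once a module's state is 'ready'
// (because its styles were delivered as a page style module), a later
// dynamic load of the same name is skipped, so its script portion
// never runs.
function createRegistry() {
  const state = {};
  return {
    setState( name, value ) { state[ name ] = value; },
    load( name, script ) {
      if ( state[ name ] === 'ready' ) {
        return false; // assumed already loaded; script never runs
      }
      script();
      state[ name ] = 'ready';
      return true;
    }
  };
}

// The styles of 'user' were loaded via addModuleStyles(), so its
// state is exported as 'ready' ...
const registry = createRegistry();
registry.setState( 'user', 'ready' );

// ... and the later dynamic load of 'user' (the common.js part) is
// silently skipped: scriptRan stays false.
let scriptRan = false;
registry.load( 'user', () => { scriptRan = true; } );
```

Splitting the module into separate style and script registrations, as
proposed below for 'user'/'user.styles', avoids this collision.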
For user/site modules I propose to solve this by splitting them up into
'user', 'user.styles', 'site' and 'site.styles'. Existing dependency
relationships between other modules and 'user' or 'site' will continue to
work. It'd mostly be an internal detail. This allows us to load one as a
page style module and the other dynamically.
For gadgets we can try to infer the intent (styles-only = page style
module, both = dynamic module), with perhaps a way to declare the desired
load method explicitly in Gadgets-definition if the default is wrong.
With that resolved, we can export mw.loader.state information for page
style modules, which then allows dynamic modules to depend on them.
Thoughts?
https://phabricator.wikimedia.org/T87871
https://phabricator.wikimedia.org/T92459
-- Krinkle
[1] If page style modules were allowed to have dependencies, resolved
server-side in the HTML output, this would cause corruption when the
relationship between two modules changes, as existing pages would have
the old relationship cached but still get the latest content from the
server.
Adding versions wouldn't help since the server can't feasibly have access
to previous versions (too many page/skin/language combinations).
Question about obscure historical detail: Who picked UTC as Wikimedia
time? When was this, and what was the thought process?
(the answer is almost certainly "Brion or Jimbo, early 2001, it's the
obvious choice", but I'm just curious as to details.)
- d.
I noticed that lots of extensions were recently updated to use
mediawiki-codesniffer 0.7.1 [1].
This has some implications I think are worth being aware of:
1) Many open patches will need manual rebasing and conflict resolution.
2) Those extensions now require at least PHP 5.4 (instead of PHP 5.3)
due to short array syntax [2].
Both can cause inconvenience: (1) mainly for developers, and (2)
mostly for users.
Some patches doing this update in a more thorough and tested way [3]
existed but were ignored. It would be nice to avoid that in the
future.
Please also watch out for code that might have got removed
unexpectedly due to a bug [4]. On quick inspection I did not see this
happening in those patches.
[1] https://gerrit.wikimedia.org/r/#/q/status:merged+topic:bump-dev-deps,n,z
[2] https://secure.php.net/manual/en/migration54.new-features.php
[3] For example https://gerrit.wikimedia.org/r/#/c/287315/
[4] https://phabricator.wikimedia.org/T134857
-Niklas
PhpStorm, IntelliJ IDEA, ReSharper and other JetBrains users: we just
received free upgraded licenses for all of their products.
Check your account at https://account.jetbrains.com/licenses and login with
that account inside your application to automatically use that license. If
you don't see the license, contact me or Sam Reed, and we will add you
right away. On IRC: yurik or reedy, or via email. We could also use a few
more admins for these licenses. You will no longer need to copy/paste any
licenses manually.
IDEA: use for PHP, JavaScript, Puppet, Ruby, and other web-related
technologies
https://www.jetbrains.com/idea/
ReSharper: for C# and Microsoft Visual Studio C++
https://www.jetbrains.com/resharper/
CLion: for C++
https://www.jetbrains.com/clion/
The Discovery department's [1] work to improve search continues with a new
tool! We are asking volunteers to help choose, or discern, which search
results are the most relevant.
One way of improving search results relevance is to provide search results
from multiple search engines side-by-side for comparison. Participants pick
the best, most relevant results, which are then used to tune our own search
results. It's one way to help improve search with human assistance, by
showing articles that are most relevant to search queries. This system is
used by many R&D departments and gives great results.
Discernatron is a tool developed by the Discovery department for just this
sort of work. Visitors are asked to pick the most relevant results across
four different search result sets. The data is then used to help improve
our relevancy model for search. Screenshot at [2]
If you are interested in helping, you can access Discernatron at
https://discernatron.wmflabs.org/ and authenticate with your unified user
account.
To learn more about the tool, visit MediaWiki.org. [3]
[1] https://www.mediawiki.org/wiki/Discovery
[2] https://meta.wikimedia.org/wiki/File:Discernatron_screenshot.png
[3] https://www.mediawiki.org/wiki/Discernatron
--
Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation