Hello,
We often have the case of a change to an extension depending on a
pending patch to MediaWiki core. I upgraded our CI scheduler - Zuul - a
couple of weeks ago, and it now supports marking dependencies even
across different repositories.
Why does it matter? To make sure the dependency is fulfilled, one
usually either:
* CR-2s the patch until the change it depends on is merged, or
* writes a test that exercises the required patch in MediaWiki.
With the first solution (no test), once both are merged, nothing
prevents one from cherry-picking a patch without its dependency - for
example for MediaWiki minor releases or Wikimedia deployment branches.
When a test covers the dependency, it fails until the required patch is
merged, which is rather annoying.
Zuul now recognizes the 'Depends-On' header in git commit messages,
similar to 'Change-Id' and 'Bug'. 'Depends-On' takes a change-id as its
parameter, and multiple such headers can be added.
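For example, a commit message footer with two dependencies could look
like this (the change-ids below are made up):

    Bug: T12345
    Change-Id: I6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b
    Depends-On: I0f1e2d3c4b5a69788f9a0b1c2d3e4f5a6b7c8d9e
    Depends-On: Iabcdef0123456789abcdef0123456789abcdef01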
When a patch is proposed in Gerrit, Zuul looks for Gerrit changes
matching the 'Depends-On' headers and verifies whether any are still
open. If so, it crafts git references for the open patches so that all
the dependencies can be tested as if they had been merged.
Real world example
------------------
The ContentTranslation extension is tested together with the Wikidata
one and was not passing the tests. Wikidata created a patch, and we did
not want to merge it until we had confirmed that the ContentTranslation
tests passed properly.
The Wikidata patch is https://gerrit.wikimedia.org/r/#/c/252227/
Change-Id: I0312c23628d706deb507b5534b868480945b6163
On ContentTranslation we indicated the dependency:
https://gerrit.wikimedia.org/r/#/c/252172/1..2//COMMIT_MSG
+ Depends-On: I0312c23628d706deb507b5534b868480945b6163
which is the change-id of the Wikidata patch.
Zuul:
* received the patch for ContentTranslation
* looked up the change-id and found the Wikidata patch
* created git references in both repos pointing to the proper patches
Jenkins:
* had zuul-cloner clone both repos and fetch the references created by
the Zuul service
* ran the tests
* SUCCESS
That confirmed to us that the Wikidata patch actually fixed the issue
for ContentTranslation. Hence we CR+2'd both and everything merged fine.
Please take a moment to read the upstream documentation:
http://docs.openstack.org/infra/zuul/gating.html#cross-repository-dependenc…
Wikidata/ContentTranslation task:
https://phabricator.wikimedia.org/T118263
--
Antoine "hashar" Musso
Dear Wikipedia data/API user,
The WMF’s Engineering, Product and Partnerships teams are conducting a
short survey to help us understand how organizations are pulling and using
data from our projects. This information will inform future features and
improvements to our data tools and APIs.
We would appreciate a few minutes of your time. The survey link below
will take you to a Google Form - there is no need to sign up to fill
out the survey, and it should take no more than 10 minutes.
https://docs.google.com/forms/d/1yUrHzyLABN419RCDbzepjoRWCbaWYV4wbtbKPa95C4…
Thank you for your input and feedback!
Warm wishes,
Sylvia
PS: Apologies for the cross-posting; you might see this note on a couple
of other lists.
--
Sylvia Ventura | Strategic Partnerships | Wikimedia Foundation | +1 (415)
839 6885 x6788
Hi all!
If you are interested in code quality and you are attending the Dev Summit,
please have a look at <https://phabricator.wikimedia.org/T119032>.
I would appreciate any input on which additional sessions you think are
important, or how you would prioritize and group them. Any session proposed at
<https://phabricator.wikimedia.org/tag/wikimedia-developer-summit-2016/> is fair
game, but it should fit into the "code quality" topic somehow (other proposals
will be discussed elsewhere).
Context:
The ArchCom is currently in the process of sorting through the list of
proposed sessions and trying to prioritize and group them. To this end, we have
identified 5 broad topic areas ("Content Format", "Access and APIs",
"Collaboration", "Software Engineering", and "User Interface" - see T119018 for
an overview).
My job is now to figure out which sessions we want in the "Software Engineering"
(aka "code quality") part of the event. I have started to do this at
<https://phabricator.wikimedia.org/T119032>. If you have any thoughts on how
these sessions should be prioritized or grouped, or what is missing, please comment.
Thanks,
Daniel
Hi,
I am Yeongjin Jang, a Ph.D. Student at Georgia Tech.
In our lab (SSLab, https://sslab.gtisc.gatech.edu/),
we are working on a project called B2BWiki,
which enables users to share the contents of Wikipedia through WebRTC
(peer-to-peer sharing).
The website is here: http://b2bwiki.cc.gatech.edu/
The project aims to help Wikipedia by donating computing resources
from the community: users can donate their traffic (by P2P communication)
and storage (IndexedDB) to reduce the load on Wikipedia's servers.
Larger organizations, e.g. schools or companies that have many local
users, can donate a mirror server, similar to the GNU FTP mirrors,
which can bootstrap peer sharing.
The potential benefits we see are the following:
1) Users can easily donate their resources to the community -
just visit the website.
2) Users can get a performance benefit if a page is loaded from
multiple local peers / a local mirror (page load time gets faster!).
3) Wikipedia can reduce its server workload, network traffic, etc.
4) Local network operators can reduce transit traffic
(e.g. the cost caused by delivering the traffic to the outside).
While we are working on enhancing the implementation,
we would like to ask for the opinions of actual Wikipedia developers.
For example, we want to know whether our direction is correct
(will it actually reduce the load?), or whether there are other concerns
we missed that could prevent this system from working as intended.
We really want to do meaningful work that actually helps run Wikipedia!
Please feel free to give us any suggestions, comments, etc.
If you want to express your opinion privately,
please contact sslab(a)cc.gatech.edu.
Thanks,
--- Appendix ---
I have added some detailed information about B2BWiki below.
# Accessing data
When accessing a page on B2BWiki, the browser queries peers first; the
full fallback order is sketched in the code after this list.
1) If there are peers that hold the content, a peer-to-peer download happens.
2) Otherwise, if there is no peer, the client downloads the content
from the mirror server.
3) If the mirror server does not have the content either, it downloads it
from the Wikipedia server (one access for the first download, plus updates).
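A minimal sketch of that fallback order (Python pseudocode; the three
fetch callbacks are hypothetical stand-ins for the real WebRTC and HTTP
transfers):

    from typing import Callable, Iterable, Optional

    def fetch_page(
        title: str,
        peer_fetchers: Iterable[Callable[[str], Optional[bytes]]],
        mirror_fetch: Callable[[str], Optional[bytes]],
        wikipedia_fetch: Callable[[str], bytes],
    ) -> bytes:
        # 1) Try the peers that the lookup server reports for this page.
        for fetch_from_peer in peer_fetchers:
            content = fetch_from_peer(title)
            if content is not None:
                return content
        # 2) No peer had it: fall back to the mirror server.
        content = mirror_fetch(title)
        if content is not None:
            return content
        # 3) Mirror miss: fetch from Wikipedia (the mirror then caches it).
        return wikipedia_fetch(title)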
# Peer lookup
To enable content lookup, we manage a lookup server that holds a
page_name-to-peer map. A client (a user's browser) can query the list
of peers that currently hold the content and select a peer by freshness
(the map stores a hash/timestamp of the content and the top two octets
of each peer's IP address, to figure out whether a peer is local), etc.
# Update, and integrity check
The mirror server updates its content once per day
(this can be configured to once per hour, etc.).
The update check is done using the If-Modified-Since header against the
Wikipedia server. On retrieving content from Wikipedia, the mirror server
stamps a timestamp and a SHA-1 checksum, to ensure the freshness of the
data and its integrity.
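A sketch of that conditional update check, assuming Python's requests
library (URL handling and scheduling are left out):

    from email.utils import formatdate
    from typing import Optional
    import requests

    def refresh(url: str, last_fetch_unixtime: float) -> Optional[bytes]:
        # Conditional GET: the server replies 304 Not Modified if unchanged.
        headers = {"If-Modified-Since": formatdate(last_fetch_unixtime, usegmt=True)}
        resp = requests.get(url, headers=headers)
        if resp.status_code == 304:
            return None              # cached copy is still current
        resp.raise_for_status()
        return resp.content          # new content to stamp and store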
When a client looks up and downloads content from peers,
it compares the SHA-1 checksum of the data
with the checksum from the lookup server.
In this setting, users may get older data
(they can configure how much staleness to tolerate,
e.g. one day, three days, one week, etc.), and
integrity is guaranteed by the mirror/lookup server.
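The client-side check could look roughly like this (a sketch; the
one-day window is just an example of the configurable tolerance):

    import hashlib
    import time

    MAX_AGE_SECONDS = 24 * 3600   # example: tolerate content up to one day old

    def verify(data: bytes, expected_sha1: str, stamped_unixtime: float) -> bool:
        # Integrity: peer-supplied data must match the lookup server's checksum.
        if hashlib.sha1(data).hexdigest() != expected_sha1:
            return False
        # Freshness: reject content older than the configured tolerance.
        return time.time() - stamped_unixtime <= MAX_AGE_SECONDS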
More detailed information can be obtained from the following website.
http://goo.gl/pSNrjR
(URL redirects to SSLab@gatech website)
Please feel free to give us any suggestions, comments, etc.
Thanks,
--
Yeongjin Jang
Hi, here is a simple game that we ask you to play by the end of next
Monday, December 7.
Please help us define the must-have sessions at the Wikimedia Developer
Summit.
Go to https://phabricator.wikimedia.org/tag/wikimedia-developer-summit-2016/
and select your must-have sessions, especially in the areas in which you
are directly involved (scroll to the right to see all columns!!).
Then, go to https://phabricator.wikimedia.org/T119593 and post your
recommendations in a comment. Just like with code review and +1, please
don't recommend your own sessions.
The Architecture Committee, WMF Engineering management, and the Summit
organizers will send their recommendations as +2.
We have 21 slots of 80 minutes for pre-scheduled sessions. If the total of
must-have candidates is 21 or fewer, then great. If it is more, we will have
to make hard calls. Or not so hard, because the rest of the sessions will
still have time and space, only in unconference mode.
https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016#Program
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
Hi,
I created this ticket: https://phabricator.wikimedia.org/T119878
The basic idea is that it shouldn't be a big problem to compress the
output of the api.php script using some widely available library, like
gzip.
That way the amount of data exchanged between client and server would be
much smaller, and users with slow internet connections might benefit. I
am not sure how much the data would be reduced, but the savings could be
significant in some cases.
Note that I am not proposing a breaking change, rather just an optional
"compression" parameter that could be passed with API requests.
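For illustration, HTTP already provides a generic negotiation mechanism
for response compression; a minimal client-side sketch in Python (using
the requests library; the endpoint URL is just an example):

    import requests

    # Ask the server to gzip the response body via standard HTTP content
    # negotiation. requests decompresses the body transparently.
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "meta": "siteinfo", "format": "json"},
        headers={"Accept-Encoding": "gzip"},
    )
    print(resp.headers.get("Content-Encoding"))  # "gzip" if the server compressed
    print(len(resp.content))                     # size after decompression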
Hi all,
here is the weekly look at our most important readership metrics (apologies
for the delay). Apart from the usual data, this time there is an additional
chart to illuminate how our mobile readership ratio has developed since
this spring, the iOS app retention stats are back after Apple fixed their
data, and we conclude with some inspiring quotes about climate change
awareness ;)
As laid out earlier
<https://lists.wikimedia.org/pipermail/mobile-l/2015-September/009773.html>,
the main purpose of this report is to raise awareness of how these
metrics are developing, call out the impact of any unusual events in the
preceding week, and facilitate thinking about core metrics in general. We
are still iterating on the presentation and eventually want to create
dashboards for those metrics which are not already available in that
form. Feedback and discussion welcome.
Now to the usual data. (All numbers below are averages for November 16-22,
2015 unless otherwise noted.)
Pageviews
Total: 540 million/day (-0.0% from the previous week)
Context (April 2015-November 2015):
( see also the Vital Signs dashboard
<https://vital-signs.wmflabs.org/#projects=all/metrics=Pageviews>)
The Analytics team improved web crawler detection further last week
<https://meta.wikimedia.org/w/index.php?title=Dashiki%3APageviewsAnnotations…>,
meaning an “optical” (as opposed to real) drop in human pageviews from
November 19 on - though presumably smaller than the one for September
that we reported in the preceding report.
Desktop: 57.2% (previous week: 57.5%)
Mobile web: 41.6% (previous week: 41.3%)
Apps: 1.2% (previous week: 1.2%)
Context (April 2015-November 2015):
These percentages usually don’t change rapidly from week to week. For a
wider perspective, I’m including a chart of the (aggregate) mobile
percentage this time, too. Technically this information is already
contained in the usual chart above. But here we can see even clearer
indications for an impact of the HTTPS-only switchover during June (it
appears to have taken out desktop traffic mainly), as well as the strong
weekly periodicity (higher mobile ratio on weekends). It looks like mobile
won’t overtake desktop anytime soon.
Global North ratio: 77.3% of total pageviews (previous week: 77.6%)
Context (April 2015-November 2015):
New app installations
Android: 30.9k/day (-44.2% from the previous week)
Daily installs per device, from Google Play
Context (last month):
As described in the previous report, the Android Wikipedia app was featured
in the "New
and Updated Apps" section of the Google Play store from November 5-12, and
while the huge overall positive impact on download numbers is obvious,
downloads also decreased markedly afterwards. They seem to be coming back
up a bit now, but we are still waiting for some more data before making a
final estimate of the overall effect, and we have also contacted Google to
see if they can help us illuminate the mechanism behind this apparent effect.
iOS: 4.69k/day (+2.2% from the previous week)
Download numbers from App Annie
Context (last three months):
No news here.
App user retention
Android: 14.8% (previous week: 15.2%)
(Ratio of app installs opened again 7 days after installation, among all
installed during the previous week. 1:100 sample)
Context (last three months):
iOS: 12.0% (previous week: 11.9%)
(Ratio of app installs opened again 7 days after installation, among all
installed during the previous week. From iTunes Connect, opt-in only = ca.
20-30% of all users)
Context (installation dates from October 18-November 15, 2015):
This metric was left out of last week’s report because of inconsistencies.
Indeed, Apple has since issued a correction notice
<http://www.talkingnewmedia.com/2015/11/24/apple-issues-corrected-itunes-con…>.
Unfortunately it looks like the data underlying the report for the week
until November 8 was affected too, so please disregard the iOS retention
figure given in that report.
Unique app users
Android: 1.190 million / day (-2.2% from the previous week)
Context (last three months):
This too will need another look.
iOS: 281k / day (+0.1% from the previous week)
Context (last three months):
No news here.
After publishing this report regularly for a bit over two months, we may
rethink the weekly publication schedule a little - also to keep the
balance between newsworthiness and maintaining general awareness of
long-term developments. In that vein, some inspiring quotes about a weekly
climate change newsletter
<http://www.niemanlab.org/2015/11/climate-change-is-depressing-and-horrible-…>
that begins every issue by reciting the current CO2 concentration in the
atmosphere as a KPI ;)
Ultimately, Meyer said, the newsletter comes out of the idea that “if
you’re worried about something, you should pay regular attention to it.”
“By paying attention to it over time, and watching its texture change over
time, you will come to have ideas about it,” he said. “You will come to
understand it in a new way, and you will contribute in a very small way to
how society addresses this big problem.”
[...]
So it seemed as if a newsletter might be a good way to cover the issue.
[...] “You can get a continuity of storyline,” Meyer said. “You can’t cover
all of everything that’s happening every week in the climate, but you can
watch certain parts develop, and hopefully bring people in over time.” He
leads off the “Macro Trends” section of each issue with the molecules per
million of carbon dioxide in the atmosphere:
The atmosphere is filling with greenhouse gases. The Mauna Loa Observatory
measured an average of 398.51 CO2 molecules per million in the atmosphere
this week. A year ago, it measured 395.84 ppm. Ten years ago, it measured
376.93 ppm.
“What we’re doing now won’t show up in that number for a decade or so,” he
said. “But by reminding myself of it every week, and thinking about its
contours and its direction, that’s a way to stay focused on what matters.”
----
For reference, the queries and source links used are listed below (access
is needed for each). Most of the above charts are available on Commons, too
<https://commons.wikimedia.org/w/index.php?title=Special:ListFiles&offset=20…>
.
hive (wmf)> SELECT SUM(view_count)/7000000 AS avg_daily_views_millions FROM
wmf.projectview_hourly WHERE agent_type = 'user' AND
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")) BETWEEN "2015-11-16"
AND "2015-11-22";
hive (wmf)> SELECT year, month, day,
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")) as date,
sum(IF(access_method <> 'desktop', view_count, null)) AS mobileviews,
SUM(view_count) AS allviews FROM wmf.projectview_hourly WHERE year=2015 AND
agent_type = 'user' GROUP BY year, month, day ORDER BY year, month, day
LIMIT 1000;
hive (wmf)> SELECT access_method, SUM(view_count)/7 FROM
wmf.projectview_hourly WHERE agent_type = 'user' AND
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")) BETWEEN "2015-11-16"
AND "2015-11-22" GROUP BY access_method;
hive (wmf)> SELECT SUM(IF (FIND_IN_SET(country_code,
'AD,AL,AT,AX,BA,BE,BG,CH,CY,CZ,DE,DK,EE,ES,FI,FO,FR,FX,GB,GG,GI,GL,GR,HR,HU,IE,IL,IM,IS,IT,JE,LI,LU,LV,MC,MD,ME,MK,MT,NL,NO,PL,PT,RO,RS,RU,SE,SI,SJ,SK,SM,TR,VA,AU,CA,HK,MO,NZ,JP,SG,KR,TW,US')
> 0, view_count, 0))/SUM(view_count) FROM wmf.projectview_hourly WHERE
agent_type = 'user' AND
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")) BETWEEN "2015-11-16"
AND "2015-11-22";
hive (wmf)> SELECT year, month, day,
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")), SUM(view_count) AS
all, SUM(IF (FIND_IN_SET(country_code,
'AD,AL,AT,AX,BA,BE,BG,CH,CY,CZ,DE,DK,EE,ES,FI,FO,FR,FX,GB,GG,GI,GL,GR,HR,HU,IE,IL,IM,IS,IT,JE,LI,LU,LV,MC,MD,ME,MK,MT,NL,NO,PL,PT,RO,RS,RU,SE,SI,SJ,SK,SM,TR,VA,AU,CA,HK,MO,NZ,JP,SG,KR,TW,US')
> 0, view_count, 0)) AS Global_North_views FROM wmf.projectview_hourly
WHERE year = 2015 AND agent_type='user' GROUP BY year, month, day ORDER BY
year, month, day LIMIT 1000;
https://console.developers.google.com/storage/browser/pubsite_prod_rev_0281…
(“overview”)
https://www.appannie.com/dashboard/252257/item/324715238/downloads/?breakdo…
(select “Total”)
SELECT LEFT(timestamp, 8) AS date, SUM(IF(event_appInstallAgeDays = 0, 1,
0)) AS day0_active, SUM(IF(event_appInstallAgeDays = 7, 1, 0)) AS
day7_active FROM log.MobileWikiAppDailyStats_12637385 WHERE timestamp LIKE
'201511%' AND userAgent LIKE '%-r-%' AND userAgent NOT LIKE '%Googlebot%'
GROUP BY date ORDER BY DATE;
(with the retention rate calculated as day7_active divided by day0_active
from seven days earlier, of course)
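In other words (a tiny Python sketch with made-up numbers, not real data):

    from datetime import date, timedelta

    # day0_active and day7_active keyed by date, as produced by the query above
    day0_active = {date(2015, 11, 9): 21000}          # made-up numbers
    day7_active = {date(2015, 11, 16): 3100}          # made-up numbers

    def retention(d: date) -> float:
        # Installs opened on day 7, relative to installs from seven days earlier.
        return day7_active[d] / day0_active[d - timedelta(days=7)]

    print(f"{retention(date(2015, 11, 16)):.1%}")     # prints "14.8%"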
https://analytics.itunes.apple.com/#/retention?app=324715238
hive (wmf)> SELECT SUM(IF(platform = 'Android',unique_count,0))/7 AS
avg_Android_DAU_last_week, SUM(IF(platform = 'iOS',unique_count,0))/7 AS
avg_iOS_DAU_last_week FROM wmf.mobile_apps_uniques_daily WHERE
CONCAT(year,LPAD(month,2,"0"),LPAD(day,2,"0")) BETWEEN 20151116 AND
20151122;
hive (wmf)> SELECT CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0"))
as date, unique_count AS Android_DAU FROM wmf.mobile_apps_uniques_daily
WHERE platform = 'Android';
hive (wmf)> SELECT CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0"))
as date, unique_count AS iOS_DAU FROM wmf.mobile_apps_uniques_daily WHERE
platform = 'iOS';
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
Hi folks,
This week's RFC review meeting is scheduled for Wednesday, November 25
at 2pm PST (22:00 UTC). Event particulars can be found at
<https://phabricator.wikimedia.org/E92>
The main task this week is to decide what we will define as the minimum
PHP version for MediaWiki 1.27 (the next LTS version). The
viable choices seem to be:
* PHP 5.3 (the status quo) - this version is no longer supported
upstream, and doesn't have widespread support even in conservatively
updated Linux distros.
* PHP 5.4 - this version is no longer supported by The PHP Group, but
is still part of older supported Linux distros (e.g. Debian Wheezy)
* PHP 5.5 - this is the lowest version with reliable LTS support in
major Linux distros
The RFC additionally stipulates some coding standards, since even
though upgrading our version of PHP would make the use of some new
features possible, that doesn't automatically make their use a good
idea. The author broke the feature set up into "encouraged", "tolerated"
and "verboten" categories. Please read the RFC directly for more info on this:
<https://phabricator.wikimedia.org/T118932>
Please comment on T118932 if you have further thoughts to share and/or
please attend the meeting on Wednesday.
Thanks
Rob
https://phabricator.wikimedia.org/T119779
The Graph extension generates different HTML output depending on the
isPreview parser option, but if a user previews a page and then saves it
right away without any changes, the parser reuses the previous output. Is
there a way to force the parser to regenerate on save? Thanks!