Hi,

The report covering Wikimedia engineering activities in January 2014 is now available.

Wiki version: https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2014/January
Blog version: https://blog.wikimedia.org/2014/02/13/engineering-report-january-2014/

We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge:
https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2014/January/summary

Below is the HTML text of the report.

As always, feedback is appreciated on the usefulness of the report and its summary, and on how to improve them.

------------------------------------------------------------------

Major news in January include:

Engineering metrics in January:

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

Technical Operations

Datacenter RFP

The Wikimedia Operations team is in the final stages of the selection process. A short list of 4 bids has been created and final negotiations are underway. The winner of the bid will be selected in February based on the technical criteria listed in the RFP, and pricing.

Labs metrics in January:

  • Number of projects: 131
  • Number of instances: 441
  • Amount of RAM in use (in MBs): 1,734,144
  • Amount of allocated storage (in GBs): 23,505
  • Number of virtual CPUs in use: 867
  • Number of users: 2,595

Wikimedia Labs

The Labs Migration team, consisting of Andrew Bogott and Marc-Andre Pelletier, has made good progress with testing the newest version of OpenStack (called Havana) and with Neutron, an OpenStack project to provide “networking as a service”. The plan is to upgrade the OpenStack software when we migrate the Labs infrastructure out of the Tampa data center.

Features Engineering

Editor retention: Editing tools

VisualEditor

In January, the VisualEditor team continued their work on improving the stability and performance of the system, and added some new features. Most of the team’s focus was on major new features and fixing bugs. You can now edit some page settings like whether to display a table of contents or whether to show section edit labels, set the size of a media file manually, see a keyboard shortcuts help screen, and create and edit media galleries using a very basic stand-in editor whilst the final form is being designed. Work also continued on a dialog for quickly adding “citation” references based on templates, more media and page settings, setting content language and right-to-left flags, and equation editing. The deployed version of the code was updated four times (1.23-wmf9, 1.23-wmf10, 1.23-wmf11 and 1.23-wmf12).

Parsoid

In January, the Parsoid team did a lot of bug fixing around images, links, references and various other areas. See the deployment page for a summary.

Part of the team has been mentoring two Outreach Program for Women (OPW) interns. Others are mentoring a group of students in a Facebook Open Academy project to build a Cassandra storage back-end for the Parsoid round-trip test server.

We also participated in the architecture summit, where our RFCs about embracing a service architecture, PHP bindings for services, a general-purpose storage service based on our Rashomon revision store, and a public content API based on this were well received.

Following up on this, we started Debian packaging for Parsoid, which will soon make the installation of Parsoid as easy as apt-get install parsoid.

Core Features

Flow

This month, the Core Features team worked on integrating MediaWiki tools for dealing with spam and vandalism (AbuseFilter and Spam Blacklist) into Flow. We also launched an updated visual design and UI, based on the first round of experienced user feedback last month, as well as ongoing user testing with new users. Lastly, we created a script to disable Flow and return Flow discussions back into unstructured wikitext, so that we can begin trialing Flow in production in an extremely safe-to-fail manner. We are set to deploy our first trial on February 3, 2014 to two WikiProjects that volunteered on the English Wikipedia.

Growth

Growth

In the first month of the year, the Growth team focused on two projects. First, we enhanced and refactored the GettingStarted extension, in part to support local configuration for different Wikipedias. The latest version of GettingStarted and GuidedTour will be released in English and 23 other languages in early February. Second, the team wrapped up several iterations of design and data analysis in support of upcoming work on Wikipedia article creation. We presented new designs for the Draft namespace, and completed a series of remote usability tests (see the results). We also finalized and published extensive quantitative analysis of trends in article creation across the largest Wikipedias. Last but not least, the Growth team welcomed its newest member in January, Software Engineer Sam Smith.

Support

Wikipedia Education Program

This month, once again we divided our time between the existing Education Program extension and work towards a new version of the software. We thoroughly analyzed database transactions in the current extension and fixed a slew of long-standing database-related bugs. Also on the current extension, we finished adding a notification type and notifications infrastructure, and worked on an improved course editing UX. For the new version, we studied workflow systems and considered how software for the Education Program and other outreach activities might use such a system. Adam Wight started on prototype workflow code. He also went through our code review backlog, bringing a multitude of new features and improvements to production.

Mobile

Wikipedia Zero

During the last month, the team added forward compatibility to Varnish scripting for Wikipedia Zero, and resubmitted a Varnish script patch to support HTTPS for select Wikipedia Zero partners under the new IP address-based zero-rating scheme, after analysis with the Operations team. We also continued proof of concept work on an HTML5 web app for Firefox OS, fixed bugs in the legacy Firefox OS Wikipedia app, and prepared alpha functionality for the integration of Wikipedia Zero with the rebooted Android Wikipedia app. The team also continued work toward a generic JSON configuration extension for use by extensions like ZeroRatedMobileAccess, submitted code for the core MediaWiki API, submitted a ResourceLoader (RL) enhancement and cooperated on alternatives for performance enhancement of RL on non-WMF Redis-backed ResourceLoaders, and submitted a small UX enhancement for the rebooted Android Wikipedia app. January 2014 was also a month of planning: the partners engineering team met for two days with the business development team to plan for partners and Wikipedia Zero-related work at large. The partners engineering team also applied itself to two days of product planning for the Partner Portal. Finally, the team conducted normal tech facilitation to enable partner launches and align approaches with current and future partners.

Mobile web projects

We have been directing much of our attention over the last month at delivering a tablet-friendly MobileFrontend experience. We’ve added support for tables of contents in MobileFrontend for tablets, made some design improvements for tablets, and have worked towards making VisualEditor work with MobileFrontend for tablets (in alpha for now). We’ve hit some roadblocks and are hoping to collaborate more with the VE team in the near future to keep moving forward on the project. Following up from last month, we have also released our overlay UI improvements as well as an improved inline diff view for MobileFrontend into stable. Finally, we have also been working to expand our coverage of browser tests to facilitate quality assurance and help prevent the introduction of bugs and regressions.

Language Engineering

Language tools

UniversalLanguageSelector (ULS) was disabled in production on January 21, 2014 for Wikimedia sites in all languages (other than wikidata.org) due to font delivery performance issues. Users can still enable ULS for their language needs by going to their user profile preferences and enabling ULS from the internationalization settings. Development is in progress on a solution that enables ULS when a user logs in and explicitly selects language preferences that enable webfonts. David Chan continued his work on language support integration for VisualEditor for phase 5 languages. Niklas Laxstrom and Santhosh Thottingal participated in the architecture summit in San Francisco in January, in RFC discussions and in the JSONification of i18n support for VisualEditor.

Language Engineering Communications and Outreach

The team continued their collaborative projects with Google, Twitter, Microsoft internationalization and MT teams on webfonts, input tools and machine translation.

Content translation

The language engineering team kicked off development of a prototype version of the content translation workflow. This functionality aims to create a workspace for helping editors bootstrap new articles in non-Latin language Wikipedias. In the prototype, Russian and Welsh are being used for initial concept verification.

Platform Engineering

MediaWiki Core

Site performance and architecture

The team worked on performance dashboards for VisualEditor and page load time, the ProfilerMwprof profiler class for MediaWiki, and draft performance guidelines.

Admin tools development

This project is on hold, so there have been no significant developments, although some patches have been contributed by volunteer developers.

Search

Auth systems

The team focused on minor updates to close some of the high priority OAuth bugs.

Deployment Tooling

The Logstash service was deployed to production at https://logstash.wikimedia.org/. Work has started on an analysis of the current scap process which will be used to draft requirements for further deployment scripting work in the current quarter.

Security auditing and response

We announced the MediaWiki 1.22.1 and 1.22.2 security releases, and continued to respond to reported vulnerabilities.

Quality assurance

Quality Assurance

January saw the QA team working closely with the Mobile team in particular to enhance the existing suite of tests for MobileFrontend. We also participated in the discussion of the Release Engineering deployment process at the architecture summit. Hiring is underway for two open positions, QA Automation Engineer and Test Infrastructure Engineer.

Beta cluster

Beta is being used to test the Math extension rewrite. The Parsoid extension is now deployed continuously via a Jenkins job; its status can be found on the CI dashboard job “Parsoid update” (bug 57233). The wikis now send updates to the irc.wikimedia.org server (bug 60013).

Continuous integration

Zuul has been upgraded and now uses a Gearman bus to communicate with Jenkins. The l10n-bot no longer triggers changes, and we enabled a proper gating system to test changes in parallel. The workflow is smoother and provides faster feedback in Gerrit. The Jenkins slave lanthanum does not have direct access to the internet, so we configured the jobs to use a web proxy in MediaWiki (webproxy.eqiad.wmnet or webproxy.pmtpa.wmnet). Finally, the Zuul status page now shows the progress of jobs being run.
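
To illustrate what gating changes in parallel means, here is a minimal Python sketch of speculative gating. It is a simplified model under our own assumptions, not Zuul's code: the function names `gate` and `fake_tests` are hypothetical, and the test runs that a real system would dispatch concurrently are simulated one after another.

```python
def gate(queue, passes):
    """Simulate speculative gating of a queue of changes.

    queue  -- change ids, in the order they should merge
    passes -- callable(change, base) -> bool, a stand-in for a test run of
              `change` applied on top of the changes listed in `base`
    Returns (merged, rejected).
    """
    merged, rejected = [], []
    pending = list(queue)
    while pending:
        # Each pending change is tested on top of everything ahead of it,
        # on the assumption that those changes will all pass. A real system
        # would start these runs at the same time; here they run in turn.
        results = [passes(change, merged + pending[:i])
                   for i, change in enumerate(pending)]
        if all(results):
            merged.extend(pending)
            break
        first_bad = results.index(False)
        # Changes ahead of the failure were tested on a valid base.
        merged.extend(pending[:first_bad])
        rejected.append(pending[first_bad])
        # Changes behind the failure were tested on a base that included it,
        # so they go through another round without it.
        pending = pending[first_bad + 1:]
    return merged, rejected


def fake_tests(change, base):
    # Hypothetical test outcome: change "B" always fails.
    return change != "B"


merged, rejected = gate(["A", "B", "C"], fake_tests)
print(merged, rejected)
```

In this model, a failing change only costs the changes queued behind it one extra round of testing, while the changes ahead of it merge without waiting.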

Browser testing

In January, we had a number of contributions from the students of Google Code-in, from tests to Jenkins configuration to documentation. We released two entirely new features: one test that monitors the file upload API interface on both production Commons and beta labs Commons, and another test that monitors fatal errors in Beta Labs. We are very close to announcing general availability for two other new features: the ability to run tests headless using Firefox under Xvfb, and the ability to create test data like wiki pages in the target wiki at run time.

Multimedia

Multimedia

In January, the multimedia team focused on developing the Media Viewer and planning our next projects for the year. Gilles Dubuc, Mark Holmquist, Gergo Tisza and volunteer Aaron Arcos implemented a number of improvements to the beta version of the Media Viewer. Some of the features we created or improved include: faster image load, a full-screen mode, better navigation between files, and an expanded metadata panel with location, categories, permissions and assessments. We invite you to test the new UI features on this beta site; faster image load can be tested on this MediaWiki.org page. (In both cases, you need to create an account, then click on ‘Beta’ in your personal menu and enable Media Viewer.) Pau Giner also designed a new user interface for displaying slides, video and audio files in the upcoming v0.3 version of Media Viewer, based on team recommendations. Fabrice Florin started a community discussion of our team’s Multimedia Vision for 2016, which proposes a range of improvements to help engage users and support productive collaborations in coming years (more comments welcome). We also planned our work for this quarter’s release, which focuses on Media Viewer through the end of March, and started planning our next big priorities for the rest of the year: UploadWizard and Structured Data on Commons. Lastly, we started a Request for Comments about possible support for the MP4 video standard: we invite you to participate in this discussion, which is due to end in mid-February; we will plan our next steps for video based on community feedback for this RfC. To discuss these projects and keep up with our work, we invite you to join the multimedia mailing list.

Engineering Community Team

Bug management

Valhallasw wrote a script to import tickets from JIRA (used by Toolserver) to Wikimedia Bugzilla, added a “Browse projects” link to Bugzilla’s sidebar, and added an “Upload to Gerrit” button for Bugzilla attachments to the Greasemonkey triagescripts. Inline displaying of image attachments in Bugzilla was re-enabled and the default bug assignee was renamed from “Nobody” to “Nobody – You can work on this!” to be more descriptive. The link to the guided bug entry form at the top of the standard bug entry form now sets the already chosen product directly, saving you two clicks when you switch to the guided form. Work continued on preparing the Bugzilla 4.4 upgrade: Andre Klapper’s patches for porting custom changes from 4.2 to 4.4 were deployed on the Bugzilla test instance on Zirconium and tested, and Daniel Zahn fixed a problem with Bugzilla’s collectstats.pl, so the 4.4 upgrade and server move will take place in February. On the bug management documentation side, Andre added a “Situation specific information” section about purging and profiling to the Triage guide.

Project management tools review

Andre Klapper reached out to the teampractices mailing list as well as individual stakeholders, asking users to share their workflow and needs regarding project management and tracking tools. Guillaume Paumier summarized all that content into consolidated requirements; those are now in the process of being compared to features offered by available tools, in order to assemble a shortlist of candidates for community discussion.

Mentorship programs

Google Code-in 2013, the Wikimedia debut (PDF)

Wikimedia’s first participation in the Google Code-in program ended with great success: 273 tasks completed by 46 students with the help of about 30 mentors. Theo Patt and Mateusz Maćkowski were selected as winners for Wikimedia, and we sent a special mention to Mayank Madan.

Round 7 of the FOSS Outreach Program for Women started and all projects are on track so far.

Facebook Open Academy’s warm-up period saw slow progress at the beginning of the projects. In the end, it seemed that everybody was waiting for the official start at the kick-off at Facebook headquarters on February 7−9.

Technical communications

In January, Guillaume Paumier wrapped up work on mentoring Google Code-in students and continued to provide ongoing communications support for the engineering staff. He contributed to writing, simplifying, publishing and distributing the weekly technical newsletter, and published an in-depth article explaining the process by which the newsletter is put together every week.

Volunteer coordination and outreach

We helped organize the Architecture Summit 2014 in San Francisco (January 23−24) and we got everything ready for FOSDEM in Brussels (February 1−2). We continued working with the tech community metrics around two key performance indicators: who contributes code, and the Gerrit review queue.

Analytics

Kraken

The team has been monitoring the mobile stream and adding additional load to Kafka which has exposed some scaling issues. These have been resolved. In addition, work has been done with the Operations team on designing and implementing a Java deployment system for use with Hadoop and other systems. Finally, work has been initiated to use the data in the warehouse on mobile browser distribution and session length.

Limn

Usability issues continue to be addressed while the team explores options around other visualization frameworks. This month we implemented a feature that simplifies the creation of dashboards by automatically inferring metadata from the data source.
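
As an illustration of what inferring metadata from a data source can look like, here is a minimal Python sketch. It is not Limn's implementation: the CSV input, the three type names and the functions `infer_type` and `infer_metadata` are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Guess a column type from its string values: date, number or string."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    # Dates are checked first, since an ISO date is not a valid float.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    if all_parse(float):
        return "number"
    return "string"


def infer_metadata(csv_text):
    """Return one {'name', 'type'} entry per column of a CSV with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body))  # transpose data rows into columns
    return [{"name": name, "type": infer_type(column)}
            for name, column in zip(header, columns)]


# Made-up sample data, for illustration only.
sample = "date,pageviews,project\n2014-01-01,1200,enwiki\n2014-01-02,1350,dewiki\n"
metadata = infer_metadata(sample)
print(metadata)
```

With metadata derived this way, a dashboard author would only need to point at the data file rather than describe each column by hand.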

Wikimetrics

We are adding features to Wikimetrics to support scheduled jobs and data access via evergreen URLs. This will support dashboarding and other services that will be built on top of the service. In addition, we are preparing a Wikimetrics-Vagrant image to help people get started with Wikimetrics development.

Kafka

We’ve increased the throughput on Kafka from 6K/RPS to 50K/RPS to test stability under higher loads.

Data Quality

Review of 2013 traffic trends by the Wikimedia Analytics Team.

The team has spent an intense month analyzing data to explain the page view issues identified in December. The team’s report was shared at the February metrics meeting.

Research and Data

We conducted a thorough review of traffic data and trends and confirmed a downward trend in desktop pageviews in 2013. This trend is not reflected in desktop unique visitors or mobile traffic. We are working on complementing pageviews with other traffic metrics that will help us better monitor readership trends. We engaged with external parties (Google and comScore) to obtain data about referral and mobile traffic respectively.

We completed research on article creation trends on the largest Wikipedias and found substantial differences between different language Wikipedias; specifically, where anonymous editors are allowed to create articles, their success rate (% of articles kept) is substantially higher than that of newly registered editors. We also found that articles that started as Articles for Creation (AfC) and userspace drafts have a near 100% success rate, but the transition that English Wikipedia made toward directing newcomers to start AfC drafts appears to have substantially reduced the number of successful articles created by newcomers, presumably due to the large review backlog.
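
The success rate used above is simply the share of created articles that were kept. A minimal Python sketch of that calculation follows; the function name `success_rate` and the cohort counts are hypothetical and are not figures from the research.

```python
def success_rate(created, kept):
    """Percentage of created articles that were kept (not deleted)."""
    if created == 0:
        raise ValueError("no articles created")
    return 100.0 * kept / created


# Hypothetical counts (created, kept), for illustration only.
cohorts = {
    "anonymous": (200, 150),
    "newly registered": (200, 100),
}
rates = {name: success_rate(created, kept)
         for name, (created, kept) in cohorts.items()}
print(rates)
```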

We published an update on VisualEditor usage on Wikipedia projects where the editor is enabled by default.

We continued work on metrics standardization for the editor engagement vital signs project and published supportive analysis on definitions and parameter exploration for two proposed standardized user classes: new editor and productive new editor.

We worked with the Analytics Development and Legal teams to articulate use cases and the retention and anonymization strategy for data subject to the retention guidelines, in particular with respect to user agents.

We welcomed Sahar Massachi as a research contractor supporting the team with data analysis for fundraising tests and iterated on new modeling strategies for estimating test success (such as the number of dollars per banner impression). Before he joined us, Sahar worked with the fundraising team, where most recently he focused on writing tools to help the team easily and quickly understand the results of each test.

Offline

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.

Much time this month was spent planning for 2014. We mainly worked on mwoffliner and almost managed to create a full English Wikipedia ZIM file with thumbnails. The upgrade of our main storage platform allowed us to start our automatic ZIM file generation system.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

In January, the team worked mainly on performance improvements around Wikidata. The Quantities datatype was deployed, so it is now possible to enter data like the number of inhabitants of a country. Wikisource can now manage its language links via Wikidata, just like Wikipedia, Wikivoyage and Commons already could. Two new front-end developers, Adrian and Thiemo, joined the team to help improve Wikidata’s user interface. Last but not least, the team released their plan for the development of Wikidata in 2014 and beyond.

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.



--
Guillaume Paumier
Technical Communications Manager — Wikimedia Foundation
https://donate.wikimedia.org