Hi,
The report covering Wikimedia engineering activities in July 2013 is now available.
Wiki version: https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/July
Blog version: https://blog.wikimedia.org/2013/08/05/engineering-july-2013-report/
We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge:
https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/July/summary
Below is the full HTML text of the report.
As always, feedback is appreciated on the usefulness of the report and its summary, and on how to improve them.
------------------------------------------------------------------
Major news in July includes:
- Giving more editors an easy-to-use editing interface (the VisualEditor) on several Wikipedias
- Improving language support on our sites via summer interns' projects and easier configuration options, and asking for help translating the VisualEditor interface
- Enabling users to edit our sites from mobile devices, like phones and tablets, and announcing a future user experience bootcamp focusing on mobile editing
- Finishing our transition from keeping source code in Subversion to storing it in Git
- Launching a Wikipedia Zero partnership with Aircel, giving mobile subscribers in India the potential to access Wikipedia at no data cost
- Updating the Wikimedia movement on how we intend to protect our users' privacy with HTTPS
- Signing a contract with longtime MediaWiki contributors to manage MediaWiki releases for the open source community
- Explaining how we find and gather software problems and deliver the fixes to users
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming
up, and we really love talking to active community members about these
roles.
Announcements
- Bryan Davis joined the Platform Engineering team as a Senior
Software Engineer, working generally on backend software issues and
starting off supporting multimedia (announcement).
- C. Scott Ananian joined the Parsoid team as a Senior Features Engineer (announcement).
- Kenan Wang joined the Product team as Product Manager for Mobile (announcement).
Technical Operations
Site infrastructure
- Lots of Puppet
refactoring work got done this month, including considerable
reorganization of the puppet masters. Several manifests have been moved
into modules, but completing this project will take many months.
Data Dumps
- The English Wikipedia dumps were run from our Ashburn data center this
month, as were a number of other big wikis' dumps. There's an issue
with the abstract dumps that needs to be sorted out for those, but other
than that everything ran smoothly.
- Petr Onderka has been getting a lot of work done on the incremental dumps. A first preview of the code was announced,
along with a proposed binary file format that the program currently
uses. For a preview of what's coming up, you can check the timeline. Your comments and suggestions are welcome!
Wikimedia Labs
- Though there were some features introduced this month, the majority
of our time was spent on documentation, tracking down bugs and improving
usability. We had a documentation sprint this month, targeted at
improving documentation for the Tool Labs
project. Work continued on stabilizing the NFS server -- we believe
we've tracked down the stability issues to RAID controller problems. The
compute nodes have been running increasingly low on disk space, but
we've tracked this down to a change in behavior of nova and have
deployed a fix. nova-network was starting to experience timeouts due to
excessive load, leading to instance creation failures. We've extended
DHCP renewal times to reduce load. We upgraded wikitech.wikimedia.org
and wikitech-static to the 1.22wmf11 version of MediaWiki. We also: deployed
the AJAX-enabled delete instance feature; deployed a change to display
more informative instance statuses; fixed issues in LdapAuthentication
that broke blocking and renaming users; and deployed a change to allow
service groups to be added to service groups, to make sharing code and
data between tools easier.
Editor retention: Editing tools
VisualEditor
In July, the VisualEditor team began switching the deployment from
opt-in alpha to opt-out beta, making VisualEditor the default editor for
users of the various Wikipedias. The deployed version of the code was
updated three times (1.22-wmf10, 1.22-wmf11 and 1.22-wmf12), with
several mid-deployment releases as the code was developed to patch
urgent issues. There were a number of user interface improvements, most
notably to the references insertion dialog, alongside fixes to a number
of bugs uncovered by the community.
Parsoid
In July, the Parsoid team supported the deployment of VisualEditor as
the default editor on eight Wikipedias. We continued to monitor bug
reports, feedback pages and the village pump, and fixed a number of bugs
to eliminate the reported instances of dirty diffs and other corruption.
An absence of performance issues let us keep our attention on
functionality and dirty-diff related bugs, which remained the primary
focus of our work this month. On the staffing side, C. Scott Ananian
joined the Parsoid team as a full-time employee; he has been working
with us since earlier this year, first as a volunteer and then as a
contractor. Marc Ordinas i Llopis from Spain and Arlo Breault from
Canada joined the Parsoid team as contractors this month.
Editor engagement features
Flow
Notifications
Article feedback
In July, we deployed a few last features and bug fixes for the Article
Feedback Tool (AFT5) on the English and French Wikipedias. Matthias
Mullie released the auto-archive feature, as well as a list of articles
with feedback enabled on enwiki and on frwiki. At the request of the
French Wikipedia community, he also developed new feedback notifications
to let users know when feedback is marked as useful for a page they
watch (or for a comment they posted). The team plans to make the AFT5
tool available to other wiki projects interested in testing it,
provided that no new development is required to support their needs, as
outlined in the release plan.
Editor engagement experiments
We're hiring! Are you a front-end developer? Do you know someone who is? Apply today.
Editor engagement experiments
In July, the Editor Engagement Experiments (E3) team made progress on a
number of continuing projects. In terms of features, the team completed
work to integrate the onboarding new Wikipedians project with new
infrastructural changes and feature releases. For the GettingStarted
extension, E3 collaborated with Platform engineering to ensure
compatibility with the new "SUL2" cross-wiki authentication
architecture. For the GuidedTour extension, the team completed a first
release of support for guided tours of the VisualEditor interface,
alongside tours of the legacy wikitext editor, and developed a plan to
refactor the GuidedTour extension as well as its API. E3 also planned
for its sixth A/B test of the GettingStarted workflow (see proposed
specification and mockups).
As an addition to the team's redesign of account creation and login
(launched in May-June), we enhanced the design of the form for users who
fulfill account creation requests for others.
E3 team member Matthew Flaschen also worked with two Google Summer of Code students on their projects. Richa Jain is working on the Annotator extension, which allows adding inline comments to a wiki page. Rahul Maliakkal is working on the Pronunciation Recording extension, for adding audio of pronunciations to Wiktionary.
On the experimental tools and data analysis front, E3 completed a significant rewrite of the
Puppet configuration for
EventLogging, our data collection pipeline, among other changes. For the
MediaWiki-Vagrant
portable desktop development environment, E3 added support for flexibly
provisioning and unit testing extensions such as GettingStarted,
GuidedTour,
ParserFunctions, EventLogging, and others. Last but not least, the
micro-survey
of gender of new account registrations was enabled on German, French,
Italian, and Polish Wikipedias, while data analysis on the English
Wikipedia results began.
Support
2013 Wikimedia fundraiser
In July, the fundraising team ran its first successful tests of our new
payments gateway, Adyen. This backup credit card gateway (US-only for
now) performed similarly to our primary credit card processor in A/B
testing, and can be successfully used as a failover. We also ran, for
the first time, several short campaign tests targeted at mobile devices
in the US. In these tests, users were able to choose between PayPal and
Amazon Payments. Additional tests to determine peak times, appropriate
localities, and optimum messaging for mobile campaigns will continue
throughout August, as the campaigns are prepared.
Wikipedia Zero
This month, the team
launched Wikipedia Zero with Aircel in India,
a carrier with about 60 million cellphone subscribers. We also
completed our first cut of automation testing, started the
implementation of the Wikipedia Zero software re-architecture, and
patched bugs. During July we planned for the upcoming year in Wikipedia
Zero. On the engineering front we are focusing first on test automation
and re-architecture concurrent with SMS/USSD and J2ME releases, and
afterward will be focusing efforts on end user UX and carrier-oriented
enhancements that will support the continued growth of the program.
Mobile web projects
This month, the mobile web team
released a new contributory nav
to all Wikimedia mobile sites, including the existing upload and
watchlist star features, as well as an edit button. This means that
editing (in the form of section-level markup editing) is now enabled on
all mobile Wikimedia sites for logged-in users. In beta, we began work
on mobile notifications restyling, as well as guiders for first-time
editors and uploaders.
MediaWiki Core
MediaWiki 1.22
Git conversion
Multimedia
Admin tools development
This activity was on hiatus in July.
Search
Nik Everett and Chad Horohoe have continued writing an extension to
implement ElasticSearch searching for MediaWiki, and we've finished most
of the required features. Next come deploying it, scaling it, and
fixing the inevitable bugs. We're aiming to deploy to the test site
beta.wmflabs.org before the end of the month. Peter Youngmeister and
Asher Feldman will be handling the operations tasks for the new setup.
Auth systems
HipHop deployment
Security auditing and response
The team continued to respond to reported security issues and to address outstanding bugs.
Quality assurance
Quality Assurance
Beta cluster
The Beta cluster continues to be a target for automated and manual
testing. It also finally has a syslog receiver on deployment-bastion,
thus solving bug 36748 (no syslog::server in beta). The logs can be
accessed via either /home/wikipedia/syslog or
/data/project/logs/syslog/. This is thanks to Leslie Carr.
Browser testing
In
July we added coverage for a number of features, including
VisualEditor, UniversalLanguageSelector, and Mobile Search. We are
making extensive use of beta labs as well as the test2wiki test
environment. Our automated browser tests continue to identify important
issues during feature development.
Analytics
We reviewed our planning document with Sue and Erik and the
Engineering Directors. Reception was positive and we will be
communicating next steps more widely in August. The Analytics team
focused on short term deliverables, reliability and hiring in July. We
identified two potential candidates for front-end/Python work. We have
been performing multiple phone screens together with Recruiting, and the
hiring pipelines are good.
Analytics infrastructure
Kraken:
- We kicked off a reliability project with Ops with the end goal of
stabilizing Hadoop and the logging infrastructure. Teams have been in
discussions on architecture and planning, and should have a path forward
in the next 2 weeks. We identified a consultant who will perform a
system audit to aid the project.
- We continue adding new metrics
and alerts to monitor all the different parts of the webrequest
dataflows into Kraken. We expect to keep making improvements in the
coming months until we have a fully reliable data pipeline into Kraken.
Logging Infrastructure:
- We started this month by designing a canary event monitoring system.
A canary event is an artificial event that is injected at the start of
the data workflow and that we monitor to see whether it reaches its
final destination; that way we can ensure that the dataflows are
functioning.
- We are investigating what data format to use for sending the
webrequest messages from Varnish to the Hadoop cluster. Formats that we
are scrutinizing are JSON, Protobuf and Avro, and we are also looking at
compression algorithms such as Snappy.
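
To make the canary idea above concrete, here is a minimal sketch in Python. It is not the Analytics team's design: the event fields (type, id, injected_at), the function names and the toy pipeline are all assumptions made for illustration. The sketch injects artificial events with unique ids and then reports the ones that were injected long enough ago but never showed up at the destination.

```python
import time
import uuid

CANARY_MARKER = "canary"  # hypothetical value marking an artificial event


def make_canary(now=None):
    """Build an artificial event with a unique id and its injection time."""
    return {
        "type": CANARY_MARKER,
        "id": uuid.uuid4().hex,
        "injected_at": time.time() if now is None else now,
    }


def find_missing_canaries(injected, delivered, now, max_delay_s):
    """Return ids of canaries older than max_delay_s that never arrived."""
    arrived = {e["id"] for e in delivered if e.get("type") == CANARY_MARKER}
    return [
        c["id"]
        for c in injected
        if c["id"] not in arrived and now - c["injected_at"] > max_delay_s
    ]


def lossy_pipeline(events, drop_ids=()):
    """Toy stand-in for the real dataflow; silently drops selected events."""
    return [e for e in events if e["id"] not in drop_ids]


if __name__ == "__main__":
    first = make_canary(now=0.0)
    second = make_canary(now=10.0)
    delivered = lossy_pipeline([first, second], drop_ids={second["id"]})
    missing = find_missing_canaries(
        [first, second], delivered, now=100.0, max_delay_s=60.0
    )
    print("missing canaries:", missing)
```

The max_delay_s grace period is there so that a canary still in flight is not reported as lost; a real monitor would presumably feed the list of missing ids into its alerting.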
Analytics Visualization, Reporting & Applications
Wikimetrics: We successfully launched the initial version of Wikimetrics: see metrics.wmflabs.org.
This version has support for cohort upload and two metrics: 1) bytes
added and 2) namespace edits. We are working on adding support for
time-series and aggregators. In the coming sprints we will focus on
adding new metrics.
Wikipedia Zero: Dashboards have been moved off of Hadoop for
the time being and are now being populated again. We have identified
some issues with log rotation that are causing gaps in the graphs, and
will look into these problems. Also, we have been working on technical
handoff as Evan Rosen leaves the Foundation.
Limn: No development news.
Wikistats: No development news.
Data Releases
- Erik Zachte published data and longitudinal analyses of edit and revert trends for Wikimedia projects (read the announcement). We provided data and ad-hoc analysis for the presentation A State of Decline? The State of Wikimedia Communities as of July 2013 at the July 2013 Monthly Metrics Meeting.
- We published the analysis of a controlled experiment that we ran in June to test the impact of notifications on new contributors and a pre-release A/B test of VisualEditor
on the English Wikipedia. We performed an extensive audit of the
quality of the data collected during and after the VE test, taking into
account browser limitations and known bugs, and posted an update on the state of the analysis. We released via our open data repository the complete dataset of the sample of new registered users who participated in the split test to ensure the replicability of the analysis.
- We released real-time dashboards on edit activity, new account
registrations and reverts for the 10 Wikipedias on which VE has been
rolled out. (en • de • es • fr • he • it • nl • pl • ru • sv)
Bug management
Mentorship programs
Quim Gil organized one-on-one meetings with each Google Summer of Code
and Outreach Program for Women team. Most projects were already at full
speed, and for them the meeting was primarily social and nice to have. A
few really benefited from going through a checklist to highlight early
problems that are easy to solve now. All GSoC and OPW projects, 21 in
total, are now on track.
Technical communications
As in June, Guillaume Paumier was seconded to the VisualEditor
deployment effort, working on communications, documentation and liaising
with the French Wikipedia. Work on technical communications mostly
focused on perennial activities like ongoing communications support to
the engineering staff.
Volunteer coordination and outreach
On community metrics, Quim Gil focused on the consolidation of
korma.wmflabs.org, the new dashboard for automated community metrics. We
have made good progress on this alpha, including basic metrics from Git,
Bugzilla and mailing lists being retrieved on a daily basis, and have
filed bugs and enhancement requests on GitHub (mediawiki-dashboard,
VizGrimoireJS). We are deciding on the key metrics we need in order to
make decisions, e.g. the average time to resolve Gerrit changesets or
bug reports. We also planned and promoted a Browser Testing Automation
workshop with Cucumber together with the QA team, with 13 people
participating online. A recording of the session is available (1h40).
The experience was useful, as we agreed on MediaWiki-Vagrant as the
default environment for automated testing and highlighted the list of
easy bugs. Also, the Engineering Community team held its quarterly
review.
- The language team deployed Universal Language Selector (ULS) to most Wikimedia wikis to provide easier configuration options
to readers and contributors. ULS provides a flexible way to configure
and deliver language settings like interface language, fonts, and input
methods (keyboard mappings). Also, ULS allows users to type text in
different languages not directly supported by their keyboard, read
content in a script for which fonts are not available locally, or
customise the language in which menus are displayed. For more
information, please see the FAQ.
- The Language engineering team also mentored summer interns' projects to improve language support on our sites, and asked for volunteer help translating the VisualEditor interface.
The Kiwix project is funded and executed by Wikimedia CH.
- We are preparing the first release of a new Wikipedia ZIM creation solution for August. We have also released a new version of Kiwix for Android;
this new version includes a few bug fixes and new features. Besides the
release of traditional Wikipedia ZIM files, we have also published two
interesting ZIM files: one which includes 2,500 ebooks (EPUB & PDF) of French literature and one with the new Wikipedia for Schools selection. The ZIM incremental update GSoC project is progressing well too: first working versions of the zimpatch & zimdiff console tools are available, and integration with Kiwix has started. Kiwix developers will be available at Wikimania, during the hacking days and at the Wikimedia CH booth during Wikimania itself.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- In July, we deployed Wikidata to all Wikivoyage sites in all
languages, to manage their language links. We updated the continued Roadmap for Wikidata Development. Coveralls.io support
has been added to most of our components. Since the first deployment of
Phase 1 to Wikipedia, about 240 million interwiki links (5 GB of text)
have been removed from articles (2012 vs 2013 analysis).
- In other news, the AAAI Feigenbaum Prize for Watson was donated to the Wikimedia Foundation by IBM research to support work, especially on Wikidata.
- Denny Vrandečić explains why Wikidata items are identified with a Q.
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. Annual goals for the 2013–2014 fiscal year are being drafted by some teams and have been finalized by others.
This article was written collaboratively by Wikimedia engineers and managers, and assembled by Sumana Harihareswara. See revision history and associated status pages. A wiki version is also available.
--
Guillaume Paumier
Technical Communications Manager — Wikimedia Foundation
https://donate.wikimedia.org