The report covering Wikimedia engineering activities in December 2012 is now available.
Wiki version: https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2012/December
Blog version: https://blog.wikimedia.org/2013/01/10/engineering-december-2012-report/
We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge:
Below is the full HTML text of the report, as previously requested.
As always, feedback is appreciated about the usefulness of the report and its summary, and on how to improve them.
Major news in December includes:
Are you looking to work for Wikimedia? We have a lot of hiring coming
up, and we really love talking to active community members about these
Production Site Switchover
- The Technical Operations team continued to work on completing the
outstanding migration tasks, and to ready our Ashburn infrastructure for
the big switchover day, i.e., the complete transition from the Tampa
datacenter to the one in Ashburn, during the week of January 22, 2013.
- In the past few months, we've transitioned services from the Tampa
datacenter to the one in Ashburn, which now serves most of our traffic
(about 90%). However, application (MediaWiki), memcached and database
systems are all still running exclusively out of Tampa. We have been
working to upgrade the technologies and set up those systems at Ashburn,
and we plan to perform the switchover of those services from Tampa to
Ashburn in the coming weeks. This will provide us some assurance of a
hot standby datacenter, should we encounter an irrecoverable and lengthy
outage in one of the main datacenters.
- Because December is when the annual Wikimedia fundraiser happens,
the Operations team usually makes fewer site infrastructure changes to
mitigate the risk of causing outages. The lower-risk work
performed included deploying the new Parsoid cluster to support the
VisualEditor project, rolling out doc.wikimedia.org
(our auto-generated puppet documentation), using a new and unified SSL
certificate for *.wikipedia.org and *.m.wikipedia.org sites, and setting
up a monitoring server and service in Ashburn.
- Asher Feldman migrated one of the main production slave database
servers (db59) for the English Wikipedia (enwiki) to MariaDB 5.5.28. He
has been testing 5.5.27 on the primary research slave, and the
current build on a slave in Ashburn. Taking the times of all
queries over regular sample windows, the average query time across all
enwiki slave queries is about 8% faster with MariaDB compared to our
production build of MySQL 5.1-fb. Some query types are 10–15% faster,
some are 3% slower, and nothing looks aberrant beyond those bounds.
Overall throughput as measured by qps has generally been improved by
2–10%. Asher wouldn't draw any conclusions from this data yet: more
testing is needed to filter out noise, but initial results are positive.
The main reason for migrating to MariaDB is not performance, but rather
the belief that it's in the Wikimedia Foundation's and the
open-source communities' interest to coalesce around the MariaDB
Foundation as the best route to ensuring a truly open and well-supported
future for MySQL-derived database technology.
- Mark Bergsma and Faidon Liambotis have made tremendous progress in
testing and deploying Ceph in Ashburn. We are hopeful it will be robust
- Ryan Lane has been writing a new deployment system using git and
Saltstack. Parsoid is currently being deployed with this system, and
MediaWiki is slated to use it for its next major deployment.
- There were no major changes on the fundraising infrastructure
because of the fundraiser itself. We ordered and received bastion hosts
that we're in the process of deploying. Monitoring got an overhaul and
we're now sending alerts to the fundraising technical staff or the
technical operations team depending on what triggered the alert.
- A tool for dump users to set up interwiki links on their local mirrors is available in alpha,
as well as documentation of the interwiki cdb file. Also, work with
WanSecurity on mirroring is moving forward: they now hold a current copy
of all 'other' files, including page views and Picture of the Year
bundles, among other things.
- Labs came out of beta this month, following the opening of
self-registration. Another major change this month was the migration
from the shared NFS instance to per-project glusterfs volumes. A number
of smaller changes were made, including: the addition of puppet
documentation links from classes and variables on the instance
configuration pages; the modification of the project filter to act as a
table of contents; a split of LDAP project groups into projects and
POSIX groups; and the installation of Saltstack on all instances to act
as a guest agent.
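The MariaDB comparison earlier in this list reports a relative change in average query time across sample windows. The arithmetic behind such a figure can be sketched as follows. This is only an illustration: the per-window timings are invented for the example and are not data from the report, and `percent_change` is a hypothetical helper, not part of any Wikimedia tooling.

```python
from statistics import mean


def percent_change(baseline, candidate):
    """Relative change of candidate vs. baseline, in percent (negative = lower)."""
    return (candidate - baseline) / baseline * 100.0


# Hypothetical average query times per sample window, in milliseconds.
# These values are made up for illustration only.
mysql_windows_ms = [10.0, 12.0, 11.0, 9.0]
mariadb_windows_ms = [9.0, 11.0, 10.5, 8.5]

mysql_avg = mean(mysql_windows_ms)
mariadb_avg = mean(mariadb_windows_ms)
change = percent_change(mysql_avg, mariadb_avg)

print(f"Average query time change: {change:.1f}%")
```

A negative result means the candidate build answered queries faster on average. As the report notes, more sample windows are needed before noise can be ruled out.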
Editor retention: Editing tools
In December, the team deployed to the English Wikipedia
an alpha version of the VisualEditor for editors to use and give
feedback on issues and priorities. The team's work focussed on ensuring
that the integration was reliable, and providing a dedicated tool for
editors to report problems with editing, and, after deployment,
addressing the reports and ideas from editors. The early version of the
VisualEditor on mediawiki.org was also updated to use the new
developments (as part of 1.21-wmf6).
The Parsoid project reached a major milestone with its first deployment to the
English Wikipedia along with the VisualEditor. This was a major test for
Parsoid, as it needed to handle the full range of arbitrary and complex
existing wiki content including templates, tables and extensions for
the first time.
As witnessed by the clean edit diffs,
Parsoid passed this test with flying colors. This represents very hard
work by the team (Gabriel Wicke, Subramanya Sastry and Mark Holmquist)
on automated round-trip testing and the completion of a selective serialization strategy just in time for the release.
After catching their breath, the team now has its sights on the next
phase in Parsoid development. This includes a longer-term strategy for
the integration of Parsoid and HTML DOM into MediaWiki, performance
improvements and better support for complex features of wikitext.
Editor engagement features
This month, the team continued to develop key features of the Notifications
project (code-named 'Echo'), and deployed a first experimental release.
Fabrice Florin expanded feature requirements
for this release, and Vibha Bamba designed more components of the user experience.
Ryan Kaldari and Benny Situ developed improved notification flyouts and
email digests, as well as new notifications such as page links. Luke
Welling built an HTML email module, which will soon be available to
other projects as well. We plan to develop more features this month and
deploy them for new editors on the English Wikipedia in early 2013.
Please help us test these new features to provide feedback and find bugs.
We're also looking to hire a software engineer as part of this project.
We made good progress on Article Feedback version 5 this month. We completed a research study
on the English Wikipedia, confirming that many readers use this feature
and a sizable number of them go on to register and become editors.
Based on that research and editor suggestions, we started development work
to reduce the editor workload through better filters and simpler
moderation tools. We also continued to refactor our code, to support
millions of comments on a dedicated database cluster to be deployed in
coming months. Once this work is complete, we plan to release Article
Feedback v5 to 100% of the English Wikipedia in March, and to other
Wikimedia sites later this year. The German Wikipedia has already
started a pilot to evaluate this tool, and a similar initiative is also under discussion
on the French Wikipedia.
Page Curation is now in 'maintenance mode', following its release
on the English Wikipedia in September 2012. There was no significant
development activity on this project this month. Oliver Keyes has
completed a project to look at various ways of localizing Page Curation
to any and all wikis that want it: it is currently being reviewed by
Howie Fung to assess its feasibility.
Editor engagement experiments
In December, the Editor Engagement Experiments team launched a new test aimed at Onboarding new Wikipedians.
This interface delivers an optimized task list immediately after sign
up, inviting those without an idea of how to get started to choose an
article and try their hand at editing. The related GettingStarted extension
was deployed mid-month and continued to evolve throughout the month, as
early quantitative and qualitative research was conducted.
To go along with the launch of GettingStarted and other experimentation, EventLogging
underwent heavy development, including the launch of a new Schema
namespace on Meta for defining the data collected in a public,
collaborative manner. We created production schemas for GettingStarted, account creation, mobile, and more.
Ori Livneh also reworked the format, transmission, and cleanliness of
data delivered to analysts and product managers, automatically
generating database tables from these schemas for incoming events.
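Generating a table definition from an event schema can be sketched as below. This is a minimal illustration and not the EventLogging implementation: the schema layout, the `SQL_TYPES` mapping, the `create_table_sql` helper and the `ExampleEvent` schema are all assumptions made for the example.

```python
# Hypothetical mapping from JSON-style schema types to SQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "integer": "INTEGER",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}


def create_table_sql(table_name, schema):
    """Build a CREATE TABLE statement from a schema's property definitions."""
    columns = []
    # Sort property names so the generated statement is deterministic.
    for name, spec in sorted(schema["properties"].items()):
        sql_type = SQL_TYPES[spec["type"]]
        not_null = " NOT NULL" if spec.get("required", False) else ""
        columns.append(f"{name} {sql_type}{not_null}")
    return f"CREATE TABLE {table_name} ({', '.join(columns)})"


# Invented schema, for illustration only.
example_schema = {
    "properties": {
        "userId": {"type": "integer", "required": True},
        "action": {"type": "string", "required": True},
        "isNew": {"type": "boolean"},
    }
}

sql = create_table_sql("ExampleEvent", example_schema)
print(sql)
```

Keeping the type mapping in one dictionary means a new field type needs only one new entry, and a schema using an unknown type fails loudly with a `KeyError` instead of producing a wrong column.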
Late in the month, the team collaborated with fundraising to reach out to donors and readers
as part of the annual fundraising campaign via email and a "Thank You"
banner which ran at the end of the year. In addition to introducing
millions of donors and readers to the Wikipedia editor community and
inviting them to join, this campaign helped the team establish an
experimental baseline for what a campaign to convert readers might look like.
In addition to the above launches, we continued development of the new account creation experience and Guided Tours
by Matt Flaschen, which will be launched in January 2013. Active
development was also begun by Ryan Faulkner and Dario Taraborelli on a user metrics API.
The effort is threefold: to standardize user metrics in data analysis,
to build infrastructure to efficiently compute metrics for a large set
of users, and finally to expose those results via an API.
2012 Wikimedia fundraiser
The 2012 annual fundraiser continued in December and was a resounding
success. In addition to the ongoing maintenance required to operate the
fundraiser, the team helped to execute the Thank You campaign and started
to put into place new tools for auditing the fundraiser after its completion.
- The Mobile development and design team worked to finalize
contributory and other experimental editor-focused features on the Beta
site (uploads, editing, and watchlist functionality) in order to clear
the way for a full push on mobile uploads by March 2013. We also worked
to improve the reader and potential editor experience by introducing
features geared toward educating/engaging our users, such as a
human-readable last modified timestamp for articles and watchlist, and
thumbnail images to illustrate the watchlist view. Lastly, because of
the huge interest we generated in our Beta testing site, we created an
Alpha site to house very early work on contributory features, in order
not to disrupt the reading experience of our 100,000+ Beta users.
GeoData Storage & API
During December, Max Semenik continued work on GeoData, the extension
that allows us to easily store and retrieve GPS coordinates in our
databases. Max took the extension from implementation through code
review to deployment on the English Wikipedia. It will become
100% production-quality after a few more tweaks and fixes. After those
changes, we'll continue to roll out to the rest of the wikis. The
extension is one of the precursors to having the "near by" feature on
our mobile web site.
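A "near by" lookup over stored coordinates can be sketched with the haversine great-circle formula. This is an illustration only and does not describe how GeoData is implemented: the `haversine_km` and `nearby` helpers, the article list and its approximate coordinates are all assumptions for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(articles, lat, lon, radius_km):
    """Return titles within radius_km of (lat, lon), closest first."""
    hits = [(haversine_km(lat, lon, a_lat, a_lon), title)
            for title, a_lat, a_lon in articles]
    return [title for dist, title in sorted(hits) if dist <= radius_km]


# Illustrative (title, latitude, longitude) records with approximate coordinates.
articles = [
    ("Eiffel Tower", 48.8584, 2.2945),
    ("Louvre", 48.8606, 2.3376),
    ("Big Ben", 51.5007, -0.1246),
]

# Search within 5 km of a point in central Paris.
print(nearby(articles, 48.8530, 2.3499, 5.0))
```

A linear scan like this is fine for a handful of records. A production service would normally narrow the candidates with an index or bounding box before computing exact distances.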
In the month of December, Patrick Reilly, Dan Foy and the rest of the Zero
team launched Wikipedia Zero with a new partner, Orange Congo. They
resolved operational issues that prevented the team from accurately
recording traffic from the Opera browser. They also onboarded
Brion Vibber to help in the interim while the team continues to look for
permanent members. The team is very excited about its upcoming launches
and will be announcing them as soon as possible.
The J2ME app is ready to launch pending contractual negotiations with carriers.
Wikipedia over SMS & USSD
The USSD service is ready to launch pending contractual negotiations.
The Mobile QA team planned and began several projects in December, in
particular: an upcoming community test event for Mobile features;
support for MobileFrontend in beta labs; and significant new UI-level
automated tests in the gerrit queue.
We continued the bi-weekly deployment cycle, deploying MediaWiki 1.21wmf5.
We stopped deployments at the end of the month due to the holidays, restarting with the 1.21wmf7
cycle on January 2.
There is not much to report for the month of December with Gerrit. New
repositories continue to be created, and the vast majority of active
parts of SVN have been marked read-only by now. Upgrading to a newer
version of Gerrit is still blocked on our LDAP problem with master, but
the patch to fix that is nearly complete. Mid-December, we extended the
Verified category to now allow +2 (in addition to +1 and -1), so Jenkins
has a wider range of statuses it can report.
Jan Gerber continued to refine the TimedMediaHandler extension, making the transcoding steps more robust.
The Wikibase client extension was deployed to test2 in December. We plan
further deployment work in January, deploying to the Hungarian language
Wikipedia on January 14, 2013.
Captchas are ready to be served from Swift. They previously were for several
days, but the configuration had to be reverted due to random errors from
Swift. A new set of captchas is being tweaked for readability and is
served from Swift on the test wikis. Captchas are one of the last NFS
After an assessment by Asher Feldman, Patrick Reilly and Tim Starling, the
RDB database patch was canceled. Instead, in the short term, a separate
vertically partitioned data cluster will be provided as a temporary
storage until a horizontally scalable architecture can be finalized.
Matthias Mullie is modifying the RDB-dependent ArticleFeedbackToolv5 to
remove that dependency through an abstraction layer. When a sharded or
horizontally scaled solution is provided, AFTv5's abstraction will be
migrated. An initial assessment of various non-MySQL alternatives for
using Aaron Schulz's JobQueue core patch in 1.20 is being done for Echo.
Because of the time it takes to exhaust the Echo queues, it is written
to bypass the JobQueue through direct calls. Luke Welling is abstracting
the JobQueue for Redis, ZeroMQ, and others.
Admin tools development
has completed a preliminary prototype (deemed to be disposed of after
all the valuable data has been collected) in order to validate the
design and its core concepts, identify and explore possible issues and
test limits imposed by the platform. It will also be used to explore
the usage of PHP 5.4's new features to ease the implementation of a
maintainable versioning system (the prototype abuses PHP's
implementation of namespaces in some cases; this is not meant to persist
in the final prototype but was rather a stress test), test
human-readable formatting for responses when called by specific clients,
and measure overhead added by the software abstraction. As a result,
some pain points and alternative routes have been identified; research
on them will continue in late January and early February
2013, leading the team closer to a final implementation and related RFC.
The code will be available for a short time in a dedicated branch
of Wikia's app repository on GitHub.
Security auditing and response
The team continued to respond to several reported vulnerabilities. A follow-up security review for Wikidata phase 2/3 was done.
The project to support MobileFrontend in Beta labs continues. We intend for
Beta labs to become a test environment for the new git-deploy script
from the Operations team: this should be helpful in ongoing maintenance
of the environment.
The last Jenkins jobs (mostly Analytics ones) that were still using the
Gerrit Trigger plugin have been migrated to being triggered by Zuul.
Zuul now supports triggering tests for whitelisted users. This has been
deployed to let trusted users have unit tests run whenever they send a
patchset in mediawiki/core (gerrit change 39310).
Volunteer Merlijn van Deen built a script to replicate our Jenkins installation
and worked on having extension tests run on different MediaWiki branches.
After its announcement about the state of automated browser testing
on wikitech-l, the QA team continued to expand test coverage, improve
system and project documentation, and publicize and socialize the
project by means of the "Browser Testing" MediaWiki Group.
Kraken (Analytics Cluster)
Hue/Hadoop authentication works, but group file access still needs to
be worked out. We've puppetized an Apache proxy for internal Kraken and
Hadoop web services, as well as udp2log Kafka production and Kafka
Hadoop consumption. The event.gif log stream is being consumed into
Hadoop. We're attempting to use udp2log to import logs into Kafka and
Hadoop without packet loss, and are backing up Hadoop service data files
(e.g. Hue, Oozie, Hive) to HDFS.
A major rework of Limn to use d3.js and Knockout.js is complete and will
be used for the next ReportCard. Dan Andreescu and David Schoonover are
working on graph editing and geospatial data visualization.
Zahn and Andre Klapper upgraded Bugzilla to the latest stable version
(4.2.4) which provides higher flexibility for displaying interface
elements, improved custom search, better JSON-RPC support and a solid
base for future improvements being considered. Andre continued to
improve the bug management documentation.
Many bug reports that were previously closed as RESOLVED LATER were
retriaged and RESOLVED LATER was disabled for future use, and a large
number of previously unprioritized bug reports received a priority
setting. Furthermore, Andre looked after reports about CSS issues after
the MediaWiki 1.21wmf5 deployment and followed up by triaging, creating
requested Bugzilla components, etc. Several smaller regex fixes were
deployed in Bugzilla to fix automatic linking to Gerrit changesets.
A "patch in gerrit" bug status was discussed on wikitech-l,
with the conclusion to wait for automatic notifications (comments) from
Gerrit into Bugzilla about patch status changes first (which is being
worked on by the Wikidata team).
Six MediaWiki candidates have been announced for the Outreach Program for Women
(OPW). Four of them are funded by the Wikimedia Foundation and two by Google
through an agreement with the GNOME Foundation, organizers of the
program. They will work as full-time interns under the supervision of
MediaWiki mentors between January and March 2013. We got 10 submissions
from about 25 interested people. The rather open and participatory selection process
we have defined for OPW will be used as a basis for future mentoring programs.
We've also started matchmaking for the LevelUp
mentorships for the coming quarter.
Guillaume published a project plan and timeline
for the consultation process started in October about how to improve
two-way communication between the technical and editing communities. He summarized
the results of the first phase and reached out to the wikitech-ambassadors list
to widen the consultation process by proxy. After consolidation and
prioritization of the results, the most feasible solution appeared to be
to grow a network of ambassadors, which he started to organize on Meta.
Unrelatedly, Guillaume made a list of 2012 tech blog posts to map tech blog activity by month & subdepartment (with priority activities listed separately). Work on setting up a Volunteer product manager program is also underway.
sorted out Social media channels, and we now have @MediaWiki handles for identi.ca.
He published the community metrics November report and a blog post
introducing this new activity.
Volunteer coordination and outreach
became official and the first proposals are going through the approval process.
As a side effect, a process for requesting regional mediawiki-themed mailing lists
has been created with mediawiki-india as the first case.
At least three Wikimedia-related talks have been accepted at FOSDEM.
Development of the new user interface for Translate, as well as the translation
editor functionality, continued at full pace throughout the month of
December, with iterative feature development and user experience
improvements. Santhosh Thottingal and Niklas Laxström are leading
development and Pau Giner is focusing on optimizing user experience
elements. The team also released the latest version of the MediaWiki
Language Extension Bundle. Increased support for language variants and
alternate language codes was added to the Universal Language Selector.
Alolita Sharma continued to work with Red Hat's localization and
internationalization teams to evaluate localization data, translation
tools and internationalization tools and technologies.
More language input methods contributed by language communities were added to the jquery.ime library.
- Other news
- Pau Giner and Amir Aharoni participated in the Open Tech Chat this
month to talk about best practices in multilingual user testing and
internationalization. Amir Aharoni also participated in mentoring OPW
candidate Priyanka Nag for the new LevelUp program. Srikanth Lakshmanan
and Arun Ganesh’s tenure ended with the Language Engineering team in
The Kiwix project is funded and executed by Wikimedia CH.
- A new Kiwix 0.9rc2 was released. This version embeds our ZIM HTTP server kiwix-serve
for Windows, OS X and Linux. It is now integrated in the Kiwix UI,
allowing everyone to share Wikipedia on a LAN in two clicks. We have revamped our audience measurement tool, a solution that could be interesting for other projects using Mirrorbrain.
At the same time, we continue to increase our ZIM production throughput,
with 8 new Wikipedia ZIM files in December. December was also a month of
new records for Kiwix: for the first time, we have had more than 70,000 downloads in a month and a lead position for Education software at Sourceforge.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- New code and bugfixes have been deployed (with MediaWiki 1.21wmf5 and 1.21wmf6) and test2
now gets language links from Wikidata. Changes on Wikidata that concern
articles on test2 are shown in the recent changes of test2 as well. If
there are no problems, deployment on the Hungarian Wikipedia will happen on January 14, 2013. Other Wikipedia sites will follow.
- For the second phase of Wikidata, representation of values is the central focus. We published a draft and discussions
have started; we'd appreciate your feedback. Additionally, Denny
Vrandečić and Lydia Pintscher held IRC office hours; logs are available
in English and German.
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.
Technical Communications Manager — Wikimedia Foundation