The report covering Wikimedia engineering activities in December 2012 is
We're also proposing a shorter, simpler and translatable version of this
report that does not assume specialized technical knowledge:
Below is the full HTML text of the report, as previously requested.
As always, feedback is appreciated about the usefulness of the report and
its summary, and on how to improve them.
Major news in December includes:
- The launch of an alpha, opt-in version of the
the English Wikipedia, a project more
complex than it
- A research
the use of the Article Feedback feature;
- New metrics for the MediaWiki
- The start of the Outreach Program for
- Continued work to improve the
Personnel
Work with us <https://wikimediafoundation.org/wiki/Work_with_us>
Are you looking to work for Wikimedia? We have a lot of hiring coming up,
and we really love talking to active community members about these roles.
- Software Engineer - Visual
- Software Engineer - Editor
- Software Engineer
- Software Engineer
- Software Developer General
- Git and Gerrit software development
- Release Manager <http://hire.jobvite.com/Jobvite/Job.aspx?j=oZrQWfwW>
- Software Engineer -
- Software Engineer
- Product Manager
- Director of User
- Visual Designer <http://hire.jobvite.com/Jobvite/Job.aspx?j=oomJWfw9>
- Operations Engineer <http://hire.jobvite.com/Jobvite/Job.aspx?j=ocLCWfwf>
- Operations Engineer/Database
- Site Reliability
- Matthew Flaschen joined the Wikimedia Features
as Features Engineer (
- Mike Wang joined the Operations team as part time Labs Ops Engineer
The Technical Operations team continued to work on completing the
outstanding migration tasks, and to ready our Ashburn infrastructure for
the big switchover day, i.e., the complete transition from the Tampa
datacenter to the one in Ashburn, during the week of January 22, 2013. In the
past few months, we've transitioned services from the Tampa datacenter to
the one in Ashburn, which now serves most of our traffic (about 90%).
However, application (MediaWiki), memcached and database systems are all
still running exclusively out of Tampa. We have been working to upgrade the
technologies and set up those systems at Ashburn, and we plan to perform
the switchover of those services from Tampa to Ashburn in the coming weeks.
This will provide us some assurance of a hot standby datacenter, should we
encounter an irrecoverable and lengthy outage in one of the main
Because December is when the annual Wikimedia fundraiser happens, the
Operations team usually makes fewer site infrastructure changes to mitigate
the risk of causing outages. The lower-risk work performed this month
included deploying the new Parsoid cluster to support the Visual Editor
project, rolling out doc.wikimedia.org
(our auto-generated puppet
documentation), using a new and unified SSL certificate for *.wikipedia.org and *.
sites, and setting up a monitoring server and service in
Ashburn.
Asher Feldman migrated one of the main production slave database
servers (db59) for the English Wikipedia (enwiki) to MariaDB 5.5.28. He has
been testing 5.5.27 on the primary research slave, and the current build
on a slave in Ashburn. Taking the times of all queries over regular
sample windows, the average query time across all enwiki slave queries is
about 8% faster with MariaDB compared to our production build of MySQL
5.1-fb. Some query types are 10–15% faster, some are 3% slower, and
nothing looks aberrant beyond those bounds. Overall throughput as measured
by qps has generally been improved by 2–10%. Asher wouldn't draw any
conclusions from this data yet: more testing is needed to filter out noise,
but initial results are positive. The main reason for migrating to MariaDB
is not performance, but rather the belief that it's in the Wikimedia
Foundation's and the open-source communities' interest to coalesce around
the MariaDB Foundation as the best route to ensuring a truly open and
well-supported future for MySQL-derived database technology.
Mark Bergsma and Faidon Liambotis have made tremendous progress in testing
and deploying Ceph in Ashburn. We are hopeful it will be robust and scalable.
Ryan Lane has been writing a new deployment system using git and
Saltstack. Parsoid
is currently being deployed with this system, and MediaWiki is slated to
use it for its next major deployment.
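The MariaDB comparison earlier in this section comes down to a relative change in mean query time between two builds. A minimal sketch of that arithmetic follows; the sample numbers and the function name are made up for illustration, since the report does not include the raw measurements.

```python
from statistics import mean


def relative_change(baseline, candidate):
    """Return the fractional change of candidate relative to baseline.

    A negative value means the candidate is smaller, i.e. faster when the
    inputs are query times.
    """
    return (candidate - baseline) / baseline


# Illustrative values only: mean query time in milliseconds per sample window.
mysql_ms = [10.0, 12.0, 11.0, 11.0]
mariadb_ms = [9.2, 11.0, 10.1, 10.2]

change = relative_change(mean(mysql_ms), mean(mariadb_ms))
print(f"mean query time change: {change:+.1%}")
```

Comparing means over several sample windows, rather than a single window, is one simple way to reduce the noise the report mentions.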
There were no major changes on the fundraising infrastructure because of
the fundraiser itself. We ordered and received bastion hosts that we're in
the process of deploying. Monitoring got an overhaul and we're now sending
alerts to the fundraising technical staff or the technical operations team
depending on what triggered the alert.
*Data Dumps <https://www.mediawiki.org/wiki/WMF_Projects/Data_Dumps>*
A tool for dump users to set up interwiki links on their local mirrors
as well as documentation of the interwiki cdb file. Also, work with
WanSecurity on mirroring is moving forward: they now hold a current copy of
all 'other' files, including page views and Picture of the Year bundles,
among other things.
*Wikimedia Labs <https://www.mediawiki.org/wiki/Wikimedia_Labs>*
Labs came out of beta this month, following the opening of
self-registration. Another major change this month was the migration from
the shared NFS instance to per-project glusterfs volumes. A number of
smaller changes were made, including: the addition of puppet documentation
links from classes and variables on the instance configuration pages; the
modification of the project filter to act as a table of contents; a split
of LDAP project groups into projects and POSIX groups; and the installation
of Saltstack on all instances to act as a guest agent.
Features
retention: Editing tools
In December, the team deployed to the English
alpha version of the VisualEditor for editors to use and give feedback
on issues and priorities. The team's work focussed on ensuring that the
integration was reliable, and providing a dedicated tool for editors to
report problems with editing, and, after deployment, addressing the reports
and ideas from editors. The early version of the VisualEditor on
was also updated to use the new developments (as part of
The Parsoid <https://www.mediawiki.org/wiki/Parsoid> project reached a
major milestone with its first deployment to the English Wikipedia along
with the VisualEditor. This was a major test for Parsoid, as it needed to
handle the full range of arbitrary and complex existing wiki content
including templates, tables and extensions for the first time.
As witnessed by the clean edit
Parsoid passed this test with flying colors. This represents very hard work
by the team (Gabriel Wicke, Subramanya Sastry and Mark Holmquist) on automated
round-trip testing <http://parsoid.wmflabs.org:8001/> and the completion of
a selective serialization strategy just in time for the release.
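The idea behind round-trip testing can be sketched in a few lines: parse wikitext to HTML, serialize it back, and check that the result is identical to the input. The toy converter below handles only bold markup and is not Parsoid's code; all function names are hypothetical.

```python
import re


def wikitext_to_html(wikitext):
    """Toy parser: convert '''bold''' markup to <b> elements."""
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)


def html_to_wikitext(html):
    """Toy serializer: convert <b> elements back to '''bold''' markup."""
    return re.sub(r"<b>(.+?)</b>", r"'''\1'''", html)


def round_trips_cleanly(wikitext):
    """True if parsing and then serializing reproduces the input exactly."""
    return html_to_wikitext(wikitext_to_html(wikitext)) == wikitext


pages = ["Plain text.", "A '''bold''' word.", "Raw <b>HTML</b> bold."]
failures = [page for page in pages if not round_trips_cleanly(page)]
print(failures)
```

The third page fails because the toy serializer normalizes raw HTML bold into wikitext bold. Unwanted changes of this kind are one reason to serialize only the parts of a page that were actually edited.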
After catching their breath, the team now has its sights on the next phase
in Parsoid development. This includes a longer-term strategy for the
integration of Parsoid and HTML DOM into MediaWiki, performance
improvements and better support for complex features of wikitext.
Editor engagement features
*Notifications <https://www.mediawiki.org/wiki/Echo_%28Notifications%29>*
This month, the team continued to develop key features of the Notifications
project (code-named 'Echo'), and deployed a first experimental release on
. Fabrice Florin expanded feature
this release, and Vibha Bamba designed more components of the user
experience <https://www.mediawiki.org/wiki/Echo_User_Experience>. Ryan
Kaldari and Benny Situ developed improved notification flyouts and email
digests, as well as new notifications such as page links. Luke Welling
built an HTML email module, which will soon be available to other projects
as well. We plan to develop more features this month and deploy them for
new editors on the English Wikipedia in early 2013. Please help us
new features to
provide feedback and find bugs. We're also looking to hire
part of this project.
*Article feedback <https://www.mediawiki.org/wiki/Article_feedback>*
We made good
month. We completed a research
study <https://commons.wikimedia.org/wiki/File:AFT5_2012-Q4_report.pdf> on
the English Wikipedia, confirming that many readers use this feature and a
sizable number of them go on to register and become editors. Based on that
research and editor suggestions, we started development on new
reduce the editor workload through better filters and simpler
tools. We also continued to refactor our code, to support millions of
comments on a dedicated database cluster to be deployed in coming months.
Once this work is complete, we plan to release Article Feedback v5 to 100%
of the English Wikipedia in March, and to other Wikimedia sites later this
year. The German Wikipedia has already started a
evaluate this tool, and a similar initiative is also under
the French Wikipedia.
*Page Curation <https://www.mediawiki.org/wiki/Page_Curation>*
Page Curation <https://en.wikipedia.org/wiki/Wikipedia:Page_Curation> is
now in 'maintenance mode', following its
English Wikipedia in September 2012. There was no significant
development activity on this project this month. Oliver Keyes has completed
a project to look at various ways of localizing Page Curation to any and
all wikis that want it: it is currently being reviewed by Howie Fung to
assess its feasibility.
Editor engagement experiments
In December, the Editor Engagement Experiments team launched a new test
aimed at Onboarding new
This interface delivers an optimized task list immediately after sign up,
inviting those without an idea of how to get started to choose an article
and try their hand at editing. The related GettingStarted
deployed mid-month and continued to evolve throughout the month, as
early quantitative and qualitative research was conducted.
To go along with the launch of GettingStarted and other experimentation,
heavy development, including the launch of a new Schema namespace
on Meta for defining the data collected in a public, collaborative manner.
We created production schemas for
account creation <https://meta.wikimedia.org/wiki/Schema:AccountCreation>,
Ori Livneh also reworked the format, transmission, and cleanliness of data
delivered to analysts and product managers, automatically generating
database tables from these schemas for incoming events.
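Generating a table from a schema can be illustrated with a small sketch. The schema below is hypothetical (its field names are not taken from the actual AccountCreation schema), SQLite stands in for the production database, and the type mapping is an assumption.

```python
import sqlite3

# Hypothetical schema, loosely shaped like a JSON Schema "properties" map.
schema = {
    "title": "AccountCreation",
    "properties": {
        "userId": {"type": "integer"},
        "username": {"type": "string"},
        "isSelfMade": {"type": "boolean"},
    },
}

# Assumed mapping from schema types to SQLite column types.
SQL_TYPES = {
    "integer": "INTEGER",
    "string": "TEXT",
    "boolean": "INTEGER",
    "number": "REAL",
}


def create_table_sql(schema):
    """Build a CREATE TABLE statement with one column per schema property."""
    columns = ", ".join(
        f'"{name}" {SQL_TYPES[spec["type"]]}'
        for name, spec in schema["properties"].items()
    )
    return f'CREATE TABLE "{schema["title"]}" ({columns})'


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(schema))
# Store one incoming event in the generated table.
conn.execute('INSERT INTO "AccountCreation" VALUES (?, ?, ?)', (1, "Example", True))
rows = conn.execute('SELECT * FROM "AccountCreation"').fetchall()
print(rows)
```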
Late in the month, the team collaborated with fundraising to reach out to
donors and readers <https://meta.wikimedia.org/wiki/Research:Donor_engagement> as
part of the annual fundraising campaign via email and a "Thank You"
banner which ran at the end of the year. In addition to introducing
millions of donors and readers to the Wikipedia editor community and
inviting them to join, this campaign helped the team establish an
experimental baseline for what a campaign to convert readers might look
In addition to the above launches, we continued development of the new
account creation experience and Guided
Flaschen, which will be launched in January 2013. Active
development was also begun by Ryan Faulkner and Dario Taraborelli on a user
metrics API <https://meta.wikimedia.org/wiki/Research:Metrics>. The effort
is threefold: to standardize user metrics in data analysis, to build
infrastructure to efficiently compute metrics for a large set of users, and
finally to expose those results via an API.
The 2012 annual fundraiser continued in December and was a resounding
success. In addition to the ongoing maintenance required to operate the
fundraiser, the team helped to execute the Thank You
started to put into place new tools for auditing the fundraiser after
Mobile <https://www.mediawiki.org/wiki/Wikimedia_Mobile_engineering>
The
Mobile development and design team worked to finalize contributory and
other experimental editor-focused features on the Beta site (uploads,
editing, and watchlist functionality) in order to clear the way for a full
push on mobile uploads by March 2013. We also worked to improve the reader
and potential editor experience by introducing features geared toward
educating/engaging our users, such as a human-readable last modified
timestamp for articles and watchlist, and thumbnail images to illustrate
the watchlist view. Lastly, because of the huge interest we generated in
our Beta testing site, we created an Alpha site to house very early work on
contributory features, in order not to disrupt the reading experience of
our 100,000+ Beta users.
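A human-readable "last modified" timestamp, as mentioned above, can be produced by converting the elapsed time into the largest fitting unit. This is a minimal sketch; the wording of the messages and the function name are assumptions, not the MobileFrontend implementation.

```python
from datetime import datetime, timedelta


def last_modified_text(modified, now):
    """Describe how long ago a page was modified, in the largest fitting unit."""
    seconds = int((now - modified).total_seconds())
    if seconds < 60:
        return "Last modified just now"
    # Try the largest unit first; any value of 60 seconds or more matches
    # at least the "minute" entry.
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            plural = "" if count == 1 else "s"
            return f"Last modified {count} {unit}{plural} ago"


now = datetime(2012, 12, 31, 12, 0, 0)
print(last_modified_text(now - timedelta(hours=3), now))
```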
*GeoData Storage & API <https://www.mediawiki.org/wiki/GeoData_Storage_%26_API>*
During December, Max Semenik
<https://www.mediawiki.org/wiki/User:MaxSem> continued work on GeoData,
the extension directly responsible for allowing
us to easily store and retrieve GPS coordinates in our databases. Max
took the extension from implementation, through code review, and finally to
deployment on the English Wikipedia. It will become fully production-quality
after a few more tweaks and fixes. After those changes, we'll continue to
roll out to the rest of the wikis. The extension is one of the precursors
to having the "near by" feature on our mobile web site.
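To illustrate how stored coordinates could support a nearby search, here is a minimal sketch that filters articles by great-circle distance. It does not use the GeoData extension's API; the in-memory data, the approximate coordinates and the function names are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Illustrative data with approximate coordinates: title -> (latitude, longitude).
articles = {
    "Eiffel Tower": (48.8584, 2.2945),
    "Louvre": (48.8606, 2.3376),
    "Big Ben": (51.5007, -0.1246),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby(lat, lon, radius_km):
    """Return titles within radius_km of the given point, closest first."""
    hits = [
        (haversine_km(lat, lon, a_lat, a_lon), title)
        for title, (a_lat, a_lon) in articles.items()
    ]
    return [title for dist, title in sorted(hits) if dist <= radius_km]


# Search within 10 km of central Paris.
print(nearby(48.8566, 2.3522, 10))
```

A production system would normally rely on a database index instead of scanning every article, but the distance test itself stays the same.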
*Wikipedia Zero <https://www.mediawiki.org/wiki/Wikipedia_Zero>*
During the month of December, Patrick Reilly, Dan Foy and the rest of the
Zero team launched Wikipedia Zero with a new partner, Orange Congo. They
resolved operational issues that prevented the team from accurately
recording traffic from the Opera browser. They also helped on-board Brion
Vibber to help in the interim while the team continues to look for
permanent members. The team is very excited about its upcoming
will be announcing them as soon as possible.
*J2ME App <https://www.mediawiki.org/wiki/MobileFrontend/J2ME_app>*
The J2ME app is ready to launch pending contractual negotiations with
*Wikipedia over SMS &
The USSD service is ready to launch pending contractual negotiations.
*Mobile QA <https://www.mediawiki.org/wiki/Mobile_QA>*
The Mobile QA team planned and began several projects in December, in
particular: an upcoming community test event for Mobile features; support
for MobileFrontend in beta labs; and significant new UI-level automated
tests in the gerrit queue.
*MediaWiki 1.21 <https://www.mediawiki.org/wiki/MediaWiki_1.21/Roadmap>*
We continued the bi-weekly deployment cycle, deploying MediaWiki
1.21wmf6 <https://www.mediawiki.org/wiki/MediaWiki_1.21/wmf6>. We stopped
deployments at the end of the month due to the holidays, restarting the
1.21wmf7 <https://www.mediawiki.org/wiki/MediaWiki_1.21/wmf7> cycle on
*Git conversion <https://www.mediawiki.org/wiki/Git/Conversion>*
There's not much to report for the month of December so far with Gerrit.
New repositories continue to be created, and the vast majority of active
parts of SVN have been marked read-only by now. Upgrading to a newer
version of Gerrit is still blocked on our LDAP problem with master, but the
patch to fix that is nearly complete. Mid-December, we extended the
Verified category to now allow +2 (in addition to +1 and -1), so Jenkins
has a wider range of statuses it can report.
Jan Gerber continued to refine the TimedMediaHandler extension, making the
transcoding steps more robust.
*Wikidata deployment <https://www.mediawiki.org/wiki/Wikidata_deployment>*
The Wikibase client extension was deployed to test2 in December. We plan
further deployment work in January, deploying to the Hungarian language
Wikipedia on January 14, 2013.
Captchas are ready to be served from Swift. They previously were for
several days, but the configuration had to be reverted due to random errors
from Swift. A new set of captchas is being tweaked for readability and is
served from Swift on the test wikis. Captchas are one of the last NFS
*Site performance <https://www.mediawiki.org/wiki/Site_performance>*
After an assessment by Asher Feldman, Patrick Reilly and Tim Starling, the
RDB database patch was canceled. Instead, in the short term, a separate
vertically partitioned data cluster will be provided as a temporary storage
until a horizontally scalable architecture can be finalized. Matthias
Mullie is modifying the RDB-dependent ArticleFeedbackToolv5 to remove that
dependency through an abstraction layer. When a sharded or horizontally
scaled solution is provided, AFTv5's abstraction will be migrated. An
initial assessment of various non-MySQL alternatives for using Aaron
Schulz's JobQueue core patch in 1.20 is being done for
Because of the time it takes to exhaust the Echo queues, it is written to
bypass the JobQueue through direct calls. Luke Welling is abstracting the
JobQueue for Redis, ZeroMQ, and others.
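Abstracting a job queue so that different backends can be swapped in can be sketched as an interface with one implementation per backend. The example below is illustrative Python, not MediaWiki's PHP JobQueue code; the class and method names are hypothetical, and an in-memory backend stands in for Redis or ZeroMQ.

```python
from abc import ABC, abstractmethod
from collections import deque


class JobQueueBackend(ABC):
    """Interface that every storage backend is expected to implement."""

    @abstractmethod
    def push(self, job):
        """Add a job to the end of the queue."""

    @abstractmethod
    def pop(self):
        """Remove and return the oldest job, or None if the queue is empty."""


class InMemoryQueue(JobQueueBackend):
    """Stand-in backend; a Redis or ZeroMQ backend would expose the same methods."""

    def __init__(self):
        self._jobs = deque()

    def push(self, job):
        self._jobs.append(job)

    def pop(self):
        return self._jobs.popleft() if self._jobs else None


def run_all(queue, handler):
    """Drain the queue, applying handler to each job in FIFO order."""
    results = []
    job = queue.pop()
    while job is not None:
        results.append(handler(job))
        job = queue.pop()
    return results


queue = InMemoryQueue()
queue.push("send-notification")
queue.push("update-links")
print(run_all(queue, str.upper))
```

Code that calls `run_all` depends only on the interface, so changing the storage backend does not require changing the callers.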
*Admin tools development <https://www.mediawiki.org/wiki/Admin_tools_development>*
The initial code was committed for an interface for Stewards to mass-lock user
For global AbuseFilters, a permission for global rule-writing was merged
and the initial code for using WikiSets in the rules was written. Initial
code committed for renaming CentralAuth user
*REST proposal <https://www.mediawiki.org/wiki/API/REST_proposal>*
Wikia has completed a preliminary prototype (deemed to be disposed of after
all the valuable data has been collected) in order to validate the design
and its core concepts, identify and explore possible issues and test limits
imposed by the platform. It will also be used to explore the usage of PHP
5.4's new features to ease the implementation of a maintainable versioning
system (the prototype abuses PHP's implementation of namespaces in some
cases; this is not meant to persist in the final prototype but was rather a
stress test), test human-readable formatting for responses when called by
specific clients, and measure overhead added by the software abstraction.
As a result, some pain points and alternative routes have been identified
on which research work will be carried out in late January or early
February 2013, leading the team closer to a final implementation and
related RFC. The code will be available for a short time in a dedicated
branch <https://github.com/Wikia/app/tree/lox-REST-prototype> at Wikia's
app repository at Github <https://github.com/Wikia/app>.
*Security auditing and
The team continued to respond to several reported vulnerabilities. A
follow-up security review for Wikidata phase 2/3 was done.
*Beta cluster <https://www.mediawiki.org/wiki/Beta_cluster>*
The project to support MobileFrontend in Beta labs continues. We intend for
Beta labs to become a test environment for the new git-deploy script from
the Operations team: this should be helpful in ongoing maintenance of the
The last Jenkins jobs (mostly Analytics ones) that were still using the
Gerrit Trigger plugin have been migrated to being triggered by Zuul. Zuul
now supports triggering tests for whitelisted users. This has been deployed
to let trusted users have unit tests run whenever they send a patchset in
mediawiki/core (gerrit change
Volunteer Merlijn van Deen built a script to replicate our Jenkins
worked on having extensions tests run on different MediaWiki branches.
*Browser testing <https://www.mediawiki.org/wiki/Browser_testing>*
After its announcement about the state of automated browser
wikitech-l, the QA team continued to expand test coverage, improve
system and project documentation, and publicize and socialize the project
by means of the "Browser Testing" MediaWiki
*Kraken (Analytics Cluster) <https://www.mediawiki.org/wiki/Analytics/Kraken>*
LDAP Hue/Hadoop authentication works, but group file access still needs to
be worked out. We've puppetized an Apache proxy for internal Kraken and
Hadoop web services, as well as udp2log kafka production and kafka hadoop
consumption. The event.gif log stream is being consumed into Hadoop. We're
attempting to use udp2log to import logs into Kafka and Hadoop without
packet loss, and backing up Hadoop service data files to HDFS (e.g. Hue,
Oozie, Hive, etc.).
A major rework of Limn to use d3.js and Knockout.js is complete and will be
used for the next ReportCard. Dan Andreescu and David Schoonover are
working on graph editing and geospatial data visualization.
Engineering community team
*Bug management <https://www.mediawiki.org/wiki/Bug_management>*
Daniel Zahn and Andre Klapper upgraded Bugzilla to the latest stable
version (4.2.4) which provides higher flexibility for displaying interface
elements, improved custom search, better JSON-RPC support and a solid base
for future improvements being considered. Andre continued to improve the bug
management documentation <https://www.mediawiki.org/wiki/Bug_management>.
Many bug reports that were previously closed as RESOLVED LATER were
retriaged and RESOLVED LATER was disabled for future use, and a large
number of previously unprioritized bug reports received a priority setting.
Furthermore, Andre looked after reports about CSS issues after the MediaWiki
1.21wmf5 <https://www.mediawiki.org/wiki/MediaWiki_1.21/wmf5> deployment
and followed up by triaging, creating requested Bugzilla components, etc.
Several smaller regex fixes were deployed in Bugzilla to fix automatic
linking to Gerrit changesets. A "patch in gerrit" bug status was discussed,
with the conclusion to wait for automatic notifications (comments) from
Gerrit into Bugzilla about patch status changes first (which is being
worked on by the Wikidata team).
*Mentorship programs <https://www.mediawiki.org/wiki/Mentorship_programs>*
Six MediaWiki candidates have been
Program for Women <https://www.mediawiki.org/wiki/Outreach_Program_for_Women> (OPW)
4 of them are funded by the Wikimedia Foundation and 2 by Google
through an agreement with the GNOME Foundation, organizers of the program.
They will work as full-time interns under the supervision of MediaWiki
mentors between January and March 2013. We got 10 submissions from about 25
people interested. The rather open and participatory selection
have defined for OPW will be used as a basis for future mentoring
programs. We've also started matchmaking for the
for the coming quarter.
Guillaume Paumier <https://www.mediawiki.org/wiki/User:Guillom>
published a project
the consultation process started in October about how to improve 2-way
communication between the technical and editing communities. He
results of the first phase and reached out to the wikitech-ambassadors
widen the consultation process by proxy. After consolidation and
prioritization of the results, the most feasible solution appeared to be to
grow a network of
which he started to organize on meta.
Unrelatedly, Guillaume made a list of 2012 tech blog
map tech blog activity by month & subdepartment (with priority
activities listed separately). Work on setting up a Volunteer product
is also underway.
Quim Gil <https://www.mediawiki.org/wiki/User:Qgil> sorted out Social
channels, and we now
have @MediaWiki handles for
Facebook <https://www.facebook.com/MediaWikiProject> and
He published the community metrics November
this new activity.
*Volunteer coordination and
MediaWiki Groups <https://www.mediawiki.org/wiki/Groups> became official
and the first proposals
<https://www.mediawiki.org/wiki/Groups/Proposals> are going through the
approval process. As a side effect, a process for
requesting regional mediawiki-themed mailing
the first case. At least three Wikimedia-related talks have been
accepted at FOSDEM <https://www.mediawiki.org/wiki/Events/FOSDEM>.
*Language tools <https://www.mediawiki.org/wiki/Language_tools>*
Development of the new user interface for Translate, as well as the
translation editor functionality, continued at full pace throughout the
month of December, with iterative feature development and user experience
improvements. Santhosh Thottingal and Niklas Laxström are leading
development and Pau Giner is focusing on optimizing user experience
elements. The team also released the latest version of the MediaWiki
Language Extension Bundle. Increased support for language variants and
alternate language codes was added to the Universal Language Selector.
Alolita Sharma continued to work with Red Hat's localization and
internationalization teams to evaluate localization data, translation tools
and internationalization tools and technologies.
More language input methods contributed by language communities were added
to the jquery.ime library.
Other news
Pau Giner and Amir Aharoni participated in the Open Tech Chat
this month to talk about best practices in multilingual user testing and
internationalization. Amir Aharoni also participated in mentoring
Priyanka Nag for the new LevelUp program. Srikanth Lakshmanan and
Arun Ganesh’s tenures with the Language Engineering team ended in December.
*The Kiwix project is funded and executed by Wikimedia
A new Kiwix 0.9rc2 <http://changelog.kiwix.org> was released. This version
embeds our ZIM HTTP server *kiwix-serve* for Windows, OSX and Linux. It is
now integrated in the Kiwix UI, allowing everyone to share Wikipedia on a
LAN in two clicks. We have revamped our audience measurement
a solution that could be interesting for other projects using
We continue at the same time to increase our ZIM production throughput with
8 new Wikipedia ZIM files in December. December was also a month of new
records for Kiwix: for the first time, we have had more than 70,000
Education software at Sourceforge.
*The Wikidata project is funded and executed by Wikimedia
New code and bugfixes have been deployed (with MediaWiki
and test2 <http://test2.wikipedia.org> now gets language links from
Wikidata. Changes on Wikidata that concern articles on test2 are shown in
the recent changes of test2 as well. If there are no problems, deployment
on the Hungarian
happen on January 14, 2013. Other Wikipedia sites will follow. For
the second phase of Wikidata, representation of values is the central
focus. We published a
started; we'd appreciate your feedback. Additionally, Denny Vrandečić
and Lydia Pintscher held IRC office hours; logs are available in
.
Future
The engineering management team continues to update the
*Deployments <http://wikitech.wikimedia.org/view/Deployments>* page weekly,
providing up-to-date information on the upcoming deployments to Wikimedia
sites, as well as the *engineering
*, listing ongoing and future Wikimedia engineering efforts.
Technical Communications Manager — Wikimedia Foundation