Hi,
The report covering Wikimedia engineering activities in January 2013 is now available.
Wiki version: https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/January
Blog version: https://blog.wikimedia.org/2013/02/07/engineering-january-2013-report/
We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge:
https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/January/summary
Below is the full HTML text of the report, as previously requested.
As always, feedback is appreciated about the usefulness of the report and its summary, and on how to improve them.
------------------------------------------------------------------
Major news in January includes:
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming
up, and we really love talking to active community members about these
roles.
Announcements
Technical Operations
Production Site Switchover
- The Wikimedia Foundation switched over its primary data center from Tampa, Florida to Ashburn, Virginia on January 22. Given the scale and complexity of the migration, we scheduled three 8-hour windows for it, but we were able to complete it
on the first attempt. Because the switchover involved, among other
things, moving the master databases from Tampa to Ashburn, the site
was set to read-only mode for about 32 minutes. During that period,
the site remained available, but no new content could be created, edited or
uploaded. As expected, the migration caused some minor fallout,
mostly due to configuration changes, but it was quickly contained by
the Engineering and Operations teams.
- With this migration, the Tampa data center is now our fail-over
site, and we plan to perform site fail-over tests every few months. A few
small, non-core applications, such as RT, Etherpad and Bugzilla, still use
Tampa as their primary site; they too will be migrated in the coming months.
Site infrastructure
- One of the main concerns of the migration was serving traffic from
the new data center with empty memcached servers: the resulting spike in load on
the Apache and database servers could have been disastrous for the site.
To address this, Tim Starling expanded the single-instance
'Parser Cache' persistent store in Tampa into 3 sharded
instances, and Asher Feldman built the databases and replicated them across
the 2 data centers.
- Another improvement, by Asher and Peter Youngmeister, was the implementation of MHA
(Master High Availability) on our MySQL clusters. Its primary objective
is to automate the promotion of a slave database to master in a
fail-over scenario, reducing downtime without causing
replication integrity problems, without increasing database latency, and
without changing existing deployments.
- Faidon Liambotis and Mark Bergsma continued to work on the Ceph file
object store. With Domas Mituzas' help, they identified a performance
issue with the RAID card that caused severe read/write latency on the
Ceph cluster. Faidon confirmed with the vendor that it is a known
problem and that no fix is available yet. We have ordered replacement RAID
cards and swapped them in, and test results seem to indicate that the performance
issue is solved.
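The report does not say how keys are distributed across the three Parser Cache instances. As a rough illustration only, a common way to spread one key/value store over several sharded instances is to hash each key and take the result modulo the number of shards, so that every application server agrees on where a key lives. The shard names and the `shard_for_key` helper below are hypothetical and are not MediaWiki's actual code:

```python
import hashlib
from typing import List

# Hypothetical shard names; the report only says "3 sharded instances".
PARSER_CACHE_SHARDS: List[str] = ["pc1", "pc2", "pc3"]


def shard_for_key(key: str, shards: List[str] = PARSER_CACHE_SHARDS) -> str:
    """Return the shard that stores `key`, using a stable hash modulo N.

    md5 is used only because it gives the same answer in every process and
    on every machine (Python's built-in hash() is salted per process).
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


if __name__ == "__main__":
    # Illustrative keys, not MediaWiki's real parser cache key format.
    for key in ["page:12345", "page:67890", "page:424242"]:
        print(key, "->", shard_for_key(key))
```

One trade-off of plain hash-modulo placement is that changing the number of shards remaps most keys. Schemes such as consistent hashing are often preferred when shards are added or removed frequently.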
Fundraising
- Fundraising bastion hosts were deployed in the Ashburn and Tampa
data centers. We also tweaked and tuned central logging and monitoring,
and converted the remaining fundraising MyISAM tables to InnoDB, which
should fix dump-induced replication lag.
Data Dumps
- This month, we had a look at the process of using the XML dumps to
create a local copy of a Wikimedia site: it turned out to be painful and
cumbersome at best, and unfathomable for the end-user in the worst
case. As part of an attempt to improve this situation, there is now a new experimental tool
available for *nix platforms, for generating MySQL tables from the XML
stub and page content files. It is intended to read input files from
various versions of MediaWiki and generate output for the version the
user wants. Testing and feedback are encouraged.
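The report does not describe how the new tool works internally. As a minimal sketch under that caveat, the first step in turning a stub dump into table rows can be done by streaming the XML with Python's `xml.etree.ElementTree.iterparse`, so the multi-gigabyte file never has to fit in memory. The function names are hypothetical, and the sample uses the `export-0.8` XML namespace only for illustration:

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that the export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_page_rows(source: BinaryIO) -> Iterator[Tuple[int, int, str]]:
    """Stream (page_id, namespace, title) tuples out of a stub dump."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        # Direct children only: <revision> carries its own nested <id>.
        fields = {local_name(child.tag): child.text for child in elem}
        yield int(fields["id"]), int(fields.get("ns") or 0), fields["title"]
        elem.clear()  # release the parsed subtree to keep memory flat


if __name__ == "__main__":
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
      <page><title>Main Page</title><ns>0</ns><id>1</id>
        <revision><id>10</id></revision></page>
      <page><title>Talk:Main Page</title><ns>1</ns><id>2</id>
        <revision><id>11</id></revision></page>
    </mediawiki>"""
    for row in iter_page_rows(io.BytesIO(sample)):
        print(row)
```

Rows like these could then be written out as tab-separated text and bulk-loaded with MySQL's `LOAD DATA INFILE`. A real loader would also need to escape special characters and populate the revision and text tables.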
Wikimedia Labs
- In January, we made a number of performance and usability
improvements. Three compute nodes were added to the pmtpa zone. Alex
Monk added Echo notification support to labsconsole, passwordless sudo
is now the default for projects, and shell requests are created
automatically on account creation. The sysadmin and netadmin roles have
been combined into a single projectadmin role. GlusterFS was upgraded to
fix a memory leak, but unfortunately the upgrade introduced a new bug
that caused some instability in project storage. Work is ongoing to
improve the project storage situation.
Editor retention: Editing tools
VisualEditor
In January, the team worked primarily on reviewing and cleaning up the code
deployed in December. They spent time with their colleagues in the Parsoid team
planning the next phase of development, which is aimed at making the
VisualEditor the default editor for all Wikipedias from July 2013. The
alpha version of the VisualEditor on mediawiki.org and the English
Wikipedia was updated twice (with 1.21wmf7 and 1.21wmf8),
fixing a number of bugs reported by the community and making some
adjustments to the link inspector's functionality based on feedback.
Parsoid
In January, the Parsoid team did some spring cleaning and bug fixing. The serialization
subsystem was overhauled: it now features simpler and more robust
separator handling. Selective serialization was rewritten to deal with
content deletions, and it now features DOM diff-based change detection that
does not rely on client-side change marking. Support for non-English
wikis and local configurations was also significantly improved, and will likely
stabilize in the coming weeks. The team also discussed the
longer-term Parsoid / MediaWiki strategy and documented it in the
Parsoid roadmap.
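DOM diff-based change detection works by comparing the DOM originally produced from the wikitext with the DOM returned by the editor. Only the subtrees that differ need to be re-serialized, and unchanged parts can keep their original wikitext. The sketch below is a deliberately simplified, hypothetical Python version of that comparison (`changed_paths` is an illustrative name). Parsoid's real implementation is written in JavaScript and handles inserted, deleted and moved nodes more precisely:

```python
import xml.etree.ElementTree as ET
from typing import List


def changed_paths(old: ET.Element, new: ET.Element, path: str = "0") -> List[str]:
    """Return index paths of the subtrees that differ between two DOM trees.

    Simplification: if a node's tag, attributes, text or number of children
    differ, the whole node is reported as changed; a real DOM diff also
    detects inserted, deleted and moved children individually.
    """
    if (old.tag != new.tag
            or old.attrib != new.attrib
            or (old.text or "") != (new.text or "")
            or len(old) != len(new)):
        return [path]
    changes: List[str] = []
    for i, (o, n) in enumerate(zip(old, new)):
        changes.extend(changed_paths(o, n, f"{path}/{i}"))
        # Text after a child (its "tail") belongs to the parent's content.
        if (o.tail or "") != (n.tail or ""):
            changes.append(f"{path}/{i}#tail")
    return changes


if __name__ == "__main__":
    before = ET.fromstring("<body><p>Hello</p><p>World</p></body>")
    after = ET.fromstring("<body><p>Hello</p><p>Earth</p></body>")
    # Only the second paragraph would need to be re-serialized.
    print(changed_paths(before, after))  # ['0/1']
```

Everything outside the reported paths could then be emitted from the original wikitext unchanged. This prevents edits from introducing unrelated formatting changes in untouched parts of a page.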
The performance-oriented C++ port was deprioritized in favor of
DOM-based performance improvements and HTML storage. The basic idea
behind storing (close to) fully processed HTML is to speed things up by
doing no significant parsing on page view at all. In the longer term,
VisualEditor-only wikis can avoid a dependency on Parsoid by switching
to HTML storage exclusively. Overall, the plan is to leverage the
Parsoid-generated HTML/RDFa DOM format inside MediaWiki core to enable
better performance and editing capabilities in the future.
Editor engagement features
Notifications
This month, we stepped up development on the Notifications project
(Echo) and updated our first experimental release on mediawiki.org.
Ryan Kaldari and Benny Situ improved the user experience for core features
such as the badge, fly-out, all-notifications page and email
notifications, and started developing new features such as
bundling, dismiss and web preferences. Luke Welling completed work on
HTML email and started development of a more robust job queue.
Fabrice Florin led discussions about the Echo product plan and the
new features and notifications under consideration, while Vibha Bamba
designed new components of the user experience. We plan to develop some
of these features and notifications in the coming weeks, and are aiming
for a first release on the English Wikipedia by the end of March; in the
meantime, you can try the current version on mediawiki.org. We are also
recruiting a software engineer to join our team and work with us on this
and other editor engagement projects.
Flow
Flow entered the product design phase in early January.
OPW intern Kim Schoonover began user research
on how user-to-user talk pages are handled, and collected data
about the difficulties that new (and existing) users have when using
them. Engineering discussions started about potential back-end and
scaling difficulties, the possible use of Wikidata's ContentHandler,
and the evaluation of Wikia's MessageWall. A plan for community
engagement was proposed and accepted; it includes a consultation with
experienced and newer users alike about the problems they face, planned
for early February.
Article feedback
This month, our team updated Article Feedback v5
and discussed its release with the English, French and
German Wikipedia communities. Developer Matthias Mullie completed a major code
refactoring, which is now being reviewed. He also developed a final set
of new features, such as simpler moderation tools and better filters, to
be tested next month. Dario Taraborelli and Aaron Halfaker posted a
feedback evaluation report, which suggests that about 39% of the feedback
collected in their study can be used to improve articles (see also
their other study results). Oliver Keyes responded to community questions in a
request for comments about future deployments on the English Wikipedia; a final
decision is expected next month. Fabrice Florin led product planning and
discussed a possible deployment with the French Wikipedia, as well as with the
German Wikipedia, which is currently evaluating the tool in an ongoing pilot,
with a vote expected in May. Once our development is complete and
communities reach their decisions for each project, we expect to release
Article Feedback v5 on a range of Wikimedia sites in the coming months.
Editor engagement experiments
In January, the Editor Engagement Experiments team ("E3") planned its
goals for the quarter, which ends in March. We also made progress on the following projects, which are part of that plan.
First, we launched guided tours
on the English Wikipedia, including a test tour to demonstrate the
capabilities of the extension, and a tour associated with the "onboarding new Wikipedians" (aka GettingStarted) project. In addition to tours created by the team, the extension
supports community-created tours. Note that unlike many other projects
by the E3 team, guided tours are planned as a permanent addition to
Wikipedia, with each tour implementation considered to be experimental.
(For example, the "getting started" tour will be delivered via a split
A/B test.)
While building guided tours, the team also A/B tested the Getting Started
landing page and task list, measuring their effect on driving new
contributions. Several rounds of analysis were completed and published
on Meta (round 1, round 2). They concluded that the onboarding experience
leads to small but statistically significant increases in the number of
new English Wikipedians who attempt to edit and who save their first edit.
In addition to measuring the effects of the guided tour associated with
this project, our immediate plans are to redesign the landing page and add
more task types to attract more new contributors.
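The report does not say which statistical test E3 used. A common choice for comparing the rate of an outcome (for example, "saved a first edit") between two A/B buckets is the two-proportion z-test. The sketch below implements that test. The function name and the counts in the example are hypothetical and are not E3's data:

```python
import math
from typing import Tuple


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value); a positive z means bucket B converts better.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # All users succeeded or all failed: the test is undefined.
        raise ValueError("pooled rate is 0 or 1; z-test not applicable")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: control vs. onboarding bucket, 10,000 users each.
    z, p = two_proportion_z_test(1000, 10000, 1100, 10000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With large buckets, a small absolute difference such as one percentage point can reach significance. This is consistent with effects described as "small but statistically significant".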
Work also continued on improving the reliability and precision of the data collected by
EventLogging. In particular, we migrated EventLogging to a dedicated database, and
began collecting server-side events in addition to client-side ones, to
support work such as measuring account creations on desktop and mobile.
January also saw heavy use of the new
User Metrics API, both to complete cohort analysis of onboarding users and to produce metrics for the
Board presentation on the Foundation's year-to-date progress. Development of the API
continues, and a public announcement is expected in early March. Last
but not least, a call was put out for a part-time
Technical Writer to document both EventLogging and the User Metrics API.
Support
2012 Wikimedia fundraiser
January marks the official end of the 2012 fundraiser. The team spent the
entire month cleaning up and recovering from the very
successful months of November and December, auditing the donations, and
writing tools that will help it run continuous audits in the
future.
Web
GeoData Storage & API
After its soft launch in December, GeoData was officially announced
this month. Work on improvements and bug fixing continues. The
Special:Nearby page, which has been deployed to an experimental version
of the site, represents the first major use of this feature on mobile
projects. We hope to use it to help contributors identify articles in
need of photos.
Mobile QA
The push to get MobileFrontend up and running on Beta Labs is well underway.
We've also added test cases for Wikipedia Zero, and we are planning a
community test event for Mobile Upload and Commons in February.
Mobile Web Photo Upload
This month, the mobile web team finished up work on the watchlist feature
and kicked off a 3-month sprint on photo uploads. The focus in January
was on developing basic uploading infrastructure: uploading images to
Commons under a single Creative Commons license. We also built out the
UX/UI design for a call to action on articles lacking images in the lead
section. Through this workflow, users can upload an image to Commons
and add a thumbnail of the image to the appropriate article on their
local Wikipedia or sister project, in one simple step. We also developed
a mobile uploads page where contributors can see their recent uploads
and potentially donate more images from their mobile device to Commons.
These features are currently live on the Beta mobile site and are set to
be released to the full mobile site in February.
Apps
Commons App
January marked the first month of the Apps team's existence. Yuvaraj Pandian
has started work with Brion Vibber on iOS and Android apps to
upload photos to Commons. Both platforms are being developed
concurrently and will have feature parity. Shankar Narayan has joined us
and will support the team with all of its design needs. While the first
iteration of the Commons App isn't scheduled to finish until February
8th, the team has already created two skeleton apps that can upload,
share and show the user's contributions. The team will spend its
next iteration tweaking workflows and styling the app. We also released
new versions of the Wikipedia app on iOS and Android to bring
it into compliance with legal privacy and disclaimer requirements.
Partners
Wikipedia Zero
During January, Wikimedia was awarded a grant in the Knight News Challenge
for its work in expanding Wikimedia mobile projects. Part of this grant will
be used for Wikipedia Zero and the SMS/USSD projects to improve access to
knowledge in the developing world. In addition, we've partnered with
VimpelCom to provide Wikipedia Zero to at least 100 million additional
customers this year.
J2ME App
In January, we began to explore ways to reduce the memory and processor
requirements of our J2ME app, to increase the number of phones that can
use this application.
Wikipedia over SMS & USSD
We are finishing work on capturing metrics from the SMS server, to
learn usage numbers and determine how many sessions are completed.
MediaWiki Core
MediaWiki 1.21
MediaWiki 1.21wmf7 and 1.21wmf8
were deployed in January on a modified schedule, due to holidays and
because of the data center migration. Deployments have returned to their
usual fortnightly schedule.
Git conversion
The ExtensionDistributor was rewritten in early January. This was primarily
done to support the data center migration, but it was also the first time
ExtensionDistributor had received any significant attention since the
migration to Git. The new version uses the GitHub API to generate extension
snapshots. We hope that the new version will be more reliable for users.
SVN-based extensions are no longer supported, but this is not expected to
impact many users since these extensions are largely unmaintained (all
popular and active extensions have long since moved to Gerrit). As always,
these extensions will remain in SVN should anyone still want the code.
TimedMediaHandler
Jan Gerber continues to fix bugs in and refine TimedMediaHandler, mainly
focusing on operational improvements to make more efficient use of our
server infrastructure.
Wikidata deployment
Sam Reed helped with the Wikidata deployment, deploying the Wikibase Client
extension to the Hungarian, Hebrew and Italian Wikipedias. Chris Steipp
reviewed the Wikidata team's work to extend AbuseFilter for use with
structured data. Aaron Schulz worked with Daniel Kinzler on job queue
improvements.
Wikivoyage migration
Wikivoyage officially launched on January 15. Most of the Wikimedia
Foundation's involvement was completed in November, but some minor
bugfixing was done in support of the official launch.
SwiftMedia
NFS for uploads/thumbnails has been unmounted from all Apache servers and
the NFS back-end configuration was removed from MediaWiki; all files are
now stored only in Swift. A workaround has been added to the Swift back-end
class for use with Ceph, so that temporary URLs can be used (for
making video thumbnails, for example). A Python script to copy files into
Ceph has been run and is still being worked on. Various issues have been
reported in Ceph's bug tracker and are being looked at by the Ceph
developers.
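The report does not describe the Ceph workaround itself. For background, OpenStack Swift's TempURL middleware signs a temporary URL with an HMAC-SHA1 computed over the HTTP method, an expiry timestamp and the object path, using a secret key stored on the account. Anyone holding the URL can then access that one object until it expires. The following is a minimal sketch of that signing scheme. The helper name, key, host and path are hypothetical:

```python
import hmac
import time
from hashlib import sha1
from urllib.parse import urlencode


def swift_temp_url(key: bytes, method: str, path: str, expires: int,
                   host: str) -> str:
    """Build a Swift-style temporary URL for `path`, valid until `expires`.

    The signature is HMAC-SHA1 over the method, the expiry time (Unix
    seconds) and the object path, joined by newlines.
    """
    body = f"{method}\n{expires}\n{path}".encode("utf-8")
    sig = hmac.new(key, body, sha1).hexdigest()
    query = urlencode({"temp_url_sig": sig, "temp_url_expires": expires})
    return f"{host}{path}?{query}"


if __name__ == "__main__":
    # Hypothetical key, host and object path, valid for one hour.
    print(swift_temp_url(b"example-secret-key", "GET",
                         "/v1/AUTH_example/video-container/clip.webm",
                         int(time.time()) + 3600,
                         "https://swift.example.org"))
```

Because the method is part of the signed body, a URL signed for GET cannot be reused for PUT. A job such as a video thumbnailer could be handed a read-only link without receiving the account key.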
Lua scripting
Lua development was put on hold during the Ashburn data center migration.
We've now resumed work on Lua, with Brad Jorsch and Tim Starling exposing
in Lua more of the functions that are already available as template parser
functions.
Site performance
A patch to allow moving the DB job queue to another cluster is under
review. An experimental Redis-based job queue patch is also in
Gerrit.
Incremental architectural improvements
Code was merged to support more complex data structures (lists, sets) in memcached (with atomic updates).
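The report does not say how these atomic updates are implemented. The standard memcached mechanism for safely updating a structured value is compare-and-swap: read the value together with a CAS token (`gets`), build the new value, and write it back with `cas`, which fails if another client changed the key in between. The sketch below shows that retry loop against a hypothetical in-memory stand-in (`FakeMemcache`). It is not MediaWiki's BagOStuff API:

```python
import threading
from typing import Any, Dict, Optional, Tuple


class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client with gets/cas.

    Every successful write bumps a CAS token, and cas() only succeeds if
    the token the caller read is still current, as in memcached.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, int]] = {}
        self._lock = threading.Lock()
        self._next_token = 0

    def _store(self, key: str, value: Any) -> None:
        self._next_token += 1
        self._data[key] = (value, self._next_token)

    def gets(self, key: str) -> Tuple[Optional[Any], Optional[int]]:
        with self._lock:
            return self._data.get(key, (None, None))

    def add(self, key: str, value: Any) -> bool:
        with self._lock:
            if key in self._data:
                return False
            self._store(key, value)
            return True

    def cas(self, key: str, value: Any, token: int) -> bool:
        with self._lock:
            current = self._data.get(key)
            if current is None or current[1] != token:
                return False
            self._store(key, value)
            return True


def atomic_append(cache: FakeMemcache, key: str, item: Any,
                  max_retries: int = 10) -> bool:
    """Append `item` to the list at `key` without losing concurrent writes."""
    for _ in range(max_retries):
        value, token = cache.gets(key)
        if value is None:
            # No list yet: add() fails if another writer created it first.
            if cache.add(key, [item]):
                return True
            continue
        # Build a new list instead of mutating the one read from the cache.
        if cache.cas(key, value + [item], token):
            return True
    return False
```

A set could be handled the same way by storing a Python `set` and writing back `value | {item}`. The retry limit keeps a heavily contended key from looping forever.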
Admin tools development
The team mainly focused this month on improving the AbuseFilter extension,
which now works on the Wikidata site after support was added for other
content types (as defined using ContentHandler).
Significant work was done on blocking abusive proxies and on abuse limits,
and further progress was made on global AbuseFilters, user renaming, and
the interface that lets Stewards mass-lock user accounts.
Security auditing and response
The team continued to respond to reported vulnerabilities, began a security
review of fundraising extensions, and continued reviews of Wikidata
features.
Quality assurance
We started scheduling community testing events. Echo, AFTv5 and
VisualEditor are all current candidates for testing. A week-long focus on
VisualEditor's support for non-Latin characters uncovered at least one major
issue causing data loss, and another affecting tool-assisted Chinese input.
Beta cluster
The main use for the Beta Cluster in January was to test git-deploy.
Zeljko Filipin continues to run regular tests there. Antoine Musso,
Max Semenik and Andrew Bogott are setting up MobileFrontend to run on Beta
for testing purposes.
Continuous integration
Antoine Musso worked with several MediaWiki extension authors to ensure that
the unit tests for those extensions are run by Jenkins and that they
work. He hopes to have all extensions that run on the Wikimedia
production cluster fully operational by the end of February. Antoine
also integrated PHP CodeSniffer into our automated test runs.
Browser testing
Architecture and configuration for browser testing are now stable, and the
focus has shifted to increasing test coverage by making existing tests more
extensive and covering new features. An example is the Math extension,
which was briefly broken after the data center migration. The team has
also instituted a weekly pair-programming session every Friday.
Analytics
Limn
The team made performance improvements and added new visualizations. Bar,
line, and geo plots can now be built ad hoc from arbitrary data. Evan
Rosen's Grantmaking and Programs dashboard was migrated to this new
version of Limn. Current work includes a collaboration with the E3 team
to provide visualizations, and development of a MediaWiki extension that
will allow creation and editing of graphs.
Bug management
This month, a first bug day was held, targeting bug reports that had not
seen any changes for more than one year, resulting in about 30 tickets
being updated. In addition, some cleanup work took place: decreasing the
number of unprioritized bug reports and going through reports that had
been in "ASSIGNED" status for more than a year. Andre Klapper worked on
small Bugzilla code changes and published initial information on
Bugzilla usage per development team. Community members were invited to join the
MediaWiki Group Bug Squad. Furthermore, some problems caused by the
data center migration were investigated, and ways to improve interaction
on Bugzilla tickets that need handling by the Operations team were
discussed (the team mostly prefers to use the RT bug tracker instead).
Mentorship programs
Six Outreach Program for Women interns started on January 3rd and will work full time until April.
Mariya is working on a discussion among third-party MediaWiki users.
Valerie has completed the Bug Squad group proposal and a first Bug Day.
Priyanka created a script and plans to move to Git.
Sucheta is on schedule following her project plan.
Kim is learning about Flow and the basics of interactive design, as suggested by her mentor.
Teresa has completed a solid base for her extension and is working on the
main functionality. She recently hit a snag with her work environment, but
is still on track with her proposed timeline. The
Google Summer of Code 2013 page was created, a
pre-planning discussion started on wikitech-l, and
LevelUp matchmaking for the first quarter of 2013 is nearly done.
Technical communications
Guillaume Paumier provided communications support to the engineering team,
notably around the data center migration and the associated banners,
notices and translations. He started to organize and clean up the MediaWiki
version pages (like MediaWiki 1.21/wmf7) to make them more useful for
tech ambassadors, by highlighting the most important changes, improving
translatability and adding navigation. He also prepared and organized
translations for the "How to report a bug" and "How to contribute" pages,
to facilitate the involvement of volunteers who don't necessarily
communicate in English. Last, he created a Project:Calendar page to
consolidate and centralize announcements for all events, to make
opportunities for participation more visible. Events around a particular
topic (like QA, testing and bugs) can still be selectively transcluded
using Labeled Section Transclusion.
Volunteer coordination and outreach
The MediaWiki groups for Promotion and San Francisco were officially
approved by the Wikimedia Affiliations Committee, and are the first
Wikimedia User Groups to be created. We helped the Editor Engagement team
organize a sprint to test Echo, but our plans to collaborate further with
the Editor Engagement and Mobile teams were delayed. Quim Gil proposed
a different approach combining regular, time-based QA and
bug management activities, in the form of weekly QA goals. Two such events
(non-Latin character testing in VisualEditor and a review of old bugs)
happened in January, and more are scheduled. Heavy work was done with
Chris McMahon to improve the top QA pages, although some problems remain.
Template:MediaWiki News is now manually synced with social media, bringing
fresh updates to the mediawiki.org homepage and News page. Quim also took
the lead on organizing the Wikipedia Engineering Meetup on January 17th.
He prepared an introduction to MediaWiki and Wikimedia technical
contributions, designed to be reused by other presenters, and tested it at
FOSDEM. Last, we confirmed that technical projects are eligible for
Individual Engagement Grants.
Language tools
Development of the new user interface for Translate, as well as the
translation editor functionality, continued throughout January. The focus
was on back-end work and on extending the WebAPI to support the remaining
features needed to reach feature parity with the current editor.
The MediaWiki Language Extension Bundle 2013.01 was released. The Universal
Language Selector was deployed with limited features to a selection of
Wikimedia sites that use the Translate extension. Collaboration
projects also continue with Red Hat's language technologies teams, with
an upcoming work sprint to complete several projects extending
internationalization support for Indic languages. Runa Bhattacharjee
kicked off the Language coverage matrix, an attempt to compile a
per-language snapshot of the coverage of our internationalization tools
for 300 languages.
Milkshake
More input methods were added to jQuery.IME, and bugs were fixed in jQuery.ULS.
The Kiwix project is funded and executed by Wikimedia CH.
- We have adapted the kiwix-plug script to the Tonidoplug2, a device cheaper than the Dreamplug. Kiwix was chosen by SourceForge users as February's Project of the Month, and an interview with Emmanuel Engelhart was published. In January, Kiwix reached 100,000 downloads in a single month for the first time.
- Besides Kiwix, the openZIM website
was revamped and simplified for better readability. The openZIM bug
tracker and source code management were migrated to the Wikimedia
infrastructure (Bugzilla and Git).
The Wikidata project is funded and executed by Wikimedia Deutschland.
- January has been an exciting month for Wikidata. The deployment on the first Wikipedia sites (Hungarian, Hebrew and Italian)
was completed. At the same time, work has continued on the user
interface and back-end for statements, the core part of Wikidata's
second phase. This will enable users to enter information like the
children of a given person or a link to their portrait on Wikimedia
Commons. These features can already be tested on the demo system.
We've also worked on making AbuseFilter work with Wikidata, and wrote a
new mechanism to distribute changes to the clients (Wikipedia) so they
can show Wikidata changes in their RecentChanges. We made progress on
using Solr for search and rewrote the draft for the inclusion syntax to be much simpler. This is the syntax that editors will use to include data from Wikidata in Wikipedia. A manual for using Pywikipedia on Wikidata was written as well.
- If you want to code on Wikibase, the software powering Wikidata, have a look at the outstanding bugs and tasks.
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.
--
Guillaume Paumier
Technical Communications Manager — Wikimedia Foundation
https://donate.wikimedia.org