Hi,
The report covering Wikimedia engineering activities in April 2013 is now available.
Wiki version: https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/April
Blog version: https://blog.wikimedia.org/2013/05/02/wikimedia-engineering-april-2013-report/
We're also proposing a shorter, simpler and translatable version of this
report that does not assume specialized technical knowledge:
https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/April/summary
Below is the full HTML text of the report.
As always, feedback is appreciated on the usefulness of the report and its summary, and on how to improve them.
------------------------------------------------------------------
Major news in April include:
Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming
up, and we really love talking to active community members about these
roles.
Announcements
- Monte Hurd joined the Mobile engineering group as Software Engineer in the Apps team (announcement).
- Brandon Black joined the Operations team as Dev/Ops Engineer (announcement).
- Erik Bernhardson joined the Features team as Features Engineer (announcement).
- Nischay Nahata joined the Features team as Features Contractor (announcement).
Technical Operations
Site infrastructure
- Several large wikis were migrated to MariaDB,
with positive results. A new class of redis servers was deployed in
support of the migration of our asynchronous job queuing infrastructure
from MySQL to redis, enabling us to better meet the demands of Wikidata
and Echo. New file uploads are now being written to Ceph in Eqiad, in
addition to Swift in pmtpa, in support of a potential migration. The
current plan is to open up the Eqiad Ceph cluster for 'reads' in the second
week of May. Currently, 'reads' are served by the Tampa Swift cluster.
- With the core cluster migrated to Eqiad, we are now working on the miscellaneous server cluster. As part of the cleanup, we retired servers as well.
Data Dumps
- New tools for importing partial or full content into a new wiki have been
released. A step-by-step walkthrough of their use has been added to the documentation for users of the dumps on Meta.
- True incremental dumps are now a GSoC proposal, and several students have applied for this project.
- The logging table XML dump on Wikidata was taking days to run, due
in part to the high volume of edits there, much more than even the
English Wikipedia. Most of those edits wind up being recorded as
autopatrol in the log, making it already about half the size of the
logging table for the English Wikipedia. Breaking up the database query
into smaller batches works around the issue.
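The batching workaround mentioned in the last item can be illustrated roughly as follows. This is a minimal sketch, not the dump scripts' actual code: it assumes a `logging` table with an integer primary key `log_id`, uses an in-memory SQLite database as a stand-in for the production database, and the function names and batch size are hypothetical.

```python
import sqlite3


def make_demo_db(n_rows):
    """Build an in-memory stand-in for a large log table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logging ("
        "log_id INTEGER PRIMARY KEY, log_type TEXT, log_action TEXT)"
    )
    conn.executemany(
        "INSERT INTO logging (log_type, log_action) VALUES (?, ?)",
        [("patrol", "autopatrol")] * n_rows,
    )
    return conn


def iter_log_rows(conn, batch_size=1000):
    """Yield all rows in primary-key order, one bounded query at a time.

    Each query resumes after the last key seen, so no single query has
    to scan or hold the whole table.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT log_id, log_type, log_action FROM logging "
            "WHERE log_id > ? ORDER BY log_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]
```

Resuming from the last key seen, rather than using a growing OFFSET, keeps the cost of each batch roughly constant as the dump progresses.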
Wikimedia Labs
- Work on tool labs is progressing nicely. 32 bots/tools have been
added to the tools project. Most of the functionality of Toolserver
should now be available in tool labs. Database replication is still
being worked on, but is progressing well. The pre-labs replication
databases are being replicated to, and the Redactatron application has
been finished, allowing us to mark tables as ok to replicate. Our
current roadmap is for database replication to be accessible by the time
of the Amsterdam Hackathon. Instance creation performance improved greatly
this month after we replaced the generic Ubuntu cloud images with
our own custom images, which pre-install and pre-configure most of what
an initial puppet run would handle. Work on single-instance MediaWiki
continued this month, making the initial MediaWiki installation more
robust and handling a number of legal issues (such as showing the terms
of use, using proper logos, linking to a proper privacy policy, etc.).
Work began on adding Ajax interactivity to the OpenStackManager
interface. Currently changes are in for reboot and get console output
actions for managing instances. A more reasonable project filter change
using jQuery Chosen has been added as well. Work on replacing glusterfs
is mostly done. Two projects have been switched to use the new NFS
server and the rest will be switched next month. Work has begun on
upgrading OpenStack from the essex to the folsom release. Our testing
environment has been upgraded and production tests are currently
ongoing. During the OpenStack summit, work was done to push the Moniker
DNS application into OpenStack incubation to be added as a supported
OpenStack project. Ryan Lane gave a talk during the OpenStack summit
about the state of OpenStack's user committee, along with Tim Bell of
CERN and JC Martin of eBay. The work on the user committee aims to make
OpenStack easier to use and upgrade, which should increase the frequency
of updates in Labs.
Editor retention: Editing tools
VisualEditor
In April, the team continued their work on the major new features that
will be added in the coming months. Our objective is for VisualEditor to
be the default editor for all Wikipedia users in July 2013, capable of
letting them edit the majority of content without needing to use the
wikitext editor. This means we have been focused on four substantial
areas of work: adding support for references, templates, categories and
media items. During this time, the main areas of our work were editing
around images, which is now designed and partially implemented in our
experimental code, and around categories, which is almost complete and
nearly ready for deployment. The deployed alpha version of VisualEditor
was updated three times (1.22-wmf1, 1.22-wmf2 and 1.22-wmf3),
adding speed improvements, user interface improvements and work on the
back-end to better support the new features, and fixing a number of
bugs. We were also able to deploy VisualEditor to fourteen more
Wikipedias as an opt-in alpha (and, later, to the Vietnamese Wikipedia
too), which has let the community give us feedback on what works and
what is broken, and has helped identify language- and locale-specific
issues that we are now fixing.
Parsoid
In April, the Parsoid team successfully deployed the cumulative work done
over the last four months. This includes support for non-English wiki
configurations, a rewritten serialization subsystem based on server-side
DOM diffs, category link and basic template parameter editing support
and a long list of fixes and improvements.
Several other features for the July release are on track. The specifications for extensions containing templates and templates containing extensions were fleshed out and are currently being implemented. Similarly, our specs for images and thumbnails were vastly improved so that we will soon support full editing for all parameters.
We also improved our code quality and testing infrastructure.
In preparation for the July release, we did more benchmarking and capacity planning. A caching strategy that avoids overwhelming the API with requests was developed, hardware to run Parsoid was ordered and work on the implementation started.
Editor engagement features
Notifications
In April, we deployed Notifications on the English Wikipedia and
mediawiki.org. This first release aims to inform users about new
activity that affects them on Wikipedia, such as talk page messages,
page reviews, mentions, edit reverts or thanks. Ryan Kaldari developed a
new feature that lets users mark all notifications as read, and updated
the fly-out and archive page, based on designs from Vibha Bamba. Benny
Situ completed the bundling feature and developed some of the first
metrics dashboards, in collaboration with Dario Taraborelli. Luke
Welling continued to develop HTML email notifications and a
notifications mailbox. Fabrice Florin managed the product development
and release of this notification system, and coordinated its
socialization on the English Wikipedia with Oliver Keyes. We're also
grateful to Steven Walling and Matt Flaschen from our E3 team for
developing the Welcome and Getting started notifications. To learn more,
visit the project portal, read the help page and join the discussion on
the talk page.
Article feedback
This month, we deployed the final release version of Article Feedback v5 on the
English, French and German
Wikipedias. Developer Matthias Mullie updated the back-end software in
order to re-enable the tool on the English Wikipedia, and fixed a number
of bugs reported on the German Wikipedia. Fabrice Florin worked with
Pau Giner, Oliver Keyes and community members to simplify the feedback
page, as well as finalize feedback links, auto-archive and opt-in
features. Learn more in this project update. To enable feedback on
articles you watch on the English Wikipedia, simply add the
'Article Feedback 5' category to these pages. For more tips on how to
use this version, visit the testing page, and let us know what you think
on the Article Feedback Talk page.
We are now wrapping up development for this project, and will collect
community suggestions for the next few months to prepare for upcoming
votes on the French and German Wikipedias later this year.
Flow
Design work continues, and several discussions were held about what constitutes a
minimum viable product for the first iteration of Flow. Brandon Harris
is now building an interactive prototype to help describe multiple functions.
Editor engagement experiments
Editor engagement experiments
In April, the Editor Engagement Experiments (E3) team focused first and foremost on its
account creation and login redesigns in MediaWiki core. The
first phase of the launch
invited editors and readers on all Wikimedia projects to test the new
forms on an opt-in basis, to identify bugs and localization issues
across our many wikis. We expect to release these as the default forms
in May, pending any final blockers.
For the team's Onboarding new Wikipedians project, we completed quantitative analysis
of the latest version of the GettingStarted landing page, and began
prototyping a new landing page and navigation system for usability
testing prior to further development and launch, which is expected in
early May as well.
On the analytics and infrastructure front, the team handed off the product roadmap for the User Metrics
API to the Analytics team and colleagues in the Grantmaking and
Programs department. Ori Livneh, in support of the data analysis needs
on the team, began work supporting a Foundation instance of IPython Notebook.
Last but not least, the E3 team held its second
Quarterly Review session, and began work planning its next high-level
goals for the April–June quarter.
Support
2012 Wikimedia fundraiser
Language tools
Milkshake
The development team added a Divehi language web font to jQuery.webfont,
and several contribution patches to jQuery.ime were merged. Redesign
suggestions from the Product team on the Universal Language Selector
(ULS) were reviewed by interaction designer Pau Giner and accepted by
the development team. Changes include the launch workflow for ULS, as
well as changes to display settings and font settings workflows for
logged-in users. Development to reflect these changes is in progress and
expected to be completed and tested for deployment in May.
Language engineering communications and outreach
Highlights of this month's communications and outreach activities by the team
include UX testing with community members for ULS changes by Pau Giner, and
blog posts on team programs including the Language Mavens, the
translatewiki.net home page and translation UX improvements. The team also
held office hours with the community as well as a successful bug triage
focused on translate bugs.
Commons App
The Wikimedia Commons Android app is available in the Google Play store,
and we also added categorization support. Its iOS counterpart is
available in iTunes.
Wikipedia Zero
We deployed Mobile Web's MobileFrontEnd-ZeroRatedMobileAccess decoupling
code to production. We also started the next point release to support
more object-friendly JSON-backed carrier preferences, updated carrier
preferences, fixed a UI button rendering bug, and documented configuration
parameters. Lastly, we added content to wiki pages, and prepared for the
migration of non-embargoed content to public wikis.
Mobile Web Photo Upload
In April, we experimented with a login/signup call to action for
logged-out users from our in-article upload feature. This resulted in a
huge spike in new user contributions; however, the quality of the
uploads was lower than anticipated, and the quantity of inappropriate
uploads was a burden on the Commons community. In light of this, we
disabled the login/signup call to action, allowing only existing
Wikimedians to see and use the upload feature. We are still on track to
reach our fiscal year target of 1,000 unique uploaders a month and,
when gated to existing users, the quality of the uploads has vastly
improved: three quarters of the files are retained on Commons, as compared to
less than a quarter when brand-new users were uploading. To create a more
focused uploading workflow and let mobile uploaders discover more
articles to illustrate, we also created a Nearby view on the beta
site, showing users a list of articles near them and highlighting the
ones that need images. We expect to release this to the full mobile web
site next month.
MediaWiki Core
Auth systems
During April, the team primarily focused on implementing SUL v2, which will
fix issues that users are having with new security features in recent
browser releases. SUL v2 is ready for testing and deployment is targeted
for early May. In addition, the team worked toward a final design
specification for OAuth and will begin working on that pending the
successful deployment of SUL v2.
Search
Code has been instrumented (and will soon be deployed) to log more data to
allow root cause analysis of the spurious "Zero results" issue. Some log
analysis was also done. The Puppet configuration on beta was updated to
limit lucene-search-2 memory usage on Labs.
MediaWiki 1.21
The 1.21 deployment cycle to Wikimedia wikis is complete, and the MediaWiki
1.21 tarball is being prepared for release, with a target release date
of May 15. Mark Hershberger recently released MediaWiki 1.21rc4.
MediaWiki 1.22
Git conversion
We deployed a first iteration of a Bugzilla integration plugin, which
provides notifications to Bugzilla when changes are made in Gerrit.
We’ve increased the memory allocated to Gerrit, as well as deployed a
couple of other stability fixes; both of these changes should provide
some minor performance and stability improvements to users. Finally,
we’ve deployed a new version of Gerrit that includes superior garbage
collection support. This drastically improved the compression of
repositories on-disk, which has resulted in a wide range of improvements
for all users for all operations, from cloning to pushing to commenting
on changes.
Multimedia
Wikidata deployment
After a minor delay due to some job queue and infrastructure migration work,
Wikidata Phase II was deployed to all Wikipedia sites. This allows
editors to reference and display content from Wikidata inside infoboxes.
Lua scripting
Some bugs were fixed and internationalization changes merged this month; no
major changes were made. The community continues to develop Lua-based
templates, such as the citation templates on the English Wikipedia.
Site performance and architecture
All job queues were migrated to JobQueueRedis off of the main DB clusters.
Improvements were made to the category update queries to reduce lock
exceptions that users often encountered when deleting files. This works
via a new transaction callback hook added to the core database class,
which can be used to resolve similar problems.
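To make the idea of a transaction callback hook more concrete, here is a minimal sketch of the general pattern. It is not MediaWiki's actual database class: the `Database` class, its method names and the `delete_file` helper are all hypothetical, and the sketch assumes that the benefit comes from running the contended update only once the main transaction has finished, so that its locks are held for a shorter time.

```python
class Database:
    """Toy stand-in for a database handle (hypothetical, for illustration)."""

    def __init__(self):
        self.in_transaction = False
        self._idle_callbacks = []
        self.log = []  # records the order in which operations happen

    def begin(self):
        self.in_transaction = True
        self.log.append("BEGIN")

    def on_transaction_idle(self, callback):
        """Run callback immediately if idle, otherwise right after commit."""
        if self.in_transaction:
            self._idle_callbacks.append(callback)
        else:
            callback()

    def commit(self):
        self.log.append("COMMIT")
        self.in_transaction = False
        # Swap the queue out first so callbacks can safely register more.
        callbacks, self._idle_callbacks = self._idle_callbacks, []
        for callback in callbacks:
            callback()

    def rollback(self):
        self.log.append("ROLLBACK")
        self.in_transaction = False
        # Deferred work is dropped: the change it depended on never happened.
        self._idle_callbacks.clear()


def delete_file(db, name):
    """Delete a file row, deferring the category count update."""
    db.begin()
    db.log.append(f"DELETE file {name}")
    db.on_transaction_idle(
        lambda: db.log.append(f"UPDATE category counts for {name}")
    )
    db.commit()
```

Calling `delete_file(db, "Example.jpg")` on a fresh `Database` records the category update after the `COMMIT` entry, which is the ordering the pattern is meant to guarantee.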
Admin tools development
Security auditing and response
We released the MediaWiki 1.19.5 and 1.20.4 security releases on April 15th.
Quality assurance
Quality Assurance
We collaborated with Weekend Testing Americas to investigate new Account
Creation UX features with the E3 team, and tested Echo deployments with
the E2 team. We are investigating an intermittent failure with
UploadWizard for Firefox, and a styling issue with ResourceLoader in IE.
Beta cluster
We started to point automated tests currently targeting test2wiki to Beta
labs to shake out issues there and ultimately improve test coverage.
This will help us with earlier detection of bugs introduced into master
(such as bug 47015).
Mark Bergsma and Antoine Musso refined the Varnish configuration for
MobileFrontend, and further refined the configuration of the search
functionality.
Continuous integration
In April, the Jenkins/Zuul platform encountered several issues, such as the
gating job running tests against the current version of the branch
instead of the to-be-merged change (bug 46723).
Antoine Musso solved several performance issues by using tmpfs and a
new SSD drive and upgrading Zuul to the latest upstream version.
Timo Tijhof overhauled the automatically generated MediaWiki documentation for JavaScript and PHP with Doxygen 1.7. He also fixed the duplicate test runs that happened in specific cases (bug 43391).
Finally, he set up QUnit tests for the VisualEditor extension; if this
proves successful, QUnit runs will be generalized to all extensions.
Mark Holmquist improved the Jenkins jobs that track Parsoid regression tests.
We also now have linters for several languages: PHP, Python, Ruby
and even YAML. If your git repositories are missing a lint check, please
contact us or file a bug against Wikimedia > Continuous
Integration.
Browser testing
We created a number of new builds to point browser tests to the beta
cluster as well as test2wiki. We also normalized user strings for test
purposes on test2wiki and beta cluster wikis. We added new tests for the
Preferences/Appearance tab and SUL login, and a volunteer contributor
added a test for PDF manipulation.
Analytics
Analytics infrastructure
We've improved the functionality of Limn, our visualization tool, to allow
users to create and edit charts via
the UI. We can also automatically deploy new instances of Limn, so it's
faster and easier to set up dashboards. In addition to current users, we
expect this to be very helpful for the Program Evaluation team as they
start to develop their own analytics.
We're also now importing 1:1000 traffic streams, enabling us to migrate reports from our legacy analytics platform, WikiStats, onto our big data cluster, Kraken. In the future, this will make it easier for us to publish data and visualize reports using our newer infrastructure.
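As a rough illustration of how reports can be derived from a 1:1000 sampled stream, the sketch below keeps every 1000th request and scales the sampled counts back up. It is only an illustration under stated assumptions: the report does not say how the sampling is implemented, and the function names and the use of systematic (every Nth) sampling are hypothetical.

```python
from collections import Counter

SAMPLE_RATE = 1000  # one request kept out of every 1000


def sample_stream(requests, rate=SAMPLE_RATE):
    """Keep every rate-th request (systematic sampling, assumed here)."""
    return [req for i, req in enumerate(requests) if i % rate == 0]


def estimate_counts(sampled, rate=SAMPLE_RATE):
    """Scale per-key counts from the sample up to estimated totals."""
    return {key: count * rate for key, count in Counter(sampled).items()}
```

Estimates obtained this way are approximate: keys that receive far fewer requests than the sampling rate may be missed entirely or over-represented.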
We have implemented secure login to the User Metrics API via SSL.
We've also introduced a new metric called pages_created,
allowing us to count the number of pages created by a specific editor.
We improved the accuracy of the udp2log monitoring and upgraded the
machines to Ubuntu Precise in order to make the system more robust.
Analytics Visualization, Reporting & Applications
We published our monthly report card.
As part of Wikimedia's ongoing mobile initiative, we also helped
develop analytics that would support ongoing delivery and planning of
mobile functionality:
- We've started to analyze mobile site pageviews
by device class, in order to determine how we will invest in building
applications and sites that support various device formats.
- We've also started to perform session analysis of mobile site
visits, in order to help us understand user behavior when using the
mobile sites, which will inform decisions about ongoing development
efforts. At present, this data is only for internal consumption by the
Mobile team.
- A new overall mobile pageviews report
is now available, which has improved the accuracy of our reporting due
to changes in how the MobileFrontend extension requests a wiki article
(improving performance).
- More information about how we're calculating mobile pageviews is available in our documentation.
We also introduced new dashboards for our Editor engagement team that will
help them monitor the usage of the new Notifications system. Finally,
we've added pageview stats for the Hungarian and Ukrainian Wikivoyages.
Bug management
Mentorship programs
Technical communications
Volunteer coordination and outreach
The Kiwix project is funded and executed by Wikimedia CH.
- In April, we released Kiwix for Android for the first time.
This version doesn't provide as many features as the desktop app, but
it works well with all ZIM files. Two Kiwix developers will attend
Wikimania and have started preparing for a small hackathon, two presentations and a permanent booth.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- The team hit a big milestone with the deployment of the first iteration of phase 2 of Wikidata on all remaining Wikipedias (it had been enabled on 11 Wikipedias previously). Qualifiers
were also enabled on Wikidata, making it possible to add additional
information to certain data. Wikipedians are now able to make use of the
data available on Wikidata in articles, allowing the data to be
collaboratively collected, curated and used by all Wikipedias.
- The team also fixed a few issues to make it possible to use Wikidata
with Internet Explorer 8, and worked on the time datatype. Together
with bot owners, they massively improved the time it takes for Wikidata
changes to show up in the recent changes and watchlists on Wikipedia
sites. The code and architecture got an external professional review;
the reviewers were quite happy with the quality of the code base and
gave useful tips for improvements.
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.
--
Guillaume Paumier
Technical Communications Manager — Wikimedia Foundation
https://donate.wikimedia.org