Hi,
The report covering Wikimedia engineering activities in June 2013 is now available.
Wiki version:
https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/June
Blog version:
https://blog.wikimedia.org/2013/07/12/engineering-june-2013-report/
We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge:
https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/June/summary
Below is the full HTML text of the report.
As always, feedback is appreciated on the usefulness of the report and its summary, and on how to improve them.
------------------------------------------------------------------
Major news in June include:
Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming
up, and we really love talking to active community members about these
roles.
Announcements
- Sean Pringle joined the Technical Operations team as our Storage and Database Engineer (announcement).
- Brian Wolff joined the Wikimedia Platform Engineering group as a software developer for the summer, working on multimedia contribution and review (announcement).
- Ken Snider joined the Technical Operations team as an international
contractor, poised to fill the Director of Technical Operations position
(announcement).
- Toby Negrin joined the Engineering department as Director of Analytics (announcement).
Technical Operations
Site infrastructure
- As part of our capacity planning work, Mark Bergsma upgraded most of
our Varnish infrastructure (in EQIAD & ESAMS) with newer and faster
servers. He will add new mobile Varnish servers in ESAMS next, in the
coming month. Rob Halsell and Daniel Zahn are pushing ahead with the migration of the other applications
from Tampa to EQIAD. New Parsoid application and Varnish servers were
also deployed in anticipation of the upcoming VisualEditor deployment.
Meanwhile, Alexandros Kosiaris is starting work on the backup project; read
more about the project and the technology.
- Mark also put the finishing touches on the deployment of all the new network
infrastructure at ESAMS. With help from Mark and Leslie Carr, we
finally obtained approval from ARIN for new IPv4 addresses, needed for
our new ULSFO build-out.
- Many people are refactoring Puppet code with the ultimate goal of
having everything organized into Puppet modules. Andrew Bogott, Antoine
Musso and Alexandros are setting up an automated testing infrastructure
to support these efforts.
Data Dumps
- Our GSoC student, Petr Onderka, has been set up in Gerrit and has committed
his first contributions to the Incremental Dumps project; you can follow his code, read his progress reports and check the current discussion
on the mailing list. Additionally, we hold IRC meetings on weekdays at
about 4:15 pm (UTC) in #wikimedia-tech; lurkers and contributors are
welcome.
Wikimedia Labs
- Wikimedia Labs saw many improvements in June, including the
deployment of AJAX improvements for OpenStackManager to wikitech (a new
console output action and an improved reboot action), and a new interface
for displaying project quotas in OpenStackManager. We ensured that all
instances were properly running Puppet and Salt; many instances were
running puppetmaster::self and needed local Puppet repository merges
or rebases. We upgraded Salt everywhere and re-issued keys to fix a
vulnerability in Salt. The team also worked on stabilizing the NFS
server: after encountering a kernel bug with NFS, we changed the I/O
scheduler from cfq to deadline and decreased the client read and write
sizes to 8k. Progress has been made towards making the Labs
database replicas available to Labs at large (as opposed to only the
Tool Labs project). Lastly, much work has gone into fulfilling user
requests in Tool Labs, including work towards WSGI support.
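For readers less familiar with NFS tuning: the client read and write sizes correspond to the standard `rsize` and `wsize` NFS mount options, which are given in bytes, so 8k is 8192. A minimal sketch of such a client entry in `/etc/fstab`, assuming a hypothetical server name and placeholder paths (this is not the actual Labs configuration):

```
# Hypothetical NFS client entry; the server name and paths are placeholders.
# rsize/wsize limit each NFS read/write request to 8192 bytes (8k).
nfs-server.example.org:/export/project  /data/project  nfs  rw,rsize=8192,wsize=8192  0  0
```

The scheduler change is a block-device setting rather than a mount option: on Linux, the I/O scheduler is selected per device, for example by writing `deadline` to `/sys/block/<device>/queue/scheduler`.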
Editor retention: Editing tools
VisualEditor
In June, the VisualEditor team completed the major new features that we
prioritised over the past few months, in preparation for making
VisualEditor available to most Wikipedia users in July. We have built an
editor that lets users edit the majority of content without needing to
use wikitext: it supports text editing, as well as inserting and editing
references, templates, categories and media items.
The deployed alpha of VisualEditor was updated four times as part of
the transition to weekly deployments (1.22-wmf6, 1.22-wmf7, 1.22-wmf8
and 1.22-wmf9), with several mid-deployment releases to patch urgent
issues as the code was developed. This work also involved running an A/B
test for new user accounts on the English Wikipedia, with half of the new
users opted in to VisualEditor ahead of the wider release. More generally,
the team made a number of user interface improvements and fixed a number
of bugs uncovered by the community.
Parsoid
Early this month, we deployed Parsoid to the new cluster and started to track
all edits and template / image updates from all Wikipedia sites, which
is close to the full load we'll see when VisualEditor is deployed to all of them.
Our earlier optimization work paid off, as the Parsoid cluster and the
associated Varnish caches are handling the load very well. The extra
load we put on the API cluster is low enough not to cause problems. As
expected, the VisualEditor deployment to the English Wikipedia hardly
showed up in the load graphs.
Although we were very short-staffed this month (only two full-time
developers), the absence of performance issues left us enough time for a
lot more polishing before the VisualEditor release on July 1. As a
result, the release went very well, with clean diffs on almost all pages.
While more work remains to be done, it is now clear that we have
fundamentally achieved our goal of a clean translation between wikitext
and HTML + RDFa. This not only enables visual HTML editing, but also
makes Wikipedia's content easily accessible in a standardized format.
It also opens up new opportunities for MediaWiki's core architecture,
which we'll pursue this fiscal year.
Editor engagement features
Notifications
In June, we released more features and bug fixes for Notifications on the
English Wikipedia and mediawiki.org. Ryan Kaldari added a confirmation
button for the 'Thanks' feature, and updated notification fly-outs to show
diff links for talk page and interactive notifications, based on a design
by Vibha Bamba. Benny Situ continued development of HTML email
notifications and deployed a variety of feature updates. Erik Bernhardson
developed a special 'Suppressed' content feature, while Matthias Mullie
developed a range of new metrics dashboards. Dario Taraborelli and Aaron
Halfaker ran a week-long A/B test of new user activity; results show that
new users who received Echo notifications made more edits than those who
did not, but their edits were reverted slightly more often. Fabrice Florin
led the planning process for Notifications, as outlined in the 2013
roadmap, and hosted a day-long roundtable discussion to improve editor
engagement features in collaboration with Wikipedia users (see the Echo
demo and Q&A video on YouTube).
Later this summer, we plan to start deploying Notifications on more
wiki projects, starting with Meta and the French Wikipedia. To learn
more, visit the
project portal, read the
FAQ page and join the discussion on the
talk page.
Article feedback
In June, we deployed final features and bug fixes for the
Article Feedback Tool (AFT5) on the
English,
French and
German Wikipedias. Matthias Mullie released
an opt-in feature
to enable or disable feedback on a page, based on designs by Pau Giner
and specifications by Fabrice Florin. In collaboration with Dario
Taraborelli, Matthias also developed an updated set of
metrics dashboards
showing how the new moderation tools are being used: for example, about
half of moderated feedback is marked as 'no action needed', while about
a tenth is marked as 'useful' (these results are generally
consistent across different languages).
The team also supported a wider deployment of AFT5 on over 40,000
articles on the French Wikipedia, as well as a poll by the German
community, which elected not to adopt the tool. Now that feature
development has ended for this project, we plan to make AFT5 available
to other wiki projects in coming weeks, as outlined
in the release plan. For tips on how to use Article feedback, visit the
testing page, and let us know what you think on this
talk page.
Editor engagement experiments
Editor engagement experiment
In June, the Editor Engagement Experiments (E3) team continued work on its experiments related to
onboarding new Wikipedians, and launched several new extensions to Wikimedia projects.
First, the new Campaigns extension
was added to all wikis. This analytics tool helps identify internal or
external sources of new registrations, by adding a "campaign" name to
the signup page URL. This month, E3 began running campaigns to learn
about how many anonymous editors sign up on the top 10 Wikipedias, as
well as how many sign up via the invitation to "Join Wikipedia" on the
login page (see the list of active campaigns and analysis). Another piece of analytics infrastructure built by the team is the new CoreEvents extension, which logs MediaWiki core activity, such as preference updates and page saves, across all projects.
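The Campaigns mechanism amounts to tagging the signup URL with a campaign name and reading it back when the account is created. A minimal Python sketch of that round trip, assuming the query parameter is named `campaign` and that signup goes through `Special:UserLogin` with `type=signup`; the base URL and the campaign name are illustrative assumptions, not the extension's actual code:

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed signup entry point; illustrative only, not taken from the Campaigns extension.
BASE_URL = "https://en.wikipedia.org/w/index.php"


def signup_url(campaign: str, base: str = BASE_URL) -> str:
    """Build a signup-page URL tagged with a campaign name.

    The query parameter name 'campaign' is an assumption for this sketch.
    """
    params = {"title": "Special:UserLogin", "type": "signup", "campaign": campaign}
    return f"{base}?{urlencode(params)}"


def campaign_of(url: str) -> Optional[str]:
    """Return the campaign name carried by a signup URL, or None if untagged."""
    values = parse_qs(urlparse(url).query).get("campaign")
    return values[0] if values else None


if __name__ == "__main__":
    # Hypothetical campaign name for the "Join Wikipedia" link on the login page.
    url = signup_url("login-join-link")
    print(url)
    print(campaign_of(url))
```

Keeping the campaign in the URL means no extra client-side state is needed: any page that links to signup can be attributed simply by changing its link.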
For the Getting Started project, the team conducted usability testing (see results and documentation) of new designs. E3 also refactored and refined the guided tours
extension in June, adding usability enhancements such as new
interface animations and support for community tours, along with bug
fixes. The team also planned and began work on an experiment to deliver guided tours to all first-time editors.
The team also assisted with A/B testing and research for
VisualEditor
before its July 1 launch date, helping with experimental design,
EventLogging instrumentation, and other work. After the VisualEditor
launch, E3 started a week-long
micro-survey of newly registered users on the English Wikipedia, to give us a first systematic look at the gender diversity of those creating accounts.
Support
2012 Wikimedia fundraiser
The initial work on the Adyen payments gateway was finally completed and
deployed to production, though we have not yet used the gateway in a
campaign. Plans for a mobile fundraising campaign and workflow continued
to move forward: we expect to run the first mobile-targeted campaign in
mid to late July. Some last-minute tweaking was done to the payments
cluster in preparation for the resumption of continuous fundraising on
July 1, coinciding with the start of the fiscal year. Deployment of the
payments listener (thulium) was completed, db1013 was moved into the
firewalled fundraising cluster and rebuilt as a fundraising QA server, and
work continued on the new CiviCRM server (barium). Fundraising backups
were overhauled.
Wikipedia Zero
This month, the team launched Wikipedia Zero with Dialog in Sri Lanka,
patched logic and user interface bugs, enhanced the configuration
editor, expanded logging and debugging for identifying anomalous
access, further decoupled ZeroRatedMobileAccess from MobileFrontend, and
proposed an ESI- and JavaScript-based re-architecture of the software.
Mobile Web Photo Upload
This month, we focused on improving education around uploads, including
an interactive Commons tutorial and a copyright and scope check for
first-time users. We also released our "Nearby" feature to production,
allowing users to find articles near them that need images, take photos
and upload them via mobile.
Mobile Nav
In beta, we started working on an update to our site and article
navigation, including design tweaks to the left navigation menu and a
new in-article contributory navigation that combines article actions
(edit, upload, and watch) with a talk page link. We also experimented
with Echo integration and successfully got Notifications up and running
on the English Wikipedia mobile site. We hope to push all of this work
to production next month.
MediaWiki Core
MediaWiki 1.22
Git conversion
Chad Horohoe and Christian Aistleitner upgraded our Gerrit instance from a
pre-release version of 2.6 to a pre-release version of 2.7 during the last
week of June. They also published a new version of the
Bugzilla/Gerrit integration plugin. Details about new functionality can
be found in the Gerrit 2.7 draft release notes.
Multimedia
Admin tools development
Search
Work has largely shifted from supporting MWSearch/lsearchd to
investigating and implementing Solr. Nik Everett and Chad Horohoe have
begun writing an extension to implement Solr searching for MediaWiki,
and much of the initial basic functionality is complete. Peter
Youngmeister and Andrew Bogott will handle the operations tasks for the
new setup. Initial operations tasks will involve packaging Solr 4 and
working with Chad to puppetize the whole design. Additionally, we're
going to investigate Elasticsearch, which has been suggested as an
alternative to Solr.
Auth systems
In June, the team worked with the Wikimedia Foundation's user experience
team to improve SUL2. The improvements were pushed to test wikis on July
1, and will be rolled out to other wikis in July. Implementation of
OAuth is well underway, and planned for roll-out in July as well.
HipHop deployment
Security auditing and response
The team continued to respond to reported security issues, and gave
security-oriented tech talks on emerging DoS techniques and on using
OWASP's ZAP tool for vulnerability scanning.
Quality assurance
Quality Assurance
This month saw a QA focus on automated browser tests. Besides creating new
tests and new builds, and reporting issues identified by tests, we
conducted a training session in San Francisco to create automated tests
for the WikiLove feature. We continue to support all WMF software
development projects, with VisualEditor being a particular focus in
June.
Beta cluster
Max Semenik wrote a script to synchronize CSS from production to the beta
cluster. Steinsplitter and Antoine Musso fixed the AbuseFilter
configuration to have a global list of filters on labswiki. Filters
should be configured there and will be used by all the wikis.
PHP fatal errors caught by the wmerrors extension are now sent to
the beta udp2log instance, which should greatly improve our
troubleshooting process.
Continuous integration
Timo Tijhof and Antoine Musso triaged continuous integration bugs. Antoine
set up a Jenkins slave and migrated most jobs to it, which will make it
very easy to add new servers.
Browser testing
This month, the QA team added new browser tests for UniversalLanguageSelector
and for Mobile (contributed by the Language engineering and Mobile
engineering teams, respectively), as well as browser test contributions
from volunteers. We created new builds in Jenkins to run browser tests
against IE10. We created tests for VisualEditor, including some written
with our Outreach Program for Women intern.
Analytics
Analytics infrastructure
We made significant progress in our preparations for replacing udp2log
with Kafka in our logging infrastructure. The C library librdkafka now
supports the 0.8 protocol, a first version of varnishkafka (which will
replace varnishncsa) is ready, the Apache Kafka project released its
first beta of Kafka 0.8, and we have a Debianized and Puppetized version.
We keep adding new metrics and alerts to monitor all the different parts
of the webrequest dataflows into Kraken. We expect to keep making
improvements in the coming months, until we have a fully reliable data
pipeline into Kraken.
We also continued our efforts to move Kraken out of beta: we
puppetized ZooKeeper, JMXtrans, and the Hadoop client nodes for Hive,
Pig and Sqoop. We started reinstalling the Hadoop DataNode workers with a
fully puppetized Hadoop installation; so far, we have replaced three
nodes, and we'll replace the other seven in the coming weeks. Lastly, we
enabled Jenkins continuous integration for the Grantmaking & Evaluation
dashboards.
Analytics Visualization, Reporting & Applications
This month, we completed the end-user documentation of UserMetrics (v1).
We rebranded UserMetrics as Wikimetrics, and we will gradually start
using the new name when referring to UserMetrics v2 or the UserMetrics
replatforming. We focused on laying the foundation of Wikimetrics: a new
database design, a new job queue design and lots of unit tests. In
addition, we started porting some of the features of UserMetrics v1
(like the 'namespace edits' metric and UI components), and we added user
roles (so users can only see their own metrics) and authentication using
OAuth. Lastly, we fixed some minor issues in UserMetrics v1, including the
handling of user names containing commas, single quotes and double quotes.
Data Releases
We delivered many analyses in June, including one of an Arabic cohort
using UMAPI v1. Erik Zachte provided an analysis of Commons uploaders,
and we provided the Wikipedia Zero team with a number of datasets to help
them track adoption of the Wikipedia Zero project across the globe. We
supported the VisualEditor and Editor Engagement teams with experimental
design, data modeling and data analysis for two controlled experiments:
a test of the impact of notifications and a first test of the impact of
VisualEditor on new contributors. The tests were carried out in June and
the reports are being updated with the results of the analysis. We
started using the EE-dashboard instance on Labs to host dashboards
related to editor engagement projects, which were previously hosted on
the Toolserver (see the metrics and features dashboards for the English
Wikipedia). Lastly, we worked with the Features engineering team to
expand MediaWiki's instrumentation and collect data on cluster-wide user
preference changes and edit-related events to support VisualEditor
analysis.
Bug management
Mentorship programs
The 20 Google Summer of Code interns and the one Outreach Program for
Women intern have completed the bonding period (with three exceptions,
two of them justified) and are now working on their projects. One
accepted OPW candidate declined to participate due to a job offer.
Monthly status updates are available for these projects:
We also met with
SocialCoding4Good, who are relaunching their activities, and we refreshed the
Wikimedia page.
We expect this to become a regular channel for new technical
contributors working in corporations with social/training programs.
Technical communications
In June, work on this topic mostly focused on perennial activities like
Tech news and ongoing communications support to engineering staff, as
Guillaume Paumier
was lent to the VisualEditor deployment effort, working on
communications, documentation and liaising with the French Wikipedia.
Volunteer coordination and outreach
The decision to focus on fewer activities, better executed and driven by
demand, seems to be working out, although it's too soon to confirm the
trend. Browser test automation is the top priority for recruiting new
contributors, and any help here is welcome. We created the QA mailing
list as an umbrella for people and discussions focusing on software
quality assurance in all its aspects. It has more than 40 subscribers
and an initial flow of activity. We held a successful first Browser Test
Automation Workshop, with 40 participants in San Francisco and a few
more online; we will iterate on this model. We also helped organize Tech
Talks on Attack vectors & MediaWiki and OWASP ZAP, and on the upcoming
Solr-based search. The project to provide automated community metrics
based on vizGrimoire, delivered by Bitergia, has been approved, and a
first prototype can be seen at http://korma.wmflabs.org. The project
officially starts on July 1 and includes a one-year maintenance period.
We agreed with the Analytics team that they will take responsibility for
this area during this period.
The Kiwix project is funded and executed by Wikimedia CH.
- Development of a new MediaWiki HTML dumper in Node.js has started. This tool exports Wikipedia articles as static files based on the Parsoid output. This solution looks very promising, and new JavaScript developers are welcome.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- June in Wikidata was all about the sister projects. The development team published proposals for how Wikidata can support Commons and Wiktionary.
Additionally, they worked on enabling Wikidata to store language links
to Wikivoyage in addition to Wikipedia; as a result, Wikivoyage will soon
also be able to manage its language links via Wikidata.
Another important step was the deployment of the geocoordinate datatype.
This makes it possible, for example, to indicate the location of a
city. Geocoordinates that are already in Wikidata can be seen on this map (huge version, updated daily).
- In a blog entry, Denny Vrandečić explained his understanding of the relation of Wikidata and the truth.
- In other news, further development of Wikidata has been supported through a large donation by the search engine company Yandex.
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. Annual goals for the 2013–2014 fiscal year are currently being drafted.
--
This report was reviewed and proofread using VisualEditor.