[Engineering] Update from the Perf Team

Ori Livneh ori at wikimedia.org
Fri May 27 22:27:17 UTC 2016


Hello,

Here's what the performance team has been up to.

== Dashboards & instrumentation ==
We spent time instrumenting software and curating displays of performance
data. We have several new dashboards to share with you:

* Global edit rate and save failures (new)
  https://grafana.wikimedia.org/dashboard/db/edit-count

* Performance metrics (revamped)
  https://grafana-admin.wikimedia.org/dashboard/db/performance-metrics

* Page load performance
  https://grafana.wikimedia.org/dashboard/db/navigation-timing

  ...by continent:
  https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-continent
  ...by country:
  https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-geolocation
  ...by browser:
  https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-browser

* We found that certain browsers were reporting wildly inaccurate timing
data and skewing our summary performance metrics, and reacted by validating
browser metric data more strictly against Navigation Timing API specs.
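
To illustrate the kind of check involved, here is a minimal sketch in Python. It is not our production code: the function name is_plausible() and the exact rules are assumptions for illustration. The attribute names come from the Navigation Timing API; the sketch rejects a sample if a mark is missing, non-positive, or out of order within its phase.

```python
# Attribute names from the Navigation Timing API (PerformanceTiming).
# Grouping them into two chains is a simplification made for this sketch.
NETWORK_MARKS = [
    "navigationStart", "fetchStart", "domainLookupStart", "domainLookupEnd",
    "connectStart", "connectEnd", "requestStart", "responseStart",
    "responseEnd",
]
DOCUMENT_MARKS = [
    "domInteractive", "domComplete", "loadEventStart", "loadEventEnd",
]


def _is_ordered(timing, names):
    """Return True if every mark is a positive integer and never decreases."""
    previous = 0
    for name in names:
        value = timing.get(name)
        if not isinstance(value, int) or value <= 0:
            return False  # missing, non-numeric, or zero/negative timestamp
        if value < previous:
            return False  # mark recorded "before" the one preceding it
        previous = value
    return True


def is_plausible(timing):
    """Hypothetical validity check for one reported timing sample."""
    return (_is_ordered(timing, NETWORK_MARKS)
            and _is_ordered(timing, DOCUMENT_MARKS))
```

A sample that fails the check would be dropped before it reaches the summary metrics, so a single misbehaving browser cannot skew an aggregate.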


== ResourceLoader ==
ResourceLoader is the MediaWiki subsystem responsible for loading CSS,
JavaScript, and i18n interface messages for dynamic site features. It is
critical to site performance. Changes to ResourceLoader are focused on
reducing backend response time, ensuring we make efficient use of the
browser cache, and reducing time to first paint (the time it takes any
content to appear). This work is led by Timo Tijhof.

* The "/static/$mwBranch" entry point has been deprecated and removed in
favor of wmfstatic, a new multiversion-powered entry point accessed via
"/w" (through a RewriteRule)
  https://phabricator.wikimedia.org/T99096

* Restricting addModuleStyles() to style-only modules (ongoing)
  https://phabricator.wikimedia.org/T92459

* Startup module check is now based on a feature test instead of a browser
blacklist
  https://phabricator.wikimedia.org/T102318


== WebPageTest ==
Page load performance varies by browser, platform, and network. To
anticipate how code changes will impact page performance for readers and
editors, we use WebPageTest (https://wikitech.wikimedia.org/wiki/WebPageTest),
a web performance browser automation tool. WebPageTest loads pages on
Wikimedia wikis using real browsers and collects timing metrics. This work
is led by Peter Hedenskog.

* We now generate waterfall charts for page loads on Firefox. Previously we
were only able to produce them with Chrome.

* We tracked down two bugs in WebPageTest that caused it to report an
incorrect value for time-to-first-byte and reported them upstream.
  https://phabricator.wikimedia.org/T130182
  https://phabricator.wikimedia.org/T129735

* We upgraded the WebPageTest agent instance after observing variability in
measurements when the agent is under load.
  https://phabricator.wikimedia.org/T135985

* We designed a new dashboard to help us spot performance regressions
  https://grafana.wikimedia.org/dashboard/db/webpagetest


== Databases ==
The major effort in backend performance has been to reduce replication lag.
Replication lag occurs when a slave database is not able to reflect changes
on the master database quickly enough and falls behind. Aaron Schulz set
out to bring peak replication lag down from ten seconds to below five, by
identifying problematic query patterns and rewriting them to be more
efficient. We are very close to hitting that target: replication lag is
almost entirely below five seconds on all clusters.

https://phabricator.wikimedia.org/T95501

* High lag on databases used to generate special pages no longer stops job
queue processing
  https://phabricator.wikimedia.org/T135809

== Multi-DC ==
"Multi-DC" refers to ongoing work to make it possible to serve reads from a
secondary data center. Having MediaWiki running and serving requests in
more than one data center will reduce latency and improve site reliability.
This project is led by Aaron Schulz.

In order for this to be possible, we need to be able to anticipate which
requests will need the master database, so we can route them accordingly.
The plan is to achieve this by making sure that GET requests never require
a master database connection. We've made incremental progress
here, most recently by changing action=rollback to use JavaScript to
perform HTTP POST requests.
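
The routing rule this enables can be sketched in a few lines of Python. This is an illustration only, not the actual routing layer: the function name choose_datacenter(), the data center labels, and the decision to treat HEAD like GET are all assumptions.

```python
# Methods assumed never to need a master database connection.
# The plan described above only names GET; including HEAD is an assumption.
READ_ONLY_METHODS = {"GET", "HEAD"}


def choose_datacenter(method, local_dc, primary_dc):
    """Pick the data center that should serve a request (hypothetical).

    Read-only requests stay in the nearest data center; everything else
    goes to the data center that hosts the master databases.
    """
    if method.upper() in READ_ONLY_METHODS:
        return local_dc
    return primary_dc
```

Under this rule, a rollback issued as a POST is sent to the primary data center, while page views arriving as GET requests can be served locally.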

We also need to be able to broadcast cache purges across data centers. The
major work on this front has been the addition to core of EventBus classes
that relay cache proxy and object cache purges. Stas Malyshev of the
discovery team is assisting with this work.

== Thumbor ==
"Thumbor" is shorthand for the project to factor thumbnail rendering out of
MediaWiki and into a standalone service based on Thumbor
(http://thumbor.org/). This project is led by Gilles Dubuc. The following
list summarizes recent progress:

- Simplified the VCL as much as possible
- Added client throttling with the tbf vmod
- Added progressive JPEG support to ImageMagick engine
- Added configurable chroma subsampling support
- Made SVG detection more robust
- Added multilanguage SVG support
- Reproduced temp folder security mechanism found in MediaWiki for SVG for
all file types
- Ported Swift's rewrite.py to Vagrant. On Vagrant, Thumbor now hooks into
the same point in the stack as it will in production
- Swift storage implemented (shard support left to do)
- Matched Content-Disposition behavior to MediaWiki
- Vastly increased performance on JPEG processing by using a long-running
exiftool process and named pipes to pass commands to it
- Made one instance of Thumbor run on each available core on Vagrant, since
Thumbor is single-threaded
- Debian packaging is well under way (https://phabricator.wikimedia.org/T134485):
all dependencies are covered except one, with 14 backports and 17 new
packages so far. Working with Filippo to get as many of these as possible
into Debian proper.
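
The long-running exiftool item in the list above rests on a general pattern: start a helper process once and send it many requests, rather than paying process startup cost per image. The Python sketch below shows that pattern only. It is not the Thumbor code; it uses a stand-in child process written in Python instead of exiftool, and ordinary stdin/stdout pipes instead of named pipes. The class name PersistentWorker is hypothetical.

```python
import subprocess
import sys

# Stand-in worker (hypothetical): answers each input line with one output
# line. In the real setup the long-running process is exiftool.
WORKER_SOURCE = r"""
import sys
for line in sys.stdin:
    sys.stdout.write(line.strip().upper() + "\n")
    sys.stdout.flush()
"""


class PersistentWorker:
    """Keep one child process alive and reuse it for every request."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered, so each request is sent promptly
        )

    def request(self, payload):
        """Send one line to the worker and wait for its one-line reply."""
        self.proc.stdin.write(payload + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Close the worker's input so it exits, then reap it."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

Every call to request() reuses the same child process, so the startup cost is paid once in the constructor instead of once per file.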

Until next time,

Aaron, Gilles, Ori, Timo, and Peter