Hello,
There was a severe bug with the page www.wikipedia.org for about an hour
today. Starting at about 5pm UTC and ending at 6:20pm UTC, the text on that
page was invisible. This was due to an improperly cached JavaScript file
(its URL returned a 404). The URL of the file was purged and the page was
fixed, and some steps will be taken to prevent this in the future.
The full report is available at
https://wikitech.wikimedia.org/wiki/Incident_documentation/20170222-www-por…
but I'll paste it below anyway:
========
Summary
-------------
At about 5pm UTC on Feb. 22, the www.wikipedia.org page was severely broken
for about an hour. The text on the page was invisible. This bug was caused
by a JavaScript file being improperly cached and returning a 404.
Timeline
-----------
- A bug was filed at around 5:09pm UTC on Feb. 22 noting that the text on
www.wikipedia.org was invisible (Task T158782).
- We were made aware of this bug at about 5:40pm UTC.
- At 6:15pm UTC an attempt was made to roll back to the previous deploy. The
deploy was visible on mwdebug1002 without error, but the error persisted in
production.
- At 6:20pm UTC we purged the URL of the specific JavaScript file, fixing
the issue.
Conclusions
-----------------
- The wikipedia.org portal depends on a specific order of syncing followed
by purging URLs, which is fragile and needs some rethinking.
- Errors in JavaScript should not make the page unusable.
Actionables
---------------
- Adding an entire list of asset URLs to purge (Task T158810)
- Preventing JavaScript from hiding page content indefinitely (Task T158809)
- Using query params for cache-busting (Task T158808)
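
As a rough illustration of the cache-busting actionable (Task T158808), the
sketch below appends a short content hash as a query parameter, so that a
changed file is requested under a new URL instead of relying on a purge of
the old one. It is written in Python purely for illustration; the function
name `cache_busted_url`, the `v` parameter, and the asset path are
assumptions and not the portal's actual build code.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted_url(url: str, content: bytes, param: str = "v") -> str:
    """Return `url` with a short hash of `content` set as a query parameter.

    Hypothetical helper: the parameter name and hash length are arbitrary
    choices made for this sketch.
    """
    # First 8 hex characters of the SHA-256 digest of the file content.
    digest = hashlib.sha256(content).hexdigest()[:8]
    parts = urlsplit(url)
    # Keep other query parameters, drop any previous cache-busting value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, digest))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical asset path and contents, for illustration only.
old_url = cache_busted_url("/portal/assets/js/index.js", b"console.log(1);")
new_url = cache_busted_url("/portal/assets/js/index.js", b"console.log(2);")
print(old_url != new_url)  # the URL changes whenever the content changes
```

Because the parameter changes only when the file content changes, unchanged
assets keep their URLs (and their cache entries) across deploys.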
========
Jan Drewniak
UX Engineer, Discovery
Wikimedia Foundation