Hello,
There was a severe bug on the page www.wikipedia.org for about an hour today. Starting at about 6pm UTC and ending at 7:20pm UTC, the text on that page was invisible. This was due to an improperly cached JavaScript file (which pointed to a 404). The URL of the file was purged and the page was fixed, but some steps will be taken to prevent this in the future.
Full report is available at https://wikitech.wikimedia.org/wiki/Incident_documentation/20170222-www-port...
but I'll paste it below anyway:
========
Summary
-------------
At about 5pm UTC Feb. 22 the www.wikipedia.org page was severely broken for about an hour.
The text on the page was invisible. This bug was caused by a javascript file being improperly cached and returning a 404.

Timeline
-----------
- A bug was filed at around 5:09pm UTC Feb. 22 noting that the text on www.wikipedia.org was invisible (T158782).
- We were made aware of this bug at about 5:40pm UTC.
- At 6:15pm UTC an attempt was made to roll back to the previous deploy. The deploy was visible on mwdebug1002 without error, but the error persisted in production.
- At 6:20pm UTC we purged the URL of the specific javascript file, fixing the issue.

Conclusions
-----------------
- The wikipedia.org portal depends on a specific order of syncing followed by purging URLs, which is fragile and needs some rethinking.
- Errors in javascript should not make the page unusable.

Actionables
---------------
- Add the entire list of asset URLs to purge (Task T158810)
- Prevent javascript from hiding page content indefinitely (Task T158809)
- Use query params for cache-busting (Task T158808)
========
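
To give a rough idea of what the cache-busting actionable (T158808) could look like, here is a minimal sketch in Python. It is not the portal's actual build code: the function name `cache_busted_url`, the `v` parameter and the example asset path are all assumptions made for illustration. The idea is that the asset URL changes whenever the file content changes, so a stale cached copy of an old URL cannot be served for a new deploy.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted_url(url: str, content: bytes, param: str = "v") -> str:
    """Return `url` with a short hash of `content` added as a query parameter.

    Hypothetical helper: the name and the default parameter "v" are
    illustrative assumptions, not part of the portal code.
    """
    # A short prefix of the SHA-256 digest is enough to tell versions apart.
    digest = hashlib.sha256(content).hexdigest()[:8]
    parts = urlsplit(url)
    # Keep existing query parameters, but drop any older version tag.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, digest))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical asset path and contents, for illustration only.
old_url = cache_busted_url("/portal/assets/index.js", b"console.log('old');")
new_url = cache_busted_url("/portal/assets/index.js", b"console.log('new');")
print(old_url)
print(new_url)
```

The two printed URLs differ only in the `v` value, so the page produced by a new deploy would reference a URL that has not been cached yet, and the result would no longer depend on the purge happening in the right order.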

Jan Drewniak
UX Engineer, Discovery
Wikimedia Foundation
wikitech-l@lists.wikimedia.org