Hi, Pine.
I, too, am interested in building our understanding of our TechOps
infrastructure.
https://www.mediawiki.org/wiki/Presentations explains some parts of it,
as does http://wikitech.wikimedia.org/ . I welcome more links to
guides/overviews.
At the recent Zurich hackathon, other developers agreed that it would be
good to have a guide to Wikimedia's digital infrastructure, especially
how MediaWiki is used.
https://www.mediawiki.org/wiki/Overview_of_Wikimedia_infrastructure is
... a homepage with approximately nothing on it right now except this
diagram of our server architecture:
https://commons.wikimedia.org/wiki/File:Wikimedia_Server_Architecture_%28si…
You might find the Performance Guidelines illuminating:
https://www.mediawiki.org/wiki/Performance_guidelines . You might also
like the recent tech talk by Ori Livneh and Aaron Schulz about how we
make Wikipedia fast; see http://www.youtube.com/watch?v=0PqJuZ1_B6w
(I don't know when the video is going up on Commons).
--
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation
On 05/30/2014 06:30 PM, ENWP Pine wrote:
Ori, thanks for following up.
I think I saw somewhere that there is a list of postmortems for tech ops disruptions
that includes reports like this one. Do you know where the list is? I tried a web search
and couldn't find a copy of this report outside of this email list.
I personally find this report interesting and concise, and I am interested in
understanding more about the tech ops infrastructure. Reports like this one
are useful in building that understanding. If there's an overview of tech ops
somewhere I'd be interested in reading that too. The information on English
Wikipedia about WMF's server configuration appears to be outdated.
Thanks,
Pine
> Date: Thu, 29 May 2014 22:38:10 -0700
> From: Ori Livneh <ori(a)wikimedia.org>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] 404 errors
> Message-ID:
> <CAHXK4ByYa8ae0EVGAUFWSCrjZtAQh+sjTW6ccJ14mB8o-teSoQ(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Thu, May 29, 2014 at 1:34 PM, ENWP Pine <deyntestiss(a)hotmail.com> wrote:
>
>> Hi, I'm getting some 404 errors consistently when trying to load some
>> English Wikipedia articles. Other pages load ok. Did something break?
>>
>
> TL;DR: A package update went badly.
>
> Nitty-gritty postmortem:
>
> At 20:25 (all times UTC), change Ie5a860eb9[0] ("Remove
> wikimedia-task-appserver from app servers") was merged. There were two
> things wrong with it:
>
> 1) The appserver package was configured to delete the mwdeploy and apache
> users upon removal. The apache user was not deleted because it was logged
> in, but the mwdeploy user was. The mwdeploy account was declared in Puppet,
> but there was a gap between the removal of the package and the next Puppet
> run during which the account would not be present.
>
> 2) The package included the symlinks /etc/apache2/wmf and
> /usr/local/apache/common, which were not Puppetized. These symlinks were
> unlinked when the package was removed.
>
> Apache was configured to load configuration files from /etc/apache2/wmf,
> and these include the files that declare the DocumentRoot and Directory
> directives for our sites. As a result, users were served with 404s. At
> 20:40 Faidon Liambotis re-installed wikimedia-task-appserver on all
> Apaches. Since 404s are cached in Varnish, it took another five minutes for
> the rate of 4xx responses to return to normal (20:45).[1]
>
> [0]: https://gerrit.wikimedia.org/r/#/c/136151/
> [1]: https://graphite.wikimedia.org/render/?title=HTTP%204xx%20responses%2C%2020…
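
The two failure modes in the postmortem above (symlinks unlinked on package
removal, and an account deleted before the next Puppet run) could in principle
be caught by a small check run on an app server after a package change. The
following is only a sketch under that assumption, not how Wikimedia actually
monitors its servers. The paths and account names come from the postmortem;
the function names find_problems and user_exists are hypothetical.

```python
import os
import pwd

# Symlinks and accounts named in the postmortem.
REQUIRED_SYMLINKS = ["/etc/apache2/wmf", "/usr/local/apache/common"]
REQUIRED_USERS = ["mwdeploy", "apache"]


def user_exists(name):
    """Return True if a local account with this name exists."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def find_problems(symlinks, users, user_check=user_exists):
    """Return a list of human-readable problems; an empty list means OK.

    user_check is injectable so the function can be exercised without
    touching the real account database (hypothetical design choice).
    """
    problems = []
    for path in symlinks:
        if not os.path.islink(path):
            problems.append("missing symlink: " + path)
        elif not os.path.exists(path):
            # The link itself is present but its target is gone.
            problems.append("dangling symlink: " + path)
    for name in users:
        if not user_check(name):
            problems.append("missing user: " + name)
    return problems


if __name__ == "__main__":
    for problem in find_problems(REQUIRED_SYMLINKS, REQUIRED_USERS):
        print(problem)
```

Run as a script, it prints one line per missing symlink or account and
nothing if everything is in place. A check like this would only detect the
gap after the fact; declaring the symlinks in Puppet, as the postmortem
implies, addresses the cause.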