Hi, Pine.
I, too, am interested in building our understanding of our TechOps
infrastructure. https://www.mediawiki.org/wiki/Presentations has some
explanations of some parts, as does http://wikitech.wikimedia.org/ . I
welcome more links to guides/overviews.
At the recent Zurich hackathon, other developers agreed that it would be
good to have a guide to Wikimedia's digital infrastructure, especially
how MediaWiki is used.
https://www.mediawiki.org/wiki/Overview_of_Wikimedia_infrastructure is,
for now, a homepage with almost nothing on it except this diagram of
our server architecture:
https://commons.wikimedia.org/wiki/File:Wikimedia_Server_Architecture_%28sim...
You might find the Performance Guidelines illuminating:
https://www.mediawiki.org/wiki/Performance_guidelines . You might also
like the recent tech talk by Ori Livneh and Aaron Schulz about how we
make Wikipedia fast; see
http://www.youtube.com/watch?v=0PqJuZ1_B6w (I don't know when the video
is going up on Commons).
--
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation
On 05/30/2014 06:30 PM, ENWP Pine wrote:
>
> Ori, thanks for following up.
>
> I think I saw somewhere that there is a list of postmortems for tech ops disruptions
> that includes reports like this one. Do you know where the list is? I tried a web search
> and couldn't find a copy of this report outside of this email list.
>
> I personally find this report interesting and concise, and I am interested in
> understanding more about the tech ops infrastructure. Reports like this one
> are useful in building that understanding. If there's an overview of tech ops
> somewhere I'd be interested in reading that too. The information on English
> Wikipedia about WMF's server configuration appears to be outdated.
>
> Thanks,
>
> Pine
>
>
>> Date: Thu, 29 May 2014 22:38:10 -0700
>> From: Ori Livneh <ori@wikimedia.org>
>> To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
>> Subject: Re: [Wikitech-l] 404 errors
>> Message-ID:
>> <CAHXK4ByYa8ae0EVGAUFWSCrjZtAQh+sjTW6ccJ14mB8o-teSoQ@mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> On Thu, May 29, 2014 at 1:34 PM, ENWP Pine <deyntestiss@hotmail.com> wrote:
>>
>>> Hi, I'm getting some 404 errors consistently when trying to load some
>>> English Wikipedia articles. Other pages load ok. Did something break?
>>>
>>
>> TL;DR: A package update went badly.
>>
>> Nitty-gritty postmortem:
>>
>> At 20:25 (all times UTC), change Ie5a860eb9[0] ("Remove
>> wikimedia-task-appserver from app servers") was merged. There were two
>> things wrong with it:
>>
>> 1) The appserver package was configured to delete the mwdeploy and apache
>> users upon removal. The apache user was not deleted because it was logged
>> in, but the mwdeploy user was. The mwdeploy account was declared in Puppet,
>> but there was a gap between the removal of the package and the next Puppet
>> run during which the account would not be present.
>>
>> 2) The package included the symlinks /etc/apache2/wmf and
>> /usr/local/apache/common, which were not Puppetized. These symlinks were
>> unlinked when the package was removed.
>>
>> Apache was configured to load configuration files from /etc/apache2/wmf,
>> and these include the files that declare the DocumentRoot and Directory
>> directives for our sites. As a result, users were served with 404s. At
>> 20:40 Faidon Liambotis re-installed wikimedia-task-appserver on all
>> Apaches. Since 404s are cached in Varnish, it took another five minutes for
>> the rate of 4xx responses to return to normal (20:45).[1]
>>
>> [0]: https://gerrit.wikimedia.org/r/#/c/136151/
>> [1]:
>> https://graphite.wikimedia.org/render/?title=HTTP%204xx%20responses%2C%20201...