On Fri, 20 Jan 2012 05:24:44 -0800, Thomas Dalton thomas.dalton@gmail.com wrote:
On 20 January 2012 01:06, Ryan Lane rlane32@gmail.com wrote:
No, there isn't a difference. A blackout where everyone sees a page with a particular message instead of the article they wanted is exactly the same as unscheduled downtime where everyone sees a page with a particular message instead of the article they wanted. If search engines and caches can survive one of them, they can survive both, since they are identical from an external perspective.
I'm sorry, but this is silly. I have a hard time believing that you aren't simply trolling here.
How is it silly? I'm not trolling, I just think the way the blackout was implemented looked really unprofessional and I can't see any good reason for not having done a better job. All we wanted was for anyone viewing any page on the site to see a particular static page rather than what they would usually see. That isn't difficult to do, as evidenced by the fact that it happens automatically whenever the site breaks.
It is in fact difficult to do. The message that comes up when the site is down has nothing to do with what would be necessary to have the cluster serve out a SOPA page.
The cluster is NOT designed to serve out something 'instead' of what it usually serves. The cluster is designed to serve Wikipedia's MediaWiki installation, period.
Error pages are served by the apaches, not the squids/varnishes, and we can't rely on that mechanism for serving a SOPA page. An error response can interact with the cache in one of two ways. Either the cache stores the contents of the error page and keeps serving it, which is obviously NOT what one wants in the normal case, since the squid/varnish cache would still be serving the error page after the issue has gone away; so naturally, serving an error means the response is kept out of the cache. But that is NOT what we want for a SOPA page. If the blackout responses aren't cached, then either we're still serving cached copies of Wikipedia articles when we're supposed to be serving the SOPA page, or EVERY request ends up bypassing the cache and hitting the apaches to get the uncached SOPA page. That is NOT an acceptable implementation of a SOPA page, because that kind of traffic bypassing the cache will kill the apaches and cripple the cluster. It would be like DDoSing Wikipedia's own SOPA page.
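To make that dilemma concrete, here is a toy model in Python. The ToyCache class and blackout_backend function are purely illustrative names, nothing like this runs on the cluster; the point is only that a cache layer must either store the "error" response (and keep re-serving it after the incident is over) or skip storing it (and send every single request to the backend):

class ToyCache:
    def __init__(self, backend, cache_errors):
        self.backend = backend
        self.cache_errors = cache_errors  # policy: should error responses be stored?
        self.store = {}                   # url -> body
        self.backend_hits = 0

    def get(self, url):
        if url in self.store:
            return self.store[url]        # served from cache, backend untouched
        self.backend_hits += 1
        status, body = self.backend(url)
        if status == 200 or self.cache_errors:
            self.store[url] = body        # stored: will be re-served as-is later
        return body


def blackout_backend(url):
    # Backend answering every request with the blackout page, the same way
    # an overloaded apache answers every request with an error page.
    return 503, "SOPA blackout page"


# Policy 1 (cache the 503): one backend hit, but the page sticks in the
# cache and would keep being served after the blackout/outage ends.
# Policy 2 (don't cache it): every request falls through to the backend,
# which is exactly the traffic pattern that melts the apaches.
for cache_errors in (True, False):
    cache = ToyCache(blackout_backend, cache_errors)
    for _ in range(1000):
        cache.get("/wiki/Main_Page")
    print(cache_errors, cache.backend_hits)  # True -> 1 hit, False -> 1000 hits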
So a real SOPA page would likely involve modifying the caching configuration, probably also purging the ENTIRE front-end cache, both before and after the SOPA setup, and of course deploying something that will serve the SOPA page throughout the cluster, potentially outside of the actual MediaWiki installation despite the fact that the cluster was only designed to handle the MediaWiki installation. And ops also needs to make sure that the cluster can even handle the traffic when all the cached entries disappear and piles of requests need to be made to the apaches to repopulate the cache. Then there is the issue of testing the whole thing before deployment.
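For what it's worth, here is a hypothetical Python sketch of the general shape such a change takes, NOT the actual cluster configuration (the Edge class, set_blackout and apache names are made up for illustration; the real thing lives in squid/varnish configuration, not Python). The edge layer answers blackout requests itself, so the apaches see no traffic at all, and the cache is purged on both transitions:

class Edge:
    def __init__(self, backend):
        self.backend = backend
        self.cache = {}            # url -> body
        self.blackout = False
        self.blackout_page = "<html>static SOPA page</html>"

    def set_blackout(self, enabled):
        # Purge on both transitions: turning the blackout on, so cached
        # articles stop leaking through, and turning it off, so nothing
        # blackout-related lingers. The repopulation traffic after each
        # purge is the load spike ops has to plan and test for.
        self.cache.clear()
        self.blackout = enabled

    def get(self, url):
        if self.blackout:
            return self.blackout_page            # answered at the edge, no apache hit
        if url not in self.cache:
            self.cache[url] = self.backend(url)  # normal miss repopulates the cache
        return self.cache[url]


def apache(url):
    return "<html>article for %s</html>" % url


edge = Edge(apache)
edge.get("/wiki/Main_Page")            # cached normally
edge.set_blackout(True)                # purge + switch on
print(edge.get("/wiki/Main_Page"))     # blackout page, backend untouched
edge.set_blackout(False)               # purge + switch off
print(edge.get("/wiki/Main_Page"))     # fresh article, cache repopulates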
So yes, the idea that a SOPA page and the error pages served when the apaches can't handle the traffic are identical is silly, very silly.