On Fri, 20 Jan 2012 05:24:44 -0800, Thomas Dalton
<thomas.dalton(a)gmail.com> wrote:
> On 20 January 2012 01:06, Ryan Lane
> <rlane32(a)gmail.com> wrote:
>>> No, there
>>> isn't a difference. A blackout where everyone sees a page
>>> with a particular message instead of the article they wanted is
>>> exactly the same as unscheduled downtime where everyone sees a page
>>> with a particular message instead of the article they wanted. If
>>> search engines and caches can survive one of them, they can survive
>>> both, since they are identical from an external perspective.
>> I'm sorry, but this is silly. I have a hard time believing that you
>> aren't simply trolling here.
> How is it silly? I'm not trolling, I just think the way the blackout
> was implemented looked really unprofessional and I can't see any good
> reason for not having done a better job. All we wanted was for anyone
> viewing any page on the site to see a particular static page rather
> than what they would usually see. That isn't difficult to do, as
> evidenced by the fact that it happens automatically whenever the site
> breaks.
It is in fact difficult to do. The message that comes up when the site is
down has nothing to do with what would be necessary to have the cluster
serve out a SOPA page.
The cluster is NOT designed to serve out something 'instead' of what it
usually serves. The cluster is designed to serve Wikipedia's MediaWiki
installation, period.
Error pages are served by the apaches, not the squids/varnishes, and we
can't rely on that mechanism for serving a SOPA page. An error page forces
one of two interactions with the cache. Either the cache stores the
contents of the error page and keeps serving it, which is obviously NOT
what one wants in the normal case, since the squid/varnish cache would
still be serving an error page after the issue has gone away. So naturally
you'd expect that serving an error means the cache is kept empty. But
that's NOT what we want with a SOPA page either. If that happens, then
either we're still serving cached copies of Wikipedia articles when we're
supposed to be serving a SOPA page, or EVERY request ends up bypassing the
cache and hitting the apaches to get the uncached SOPA page. That is NOT
an acceptable implementation of a SOPA page, because that kind of traffic
bypassing the cache will kill the apaches and cripple the cluster. It
would be like DDoSing Wikipedia's SOPA page.
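To make the two failure modes concrete, here is a toy sketch (nothing like Wikimedia's actual Squid/Varnish setup; all names and numbers are invented) of a look-through cache in front of a backend:

```python
class ToyCache:
    """Minimal look-through cache in front of a backend function."""
    def __init__(self, backend, cacheable):
        self.backend = backend        # function: url -> body
        self.cacheable = cacheable    # function: body -> bool
        self.store = {}
        self.backend_hits = 0

    def get(self, url):
        if url in self.store:
            return self.store[url]
        self.backend_hits += 1
        body = self.backend(url)
        if self.cacheable(body):
            self.store[url] = body
        return body

# Mode 1: cache the error/blackout page. Once stored, the cache keeps
# serving it even after the backend has recovered.
state = {"down": True}
def backend(url):
    return "503 error page" if state["down"] else f"article for {url}"

cache = ToyCache(backend, cacheable=lambda body: True)
cache.get("/wiki/SOPA")          # stores "503 error page"
state["down"] = False            # backend recovers...
print(cache.get("/wiki/SOPA"))   # ...but the cache still serves the error

# Mode 2: never cache the special page. Every request now bypasses the
# cache and hits the backend -- the "DDoS your own SOPA page" scenario.
blackout = ToyCache(lambda url: "SOPA blackout page",
                    cacheable=lambda body: False)
for _ in range(1000):
    blackout.get("/wiki/Anything")
print(blackout.backend_hits)     # 1000: one backend hit per request
```

Neither mode is what a blackout needs: mode 1 serves stale content after the switch, and mode 2 turns every pageview into an apache request.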
So this means that a real SOPA page would likely involve modifications to
the caching configuration. Probably also purging the ENTIRE front-end
cache, both before and after the SOPA setup. And naturally deploying
something that will serve the SOPA page throughout the cluster,
potentially outside of the actual MediaWiki installation, even though the
cluster was only designed to handle that installation.
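The shape such a setup would roughly need is for the blackout page to be answered at the caching layer itself, so article URLs never reach the apaches at all. A hypothetical sketch (all names invented, not real cluster configuration):

```python
BLACKOUT_HTML = "<html>Imagine a world without free knowledge...</html>"

class EdgeCache:
    """Toy edge cache with a blackout switch and a full-purge operation."""
    def __init__(self, backend):
        self.backend = backend
        self.store = {}
        self.blackout = False
        self.backend_hits = 0

    def purge_all(self):
        """Wipe the whole front-end cache (needed before AND after)."""
        self.store.clear()

    def get(self, url):
        if self.blackout:
            # Short-circuit at the edge: one canned response for every URL,
            # zero traffic to the backend.
            return BLACKOUT_HTML
        if url not in self.store:
            self.backend_hits += 1
            self.store[url] = self.backend(url)
        return self.store[url]

edge = EdgeCache(lambda url: f"article for {url}")
edge.get("/wiki/Foo")            # normal traffic populates the cache

edge.purge_all()                 # purge stale entries before the switch
edge.blackout = True
for _ in range(1000):
    edge.get("/wiki/Foo")        # no backend traffic during the blackout

edge.blackout = False
edge.purge_all()                 # purge again so old entries don't linger
```

The point of the sketch is just that this logic lives outside MediaWiki, in the caching tier, which is exactly the kind of work the cluster was never designed for.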
And of course ops also needs to make sure that the cluster can even handle
the traffic when all the cached entries disappear and piles of requests
need to be made to the apaches to repopulate the cache.
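One standard mitigation for that cold-cache stampede is request coalescing (Varnish's waiting-list behaviour is in this spirit): when many clients miss on the same URL at once, only one request goes to the backend and the rest wait for its result. A toy illustration, not the cluster's actual mechanism:

```python
import threading

class CoalescingCache:
    """Cache that collapses concurrent misses for the same URL into one fetch."""
    def __init__(self, backend):
        self.backend = backend
        self.store = {}
        self.locks = {}
        self.meta = threading.Lock()
        self.backend_hits = 0

    def _lock_for(self, url):
        with self.meta:
            return self.locks.setdefault(url, threading.Lock())

    def get(self, url):
        if url in self.store:
            return self.store[url]
        with self._lock_for(url):        # one fetch per URL at a time
            if url not in self.store:    # re-check after acquiring the lock
                self.backend_hits += 1
                self.store[url] = self.backend(url)
        return self.store[url]

cache = CoalescingCache(lambda url: f"article for {url}")
threads = [threading.Thread(target=cache.get, args=("/wiki/Foo",))
           for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(cache.backend_hits)   # 1: the herd collapsed into a single fetch
```

Even with coalescing, a fully purged cache still means every distinct article must be fetched once, so ops would need to verify the apaches can absorb that repopulation load.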
Then there is the issue of testing the whole thing before deployment.
So yes, the idea that a SOPA page and the error pages served when the
apaches can't handle traffic are identical is silly, very silly.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]