On Mon, Sep 8, 2008 at 8:46 PM, Brion Vibber <brion(a)wikimedia.org> wrote:
Gregory Maxwell wrote:
With protocol-relative URLs, native HTTPS support requires solving:
1) Wildcard SSL certificates
2) Dumb SSL front-ending proxy to do crypto
3) Either making the load balancer highly IP-sticky *or* setting up
software for distributing the SSL session cache (i.e.
http://distcache.sourceforge.net/).
Doesn't a new HTTPS connection have to create a new SSL session? I'd
think you'd only get away with using the same session when reusing the
connection on keepalive, in which case it should just be staying open.
Or is the world of SSL far more strange and wonderful than I've
imagined... ;)
"There are more things in heaven and earth, Horatio, Than are dreamt
of in your philosophy."
The whole SSL RSA/DH keying setup is embarrassingly computationally
expensive, so much so that there is still a market for dedicated
accelerator chips that do nothing else. (Though perhaps other
processor makers will add modular arithmetic units like the Sun T2,
and make them pointless sometime in the next few generations.)
Because of this, SSL supports session caches for hot-starting
connections to the same server. You don't have to preserve the cache,
but not doing so wastes a lot of CPU, since clients love to fetch
images in parallel to hide TCP latency.
It seems like a lot of people have attacked that session-sharing
problem: there are Apache modules that use memcached, ones based on
libspread, and filesystem-based ones (I know you're dying for an excuse
to roll out NFS on all the frontends). In any case, the task is well
enough known and understood that it's pretty much a solved problem.
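
To make the trade-off concrete, here is a toy simulation (not a
benchmark) of how many expensive full handshakes a workload costs under
three setups: round-robin balancing with per-frontend caches, IP-sticky
balancing, and round-robin with a shared session cache. The function
name, the client addresses, and the numbers are all made up for
illustration, and connections are modelled as sequential rather than
truly parallel, which is a simplification.

```python
import hashlib
from itertools import cycle


def count_full_handshakes(clients, conns_per_client, n_frontends,
                          sticky, shared_cache):
    """Count full (non-resumed) handshakes for a toy workload.

    A connection can be resumed only if the front end that receives it
    already has the client's session in the cache it consults.
    """
    shared = set()                                 # one cache for all front ends
    local = [set() for _ in range(n_frontends)]    # one cache per front end
    round_robin = cycle(range(n_frontends))
    full = 0
    for client in clients:
        for _ in range(conns_per_client):
            if sticky:
                # IP-sticky: the same client always lands on the same front end.
                digest = hashlib.sha256(client.encode()).digest()
                frontend = digest[0] % n_frontends
            else:
                frontend = next(round_robin)
            cache = shared if shared_cache else local[frontend]
            if client not in cache:
                full += 1          # cache miss: pay for the full key exchange
                cache.add(client)
    return full


# Hypothetical workload: 100 clients, 6 connections each, 4 front ends.
clients = [f"10.0.0.{i}" for i in range(100)]
for label, sticky, shared in [("round-robin, local caches", False, False),
                              ("IP-sticky, local caches", True, False),
                              ("round-robin, shared cache", False, True)]:
    print(label, count_full_handshakes(clients, 6, 4, sticky, shared))
```

With these made-up numbers the naive setup pays for 400 full handshakes,
while either stickiness or a shared cache brings that down to 100, one
per client.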
(Currently the SSL is done on a proxy in front of the regular web
servers; this is an Apache 2.2 proxy, rather than Squid, but it could be
any SSL-enabled proxy.)
Apache might not be a bad choice, but there are other, more targeted
options. Even something as simple as running stunnel on the existing
Squid front ends would work (although I don't know if there is
session cache distribution support for stunnel).
I guess I missed a relevant point in my bullets: using protocol-relative
URLs would allow the secure front end to use the same Squid
infrastructure. Leveraging the many, many gigabytes of cache will be
important for providing comparable performance. Without protocol-relative
URLs, images could use the existing cache infrastructure, but the
wikitext could not (because the wikitext would need to be parsed
differently).
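
As a small illustration of why one cached copy can serve both schemes,
the sketch below resolves a protocol-relative reference against an HTTP
and an HTTPS page URL using Python's standard `urljoin`. The image path
is a hypothetical placeholder, not a real file.

```python
from urllib.parse import urljoin

# A protocol-relative reference as it might appear in cached page HTML.
# The path is a made-up placeholder.
reference = "//upload.wikimedia.org/example.png"

# The same cached markup inherits whatever scheme the page was fetched over.
via_http = urljoin("http://en.wikipedia.org/wiki/Main_Page", reference)
via_https = urljoin("https://en.wikipedia.org/wiki/Main_Page", reference)

print(via_http)   # http://upload.wikimedia.org/example.png
print(via_https)  # https://upload.wikimedia.org/example.png
```

Because the reference carries no scheme of its own, the parsed output
does not need to differ between the plain and secure front ends.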