On Mon, Sep 8, 2008 at 8:46 PM, Brion Vibber <brion@wikimedia.org> wrote:
Gregory Maxwell wrote:
With protocol-relative URLs, native HTTPS support requires solving:
- Wildcard SSL certificates
- A dumb SSL-terminating front-end proxy to do the crypto
- Either making the load balancer highly IP-sticky *or* setting up software for distributing the SSL session cache (e.g. http://distcache.sourceforge.net/).
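To make the "IP-sticky" option concrete, here is a minimal sketch, assuming the balancer simply hashes the client address to pick an SSL front end (similar in spirit to the source-hashing scheduling some load balancers offer). The helper `pick_backend` and the front-end names are hypothetical; the point is only that every connection from one address lands on the same box, so that box's local session cache is enough.

```python
import hashlib
import ipaddress
from typing import Sequence


def pick_backend(client_ip: str, backends: Sequence[str]) -> str:
    """Deterministically map a client address to one SSL front end.

    Hypothetical helper: all connections from the same address land on
    the same front end, so its *local* session cache can serve the
    client's resumed handshakes without any cache sharing.
    """
    if not backends:
        raise ValueError("no backends configured")
    # Hash the packed binary address so equivalent spellings of an IPv6
    # address (e.g. "2001:DB8::1" vs "2001:db8:0:0:0:0:0:1") agree.
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


# Hypothetical front-end names.
FRONT_ENDS = ["ssl1", "ssl2", "ssl3", "ssl4"]
print(pick_backend("192.0.2.10", FRONT_ENDS))
```

Plain modulo hashing remaps most clients whenever a front end is added or removed (throwing away their cached sessions), and clients whose requests arrive from a pool of proxy addresses defeat stickiness entirely; that is the argument for the distributed-cache alternative.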
Doesn't a new HTTPS connection have to create a new SSL session? I'd think you'd only get away with using the same session when reusing the connection on keepalive, in which case it should just be staying open.
Or is the world of SSL far more strange and wonderful than I've imagined... ;)
"There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy."
The whole SSL RSA/DH key-exchange setup is embarrassingly computationally expensive, so much so that there is still a market for dedicated accelerator chips that do nothing else. (Though perhaps other processor makers will add modular-arithmetic units like the ones on Sun's UltraSPARC T2 and make such chips pointless sometime in the next few generations.)
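Because of this, SSL supports session caches for hot-starting (resuming) connections to the same server: a client that presents a cached session ID gets an abbreviated handshake that skips the expensive key exchange. You don't have to preserve the cache, but not doing so wastes a lot of CPU, since clients love to fetch images over several parallel connections to hide TCP latency, and without a cache each of those connections needs a full handshake.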
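A toy model (not real TLS) of why the cache matters for a page load: one connection for the HTML, then several image connections that offer the session ID from the first one. `ToySSLServer` and `load_page` are hypothetical names; a "full" handshake stands in for the RSA/DH exchange, and the model assumes the image connections start after the first handshake completes (connections opened truly simultaneously would each need a full handshake anyway).

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToySSLServer:
    """Toy session cache; counts expensive vs. resumed handshakes."""
    use_cache: bool = True
    sessions: Dict[str, str] = field(default_factory=dict)
    full_handshakes: int = 0
    resumed_handshakes: int = 0

    def handshake(self, offered_session_id: Optional[str] = None) -> str:
        """Return the session ID the client should remember."""
        if self.use_cache and offered_session_id in self.sessions:
            self.resumed_handshakes += 1  # abbreviated handshake
            return offered_session_id
        # Cache miss (or caching disabled): do the expensive key exchange.
        self.full_handshakes += 1
        session_id = secrets.token_hex(16)
        if self.use_cache:
            self.sessions[session_id] = "master-secret-placeholder"
        return session_id


def load_page(server: ToySSLServer, image_connections: int = 6) -> None:
    """One HTML connection, then image connections reusing its session."""
    first_session = server.handshake()
    for _ in range(image_connections):
        server.handshake(first_session)


for use_cache in (True, False):
    server = ToySSLServer(use_cache=use_cache)
    load_page(server)
    print(f"cache={use_cache}: full={server.full_handshakes}, "
          f"resumed={server.resumed_handshakes}")
# cache=True: full=1, resumed=6
# cache=False: full=7, resumed=0
```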
It seems like a lot of people have attacked that session-sharing problem: there are Apache modules for using memcached too, ones based on libspread, and filesystem-based ones (I know you're dying for an excuse to roll out NFS on all the frontends). In any case, this task is well-known and well-understood enough that it's pretty much solved.
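Extending the same toy model to several front ends behind a non-sticky (round-robin) balancer shows what sharing the cache buys. The shared dict stands in for whatever memcached-, spread-, distcache- or filesystem-backed store is used; `FrontEnd` and `count_full_handshakes` are hypothetical names, and the client always offers the most recent session ID it was given.

```python
import itertools
import secrets
from typing import Dict, Optional


class FrontEnd:
    """Toy SSL front end whose session cache may be private or shared."""

    def __init__(self, cache: Dict[str, bool]) -> None:
        self.cache = cache
        self.full_handshakes = 0

    def handshake(self, offered: Optional[str] = None) -> str:
        if offered is not None and offered in self.cache:
            return offered  # resumed: key exchange skipped
        self.full_handshakes += 1  # expensive RSA/DH key exchange
        session_id = secrets.token_hex(16)
        self.cache[session_id] = True
        return session_id


def count_full_handshakes(shared_cache: bool, n_front_ends: int = 4,
                          connections: int = 8) -> int:
    shared: Dict[str, bool] = {}
    front_ends = [FrontEnd(shared if shared_cache else {})
                  for _ in range(n_front_ends)]
    balancer = itertools.cycle(front_ends)  # round robin, not IP-sticky
    session: Optional[str] = None
    for _ in range(connections):
        session = next(balancer).handshake(session)
    return sum(fe.full_handshakes for fe in front_ends)


print(count_full_handshakes(shared_cache=False))  # every connection pays
print(count_full_handshakes(shared_cache=True))   # only the first pays
```

With private caches every connection misses, because the next connection always lands on a box that never saw the client's session; with one shared cache only the first connection does the full key exchange.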
(Currently the SSL is done on a proxy in front of the regular web servers; this is an Apache 2.2 proxy, rather than Squid, but it could be any SSL-enabled proxy.)
Apache might not be a bad choice, but there are other, more targeted options. Even something as simple as running stunnel on the existing Squid front ends would work (although I don't know whether stunnel supports distributing its session cache).
I guess I left a relevant point out of my bullets: using protocol-relative URLs would allow the secure front end to use the same Squid infrastructure, and leveraging those many, many gigabytes of cache will be important for providing comparable performance. Without protocol-relative URLs, images could use the existing cache infrastructure, but pages rendered from wikitext could not, because the wikitext would need to be parsed differently for secure requests and would produce different HTML.
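Why protocol-relative URLs let both front ends share one cached copy: a reference like `//host/path` (a "network-path reference" in RFC 3986 terms) inherits the scheme of the page it appears on, so the parsed HTML can be byte-identical for http and https. A small sketch using Python's `urllib.parse.urljoin`, which resolves such references the way a browser does; the image path and page URLs are illustrative only.

```python
from urllib.parse import urljoin

# Illustrative image reference; the path is hypothetical.
PROTOCOL_RELATIVE_SRC = "//upload.wikimedia.org/example.png"

# The same rendered HTML (and so the same cached Squid object) can be
# served to both the plain and the secure front end.
RENDERED_HTML = '<img src="' + PROTOCOL_RELATIVE_SRC + '">'


def resolved_image_url(page_url: str) -> str:
    """Resolve the src against the page URL, as a browser would."""
    return urljoin(page_url, PROTOCOL_RELATIVE_SRC)


for page in ("http://en.wikipedia.org/wiki/Main_Page",
             "https://en.wikipedia.org/wiki/Main_Page"):
    print(page, "->", resolved_image_url(page))
# http://en.wikipedia.org/wiki/Main_Page -> http://upload.wikimedia.org/example.png
# https://en.wikipedia.org/wiki/Main_Page -> https://upload.wikimedia.org/example.png
```

With absolute `http://` or `https://` links instead, the parser output would depend on the request's protocol, and the cache would need a separate copy of every page for each.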