Just for kicks, I've added some preliminary, experimental support for gzip encoding of pages that have been saved in the file cache. If $wgUseGzip is not enabled in LocalSettings, it shouldn't have any effect; if it is, it'll make compressed copies of cached files and then serve them if the client claims to accept gzip.
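For anyone poking at it: enabling it is a one-line change in LocalSettings.php, and the serving decision is basically a check of the request's Accept-Encoding header. A rough sketch (the helper name is made up for illustration, not the actual code):

  <?php
  # In LocalSettings.php:
  $wgUseGzip = true;

  # Illustrative helper: does the client claim to accept gzip?
  function clientAcceptsGzip() {
      return isset( $_SERVER['HTTP_ACCEPT_ENCODING'] )
          && strpos( $_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip' ) !== false;
  }
  ?>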
At present this only affects file-cachable pages: so plain current page views by not-logged-in users. Compression is only done when generating the cached file, so it oughtn't to drain CPU resources too much. My informal testing shows the gzipping takes about 2-3 ms, which is much shorter than most of the page generation steps. (Though it will eat up some additional disk space, as both uncompressed and compressed copies are kept on disk.)
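Concretely, the save step amounts to writing the page text twice, once plain and once through gzencode(); a sketch under that assumption (the function and variable names here are just illustrative):

  <?php
  # Sketch: when a cacheable page is rendered, store both flavors on disk.
  function saveToFileCache( $cacheFile, $text ) {
      $fp = fopen( $cacheFile, 'wb' );       // plain copy
      fwrite( $fp, $text );
      fclose( $fp );

      $fp = fopen( "$cacheFile.gz", 'wb' );  // compressed copy, made once at save time
      fwrite( $fp, gzencode( $text ) );
      fclose( $fp );
  }
  ?>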
I'd appreciate some testing with various user agents to see if things are working. If you receive a compressed page, there'll be a comment at the end of the page like <!-- Cached/compressed [timestamp] -->.
A few notes:
This needs zlib support compiled into PHP to work. I've done this on Larousse.
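If you're not sure whether your own build has it, a quick check like this will tell you:

  <?php
  # Reports whether the zlib functions needed for gzip output are available.
  echo ( extension_loaded( 'zlib' ) && function_exists( 'gzencode' ) )
      ? "zlib support present\n"
      : "zlib support missing\n";
  ?>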
An on-the-fly compression filter could also be turned on for dynamic pages and logged-in users, but I haven't done this yet. Compression could then be made a user-selectable option, so those with problem browsers could turn it off, or those with slow modems could turn it on where it's off by default. :)
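If we do go that route, PHP's output buffering already has a handler for this; a minimal sketch of the on-the-fly case:

  <?php
  # ob_gzhandler inspects the client's Accept-Encoding itself and falls
  # back to plain output for clients that can't handle gzip.
  ob_start( 'ob_gzhandler' );

  echo "<html><body>...dynamically generated page...</body></html>\n";
  ?>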
The purpose of all this is of course to save bandwidth; there are two ends of this, the server and the client:
Jimbo has pooh-poohed concerns about our bandwidth usage; certainly the server has a nice fat pipe to the internet and isn't in danger of choking, and whatever Bomis's overall bandwidth usage, Jimbo hasn't complained that we're crowding out his legitimate business. :) But still, we're looking at 5-20 *gigabytes* *per day*. A fair chunk of that is probably images and archive dumps, but a lot is text.
On the client end: schmucks with dial-up may appreciate a little compression. :)
I've also fixed what seems to be a conflict between the page cache and client-side caching.
There are still some race conditions around making sure that two loads of the same page don't overwrite each other's work or read another load's partially written file, and adding a second, gzipped file perhaps complicates this a bit... there are also still some cases where caches aren't invalidated properly.
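For what it's worth, the usual trick for the overwrite/partial-read part (just a sketch, not what's checked in) is to write to a temporary file in the same directory and rename() it into place, since the rename is atomic on the same filesystem:

  <?php
  # Sketch: readers only ever see a complete file. Names are illustrative.
  function saveCacheFileAtomically( $finalName, $text ) {
      $tmp = tempnam( dirname( $finalName ), 'wikicache' );
      $fp = fopen( $tmp, 'wb' );
      fwrite( $fp, $text );
      fclose( $fp );
      rename( $tmp, $finalName );  // atomic replace on the same filesystem
  }
  ?>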
-- brion vibber (brion @ pobox.com)
On Tue, May 20, 2003 at 05:03:03AM -0700, Brion Vibber wrote:
> Just for kicks, I've added some preliminary, experimental support for gzip encoding of pages that have been saved in the file cache. If $wgUseGzip is not enabled in LocalSettings, it shouldn't have any effect; if it is, it'll make compressed copies of cached files and then serve them if the client claims to accept gzip.
> At present this only affects file-cachable pages: so plain current page views by not-logged-in users. Compression is only done when generating the cached file, so it oughtn't to drain CPU resources too much. My informal testing shows the gzipping takes about 2-3 ms, which is much shorter than most of the page generation steps. (Though it will eat up some additional disk space, as both uncompressed and compressed copies are kept on disk.)
One of the nice things about gzip is that decompression is much much cheaper than compression. It might almost make sense to just compress everything and then decompress on the fly if you need it.
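Something like this would be the serving side of that idea, keeping only the .gz on disk (just a sketch; the file name handling is made up):

  <?php
  # Sketch: keep only the gzipped copy; inflate on the fly for clients
  # that don't send Accept-Encoding: gzip.
  function serveCachedPage( $cacheFile ) {
      $acceptsGzip = isset( $_SERVER['HTTP_ACCEPT_ENCODING'] )
          && strpos( $_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip' ) !== false;
      if ( $acceptsGzip ) {
          header( 'Content-Encoding: gzip' );
          readfile( "$cacheFile.gz" );    // pass the compressed bytes straight through
      } else {
          readgzfile( "$cacheFile.gz" );  // decompress on the fly
      }
  }
  ?>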