--On Thursday, 12 June 2003 23:02 -0700 Brion Vibber brion@pobox.com wrote:
- We could offer either as default or as an option to compress
dynamically generated pages as well, which could shave some more percentage points off the bandwidth usage. Might be a help for the modem folks who do log in. :) However I'm not sure how much this would affect CPU usage; in any case there's no urgency for this, it's just something we might do if we have the cycles to burn (I don't think we do just now, but we might one day).
We do dynamic gzipping of pages on a rather large website (~3,000,000 dynamic hits daily). Our experience so far is that the gzipping itself is actually rather fast compared to the page generation process through PHP/Perl. The main problem with dynamic gzipping is that you have to build up the whole page in memory instead of sending out lines as they are generated (I don't know how the Wikipedia software currently handles this). As a safeguard, we occasionally (about once a minute per Apache process) read /proc/loadavg on Linux systems. If it's higher than a specified limit (9.0 on our systems), we temporarily disable page gzipping.
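For what it's worth, a minimal PHP sketch of that safeguard (illustrative only: the constant and function names are mine, and it re-reads /proc/loadavg on every request instead of caching the check once a minute per process):

<?php
// Minimal sketch, not our actual code: check the 1-minute load average
// before enabling gzip output buffering.  The 9.0 limit is the one quoted
// above; everything else is an assumption for illustration.

define('GZIP_LOAD_LIMIT', 9.0);

function loadTooHighForGzip() {
    $line = @file_get_contents('/proc/loadavg');   // e.g. "0.42 0.31 0.25 1/123 4567"
    if ($line === false) {
        return false;                              // can't read it: leave gzip enabled
    }
    $load = (float) strtok($line, ' ');            // first field = 1-minute average
    return $load > GZIP_LOAD_LIMIT;
}

if (!loadTooHighForGzip() && extension_loaded('zlib')) {
    // ob_gzhandler buffers the whole page in memory and compresses it on flush,
    // which is exactly the memory trade-off described above.
    ob_start('ob_gzhandler');
} else {
    ob_start();                                    // plain buffering, no compression
}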
Some other optimization-related suggestions (I'm not familiar with what was already suggested, sorry):
- Drop Apache for image delivery. Instead, put a webserver like thttpd (http://www.acme.com/software/thttpd/) on a subdomain for serving images. In our experience, the ratio of delivered hits to memory and CPU usage is significantly better than with Apache.

- Consider implementing Squid as a front-end to your dynamic Apache. It's fairly quick to set up if your software delivers proper headers (see the sketch below), and it gives you caching for anonymous users without extra code. Even for logged-in users it has a serious advantage: Apache no longer has to wait until all data has been sent out to the client. Doing this will usually increase the load on the servers, since they spend less time sitting idle waiting for traffic to go out, but at the same time more pages are delivered per second. Dynamic Apaches are also very sensitive to bad connections, because you usually only have a small, limited pool of processes (we run a maximum of 70 on our site, for example). If users' connections are generally bad, they can easily hog 99% of your Apache processes while still establishing their connections and thereby essentially bring Wikipedia down, even though 95% of your server resources are not actually used. A Squid front-end usually improves performance even if you completely disable caching in Squid. Drawbacks are an increased number of context switches per second and more memory-copy operations.
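To give an idea of what "proper headers" means here, a rough PHP sketch (the cookie name and the max-age value are made-up examples, not Wikipedia's actual configuration): mark anonymous page views as cacheable so Squid can serve them itself, and logged-in views as private:

<?php
// Illustrative sketch only: tell a Squid front-end that anonymous page views
// may be cached, while logged-in views must not be.  The cookie name and the
// 600-second s-maxage are assumptions for the example.

$loggedIn = isset($_COOKIE['wikiUserID']);   // hypothetical session cookie

if ($loggedIn) {
    // Per-user content: the shared cache must not store or reuse it.
    header('Cache-Control: private, must-revalidate, max-age=0');
} else {
    // Anonymous content: let Squid cache it for a while, then revalidate.
    header('Cache-Control: public, s-maxage=600, must-revalidate');
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s') . ' GMT');
}

With headers along these lines Squid can answer most anonymous requests from its cache and only pass the rest through to Apache.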