I took a quick peek at the sampled squid log and found that CSS and JS files are eating a lot of bandwidth; together they make up about 20% of what's served:
https://wikitech.leuksman.com/view/Squid_bandwidth_breakdown
(May be inaccurate due to coding mistakes in my counter or weird duplicate-caching effects.)
Please forgive me if this is a dumb question, but if I check the headers returned for
two successive requests, like so:
======================================================
root@bling:/var/www/hosts/mediawiki/wiki# curl --silent --include --head http://en.wikipedia.org/skins-1.5/monobook/main.css?55
HTTP/1.0 200 OK
Date: Wed, 07 Feb 2007 02:56:31 GMT
Server: Apache
Cache-Control: max-age=2592000
Expires: Fri, 09 Mar 2007 02:56:31 GMT
Last-Modified: Tue, 06 Feb 2007 20:04:40 GMT
ETag: "60874b-709d-45c8df58"
Accept-Ranges: bytes
Content-Length: 28829
Content-Type: text/css
Age: 2
X-Cache: HIT from sq30.wikimedia.org
X-Cache-Lookup: HIT from sq30.wikimedia.org:80
Via: 1.0 sq30.wikimedia.org:80 (squid/2.6.STABLE9)
Connection: close
root@bling:/var/www/hosts/mediawiki/wiki# curl --silent --include --head http://en.wikipedia.org/skins-1.5/monobook/main.css?55
HTTP/1.0 200 OK
Date: Wed, 07 Feb 2007 02:56:26 GMT
Server: Apache
Cache-Control: max-age=2592000
Expires: Fri, 09 Mar 2007 02:56:26 GMT
Last-Modified: Tue, 06 Feb 2007 20:04:40 GMT
ETag: "15c02de-709d-45c8df58"
Accept-Ranges: bytes
Content-Length: 28829
Content-Type: text/css
Age: 9
X-Cache: HIT from sq20.wikimedia.org
X-Cache-Lookup: HIT from sq20.wikimedia.org:80
Via: 1.0 sq20.wikimedia.org:80 (squid/2.6.STABLE9)
Connection: close
root@bling:/var/www/hosts/mediawiki/wiki#
======================================================
... then I have two questions:
1) Does it matter that the ETag varies between successive requests? The reason I ask is that the http://www.web-caching.com/mnot_tutorial/how.html page says: "HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the object does. Because the server controls how the ETag is generated, caches can be surer that if the ETag matches when they make an If-None-Match request, the object really is the same."
I.e. if the ETag changes between requests, as it did in the above example, could that make requesters think that the object has changed too, thus reducing caching?
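For what it's worth, the two ETags above ("60874b-709d-45c8df58" vs "15c02de-709d-45c8df58") differ only in their first component; the other two match (0x709d is 28829, the Content-Length, so those look like the size and mtime fields). Apache's default is to build the ETag from inode, mtime, and size, and the inode naturally differs between backend servers even for identical file content. If that's what is happening here, one possible fix - just a sketch, assuming Apache 1.3.23 or later - would be to leave the inode out so all backends emit the same tag:

```apache
# Sketch: build ETags from only modification time and size, so that
# identical files on different backend servers get identical ETags.
# (Apache's default also includes the per-server inode number.)
FileETag MTime Size
```

That should make If-None-Match revalidation behave consistently regardless of which backend served the original response.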
2) Would it help to use "Cache-Control: max-age=2592000, public" instead of "Cache-Control: max-age=2592000"? Public is defined as: "marks the response as cacheable, even if it would normally be uncacheable. For instance, if your pages are authenticated, the public directive makes them cacheable." I.e. I'm not sure whether the Wikipedia cookie is being treated as authentication for the purposes of this definition, but if it is, caching the site-wide CSS or JS seems unlikely to hurt (since it really is "public") - but obviously caching the user-specific CSS or JS would be bad.
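If it did turn out to help, setting the header from Apache might look something like the following - purely a sketch, assuming mod_headers is loaded, and the <FilesMatch> pattern is an illustration rather than the actual Wikipedia config (it would need to exclude the user-specific CSS/JS paths):

```apache
# Sketch using mod_headers: mark shared skin CSS/JS as explicitly
# cacheable by shared caches. The pattern below is an assumed
# example only.
<FilesMatch "\.(css|js)$">
    Header set Cache-Control "max-age=2592000, public"
</FilesMatch>
```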
It should be possible to serve these files compressed through Apache with mod_gzip set up, which should squish them by probably two-thirds.
Last time I looked, mod_gzip seemed to be losing favour somewhat - the new "in" compression method for Apache 2 seems to be mod_deflate ( http://httpd.apache.org/docs/2.0/mod/mod_deflate.html ), partly because it's a bundled Apache module rather than a third-party one. Its compression was a few percentage points less efficient than mod_gzip's, but the suggestion was that it caused less CPU load. It was about a year ago that I looked at this, though, so the state of play may have changed since.
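A minimal mod_deflate setup for just these file types might look something like this (a sketch against the Apache 2.0 docs; the MIME types are my assumption about what the skins files are served as, and the BrowserMatch lines are the workarounds the mod_deflate documentation suggests for old browsers that mishandle gzip):

```apache
# Sketch: compress only stylesheets and JavaScript with mod_deflate.
AddOutputFilterByType DEFLATE text/css application/x-javascript

# Netscape 4.x can only handle gzipped text/html; 4.06-4.08 not even that.
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape but handles gzip fine.
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
```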
All the best,
Nick.