On Jan 3, 2004, at 13:46, Gabriel Wicke wrote:
What kind of caching is done at the moment? And what are the current timeouts?
Every page view comes in through wiki.phtml as the entry point. This runs some setup code, defines functions/classes etc, connects to the database, normalizes the page name that's been given, and checks if a login session is active, loading user data if so.
Then the database is queried to see if the page exists and whether it's a redirect, and to get the last-touched timestamp.
If the client sent an If-Modified-Since header, we compare the given time against the last-touched timestamp (which is updated for cases where link rendering would change as well as direct edits). If it hasn't changed, we return a '304 Not Modified' code. This covers about 10% of page views.
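As a rough illustration of that check (in Python rather than the PHP the site runs; the function name and arguments are invented for this sketch and are not the actual code), the comparison might look like this. It assumes the last-touched timestamp is available as a timezone-aware datetime:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_get_status(if_modified_since, last_touched):
    """Return 304 if the client's copy is still current, else 200.

    if_modified_since: raw header value, or None if the header was absent.
    last_touched: timezone-aware datetime of the page's last-touched stamp.
    """
    if if_modified_since is None:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall through to normal handling
    if client_time.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_touched.replace(microsecond=0) <= client_time:
        return 304
    return 200
```

A real version would presumably also compare against the cache epochs before answering 304; that is left out here.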
If it's not a redirect, we're not looking at an old revision, diff, or "printable view", and we're not logged in, the file cache kicks in. This covers some 60% of page views. If saved HTML output is found for this page, its date is checked. If it's still valid, the file is dumped out and the script exits. The cache file is a complete gzipped HTML page; if the browser doesn't advertise understanding gzip, we decompress it on the fly. (Note that this may affect benchmarks in comparison to actual browsers in use; I don't know.)
If the cached page doesn't exist or is out of date, page rendering continues as normal, and the output is compressed and saved at the end. About 2% of page views involve saving a new cached page.
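The file cache path can be sketched roughly as follows. This is an illustration in Python, not the actual code: the file naming scheme, the function names, and the use of the file's modification time as the cache date are all assumptions made for the example.

```python
import gzip
import os

def is_file_cacheable(is_redirect, is_old_revision, is_diff, is_printable, logged_in):
    # The file cache only applies to plain current-revision views by anonymous users.
    return not (is_redirect or is_old_revision or is_diff or is_printable or logged_in)

def cache_path(cache_dir, title):
    # Hypothetical naming scheme: one gzipped file per page title.
    return os.path.join(cache_dir, title.replace("/", "%2F") + ".html.gz")

def try_serve_cached(cache_dir, title, last_touched, accept_encoding):
    """Return (body, content_encoding), or None if there is no valid cached copy.

    last_touched is a Unix timestamp; accept_encoding is the raw header value.
    """
    path = cache_path(cache_dir, title)
    try:
        cached_at = os.path.getmtime(path)
    except OSError:
        return None  # no cached copy yet
    if cached_at < last_touched:
        return None  # page was touched after the copy was saved
    with open(path, "rb") as f:
        body = f.read()
    # Crude check: ignores q-values such as "gzip;q=0".
    if "gzip" in accept_encoding.lower():
        return body, "gzip"  # send the compressed file as-is
    return gzip.decompress(body), None  # decompress on the fly

def save_cached(cache_dir, title, html):
    # Called at the end of a normal render to store the compressed page.
    with open(cache_path(cache_dir, title), "wb") as f:
        f.write(gzip.compress(html.encode("utf-8")))
```

A request handler would call is_file_cacheable first, then try_serve_cached, and fall back to a full render followed by save_cached when that returns None.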
There's no timeout; pages are invalidated immediately by updating their last-touched timestamps. A global cache epoch can be set on the server to invalidate all old cached pages (server- or client-side), and individual user accounts also have a cache epoch which is reset on login, when user options are changed, and when talk page notification comes on/off.
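In other words, validity is a comparison of timestamps rather than an expiry timer. A minimal sketch of that rule, assuming all timestamps are comparable numbers (the function and parameter names are made up for illustration):

```python
def is_cache_valid(cached_at, page_touched, global_epoch, user_epoch=None):
    """A cached copy is valid only if it is at least as new as every
    timestamp that could have invalidated it."""
    cutoff = max(page_touched, global_epoch)
    if user_epoch is not None:
        # Only relevant for logged-in users (client-side caching).
        cutoff = max(cutoff, user_epoch)
    return cached_at >= cutoff
```

Bumping any one of the three timestamps past the copy's date is enough to invalidate it on the next request.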
If this is a redirect, old page view, diff, or printable view, or if the user is logged in, then we don't do any server-side caching (yet) and parse/render the whole page. Some speedups have been accomplished by precaching link lookup info in easily-loadable chunks. E23's been working on storage of the HTML-rendered wiki pages to be inserted into the overall layout, but this needs some more finalization (various user options may affect the rendering of the page).
Ideally we'd be putting cached data into memcached, which can run in-memory on the web server (or as a distributed cache over a web server cluster) without grinding down the disks. So far we use memcached just for some common data (localized messages, utf8 translation tables, interwiki prefix lookup) and login sessions.
-- brion vibber (brion @ pobox.com)