George Herbert wrote:
I'm confused; what exactly is in the cache for logged in users?
The HTML I see coming out the end has the encoded username variable and the username in scripts and so forth.
I always assumed that logged-in users generated a new HTML render every time they hit a page, and that we assumed logged-in users were a small enough fraction of total page views that it wasn't a big deal. But I'm not familiar with Wikipedia's caching in any detail; I've run a lot of MediaWiki servers but never bothered to cache anything (none of them had enough hits to make it worthwhile trying).
So... what's the underlying mechanism?
Thanks...
There are a lot of caches. That's how Wikipedia survives having so many visitors ;)
When you make a request to the Wikimedia wikis, it goes to a Squid cluster: pmtpa, knams, or yaseo. A cache miss at knams or yaseo passes the request to the pmtpa Squids. A cache miss at pmtpa passes the request to the Apaches.
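The tiered lookup described above can be sketched in a few lines. This is a toy model, not the real Squid configuration: `handle_request` and the dict-based caches are made-up names standing in for the regional cluster, the pmtpa cluster, and the Apache/PHP layer.

```python
# Hypothetical sketch of the tiered cache lookup: a request tries a
# regional Squid cluster first, falls back to the pmtpa Squids, and
# only reaches the Apaches when both caches miss.
def handle_request(url, regional_cache, pmtpa_cache, apache_render):
    html = regional_cache.get(url)       # knams / yaseo Squid layer
    if html is None:
        html = pmtpa_cache.get(url)      # central pmtpa Squid layer
        if html is None:
            html = apache_render(url)    # PHP on the Apaches does the real work
            pmtpa_cache[url] = html      # populate the caches on the way out
        regional_cache[url] = html
    return html

# Toy usage: plain dicts stand in for the Squid clusters.
regional, central = {}, {}
html = handle_request("/wiki/Foo", regional, central,
                      lambda url: "<html>Foo</html>")
```

The second identical request would be answered by the regional cache without ever reaching the Apaches, which is the whole point of the layering.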
The Squids will only serve the page if they have served the same URL for the same user before, where "the same user" means:
- you, if you're logged in (since the last time your cookie changed);
- any other anonymous user, if you're not logged in.
Note that MediaWiki has a flag in LocalSettings to show your IP at the top, as it does with usernames for logged-in users. By disabling it, all anonymous users get the same HTML, and it can be cached on the Squids. When a page is modified, the Squids are told not to remember that page any more.
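The "same user" rule and the purge-on-edit behaviour amount to a cache key plus an invalidation step. A minimal sketch, with invented names (`cache_key`, `purge`) and a plain dict standing in for a Squid:

```python
# Hypothetical sketch of the Squid caching rule: anonymous users share
# one cached copy per URL, while a logged-in user only gets a hit on a
# copy cached against their own session cookie.
def cache_key(url, session_cookie=None):
    if session_cookie is None:
        return ("anon", url)          # every anonymous user shares this key
    return (session_cookie, url)      # logged-in users are keyed per cookie

def purge(cache, url):
    """On edit, the Squids are told to forget every copy of the page."""
    for key in [k for k in cache if k[1] == url]:
        del cache[key]

# Toy usage: one shared anonymous copy, one per-user copy.
squid = {cache_key("/wiki/Foo"): "<html>Foo</html>",
         cache_key("/wiki/Foo", "alice-cookie"): "<html>Hi Alice</html>"}
purge(squid, "/wiki/Foo")             # an edit wipes both copies
```

This also shows why the LocalSettings IP flag matters: with it enabled, every anonymous visitor would need their own key, and the shared `("anon", url)` entry would be useless.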
When the request arrives at the Apaches, PHP kicks in and assembles the page from a bunch of pieces, among which adding your username is no problem. It uses:
- your username, permissions, preferences... which are stored in your session;
- the skin, from a PHP file;
- the content of the MediaWiki: messages, which have their own cache;
- the article content, which is cached;
- the diffs, which are cached too.
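The assembly step can be sketched like this. It is a simplification, not MediaWiki's actual code: `render_page` and its arguments are invented, but the point stands that the expensive piece (the parsed article) comes from a cache while the cheap personal chrome is rebuilt per request.

```python
# Hypothetical sketch of page assembly on the Apaches: the parsed
# article HTML comes straight from the parser cache; only the cheap
# personalised parts (username, etc.) are generated fresh.
def render_page(session, messages, parser_cache, title):
    return "\n".join([
        f'<div id="user">{session["username"]}</div>',  # from your session
        messages.get("sitenotice", ""),                 # MediaWiki: message cache
        parser_cache[title],                            # cached article parse
    ])

# Toy usage with dicts standing in for the real caches.
page = render_page({"username": "Alice"},
                   {"sitenotice": "<p>Notice</p>"},
                   {"Foo": "<p>Article body</p>"},
                   "Foo")
```

This is why serving logged-in users from the Apaches is tolerable: personalising the output does not force a re-parse of the article.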
If you keep those groups untouched, there's no problem adding more dynamic things. If any of them is not in the cache (stored in memcached), it will need to be generated, for which you'll need to ask the database. Of all of them, parsing is the most expensive, and breaking its caching will send you to the hell of getting "Wikipedia has a problem" errors on every edit.
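The "ask memcached first, fall back to the db" behaviour is the classic cache-aside pattern. A minimal sketch, with a dict standing in for memcached and a function standing in for the expensive database/parser path (the names are made up):

```python
# Hypothetical sketch of the get-or-generate pattern: check the cache
# first, and only do the expensive work (db query, article parse) on a
# miss, storing the result for the next request.
def get_or_generate(cache, key, generate):
    value = cache.get(key)
    if value is None:
        value = generate(key)   # the expensive path: hit the db / parser
        cache[key] = value      # store for the next request
    return value

# Toy usage: the second lookup never touches the "database".
db_calls = []
def parse_article(title):
    db_calls.append(title)
    return f"<p>parsed {title}</p>"

memcached = {}
get_or_generate(memcached, "Foo", parse_article)
get_or_generate(memcached, "Foo", parse_article)
```

Since parsing is the most expensive `generate` of all, losing the parser-cache entries means every request takes the slow path at once, which is exactly the overload failure described above.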
Have I missed something?