George Herbert wrote:
I'm confused; what exactly is in the cache for
logged-in users?
The HTML I see coming out the end has the encoded username variable and
username in scripts and so forth.
I always assumed that logged-in users generated a new HTML render every time
they hit a page, and that we assumed logged-in users were a small
enough fraction of total page views that it wasn't a big deal. But I'm not
familiar with Wikipedia's caching in any detail; I've run a lot of
MediaWiki servers but never bothered to cache anything (none of them had
enough hits to make it worthwhile trying).
So... what's the underlying mechanism?
Thanks...
There are a lot of caches. That's how Wikipedia survives having so many
visitors ;)
When you make a request to a Wikimedia wiki, it goes to a Squid
cluster: pmtpa, knams or yaseo.
A cache miss at knams or yaseo passes the request on to the pmtpa Squids.
A cache miss at pmtpa passes the request on to the Apaches.
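That fall-through can be sketched as a tiered lookup. This is a toy model, not the actual Squid configuration; the class and function names are made up for illustration:

```python
# Hypothetical sketch of the tiered fall-through: each Squid tier is tried
# in order, a miss passes the request down, and the Apaches (the last
# resort) always render the page.

class Squid:
    def __init__(self, name):
        self.name = name
        self.store = {}            # url -> cached HTML

    def get(self, url):
        return self.store.get(url)

class Apache:
    def render(self, url):
        # Stands in for PHP assembling the page from scratch.
        return f"<html>rendered {url}</html>"

def serve(url, squids, apache):
    for squid in squids:           # e.g. [knams, pmtpa] for a European reader
        page = squid.get(url)
        if page is not None:
            return page            # cache hit: this Squid serves it
    return apache.render(url)      # miss everywhere: falls through to Apache

knams, pmtpa = Squid("knams"), Squid("pmtpa")
page = serve("/wiki/Foo", [knams, pmtpa], Apache())
```

A request that misses at knams and pmtpa reaches the Apaches; once any tier has a cached copy, the fall-through stops there.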
The Squids will only serve a page if they have served the same URL for
the same user before, where "the same user" means:
-You, if you're logged in (and your cookie hasn't changed since then).
-Any anonymous user, if you're not logged in.
Note that MediaWiki has a flag in LocalSettings.php to show your IP at the
top, the way it shows logged-in users' names. By disabling it, all anonymous
users get the same HTML, and it can be cached on the Squids.
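One way to picture the "same user" rule is as part of the cache key. This is a hypothetical sketch of the idea, not how Squid actually builds its keys:

```python
# Hypothetical sketch of "the same user" for caching purposes:
# logged-in users are keyed by their session cookie (so a cookie change
# invalidates their cached copies), while all anonymous users share one key.

def cache_key(url, session_cookie=None):
    if session_cookie is not None:
        return (url, session_cookie)   # logged in: per-user cached copy
    return (url, "anonymous")          # every anonymous reader shares this

# Two anonymous readers hit the same cached copy:
anon_a = cache_key("/wiki/Foo")
anon_b = cache_key("/wiki/Foo")

# A logged-in user gets a copy of their own:
logged_in = cache_key("/wiki/Foo", session_cookie="abc123")
```

This is also why hiding the IP in the header matters: if the anonymous HTML contained per-reader data, anonymous users could no longer share one cache entry.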
When a page is modified, the Squids are told not to remember that page
any more.
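In miniature, that purge-on-edit step might look like this (a toy model with made-up names; it only illustrates the "forget this URL everywhere" idea):

```python
# Hypothetical purge-on-edit: each Squid cache is modeled as a dict of
# url -> cached HTML. Editing a page tells every cache to drop that URL,
# so the next request falls through and gets a fresh render.

caches = {
    "knams": {"/wiki/Foo": "old html", "/wiki/Bar": "bar html"},
    "pmtpa": {"/wiki/Foo": "old html"},
}

def purge(url):
    for cache in caches.values():
        cache.pop(url, None)       # told not to remember that page any more

purge("/wiki/Foo")                 # /wiki/Foo is forgotten; /wiki/Bar stays
```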
When the request arrives at the Apaches, PHP kicks in and assembles the
page from a bunch of pieces, and adding your username at that point is no
problem.
It uses:
-Your username, permissions, preferences..., which are stored in your
session.
-The skin, which is a PHP file.
-The content of the MediaWiki: messages, which have their own cache.
-The article content, which is cached.
-The diffs, which are cached too.
If you keep those pieces untouched, there's no problem adding more
dynamic things. If one of them is not in the cache (stored in
memcached), it will need to be regenerated, for which you'll need to ask
the db.
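That memcached fallback is the classic get-or-compute pattern. A sketch, with a plain dict standing in for memcached and a made-up function standing in for the expensive DB query and parse:

```python
# Hypothetical get-or-compute sketch of the memcached layer: look in the
# cache first; on a miss, regenerate the piece from the database and
# store it for next time.

cache = {}                                   # stands in for memcached

def expensive_db_parse(title):
    # Stands in for hitting the db and parsing the wikitext (the costly part).
    return f"<parsed wikitext of {title}>"

def get_parsed(title):
    key = f"parsed:{title}"
    html = cache.get(key)
    if html is None:                         # not in memcached
        html = expensive_db_parse(title)     # so ask the db and regenerate
        cache[key] = html
    return html

first = get_parsed("Foo")                    # miss: generated and stored
second = get_parsed("Foo")                   # hit: served from the cache
```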
Of all of them, parsing is the most expensive, and breaking its caching
will send you to the hell of getting "Wikipedia has a problem" errors on
every edit.
Have I missed something?