Right... so nothing is cached in Squid across logged-in users; each of them gets a new parse and render of every page they visit, per session. If I hit http://en.wikipedia.org/wiki/FooBar twice in one session and it hasn't been updated, I get the Squid-cached copy, but otherwise nothing. And if User:FredJoeBob hits it, that generates a new render of the page, which is cached for him but not for me.
However, the parsed article is cached until someone changes something in the underlying article. And that's shared.
So the specific concern is that putting an additional level of parsing between the memcached article text and the final output HTML would slow things down? And that's where it would have to go for {{USERNAME}} to expand out properly.
I am not a JavaScript guy, so I apologize in advance if this is a dumb question, but... is it possible to make {{USERNAME}} into some JavaScript that expands it on the client side, so the server just hands that JS to the browser and lets it figure things out? That would be the same JS code for everyone, so the underlying parsed article would stay in memcached unchanged...
I don't know whether JS can carry the equivalent of a global variable within the page, so I'm not sure whether you could set one in the header (which is already generated per-user anyway) and then have JS expand it inside the fixed page-content part. That's roughly what I'm thinking of, though I freely admit I don't know whether it's possible.
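Something like the following is the shape I have in mind. This is pure speculation on my part -- the hook point and the variable names here are made up, not anything I know to exist in MediaWiki -- but it shows the split: a tiny per-user line in the header, and one static script that is the same bytes for everyone.

// Hypothetical: the per-user header chunk (that part is generated
// fresh for each logged-in user anyway) sets a single global:
$out->addHeadItem( 'username-var',
	'<script type="text/javascript">var wgUserName = '
	. Xml::encodeJsVar( $wgUser->getName() ) . ';</script>' );

// A static script, identical for everybody, fills in
// <span class="insertusername">{{USERNAME}}</span> placeholders
// that the parser would leave in the cached article HTML:
$out->addHeadItem( 'username-expand', <<<HTML
<script type="text/javascript">
window.onload = function () {
	var spans = document.getElementsByTagName( 'span' );
	for ( var i = 0; i < spans.length; i++ ) {
		if ( spans[i].className == 'insertusername' ) {
			spans[i].firstChild.nodeValue = wgUserName;
		}
	}
};
</script>
HTML
);

If that worked, only the one-line variable assignment would differ per user; the article HTML in memcached would never change.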
-george william herbert
On Nov 14, 2007 1:35 PM, Platonides <Platonides@gmail.com> wrote:
George Herbert wrote:
I'm confused; what exactly is in the cache for logged-in users? The HTML I see coming out the end has the encoded username variable and username in scripts and so forth.

I always assumed that logged-in users generated a new HTML render every time they hit a page, and that we assumed logged-in users were a small enough fraction of total page views that it wasn't a big deal. But I'm not familiar with the Wikipedia caching in any detail; I've run a lot of MediaWiki servers but never bothered to cache anything (none of them had enough hits to make it worthwhile trying).

So... what's the underlying mechanism?

Thanks...
There are a lot of caches. That's how Wikipedia survives having so many visitors ;)
When you make a request to the Wikimedia wikis, it goes to a Squid cluster: pmtpa, knams or yaseo. A cache miss at knams or yaseo passes the request on to the pmtpa Squids. A cache miss at pmtpa passes the request on to the Apaches.
The Squids will only serve the page if they have served the same URL to the same user before, where "the same user" means:
- you, if you're logged in (since the last time your cookie changed);
- any other anonymous user, if you're not logged in.

Note that MediaWiki has a flag in LocalSettings to show your IP at the top of the page, as it does with logged-in users' names. By disabling it, all anonymous users get the same HTML, and it can be cached on the Squids. When a page is modified, the Squids are told not to remember that page any more.
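For example, something like this in LocalSettings.php (a sketch from memory -- double-check the variable names before trusting them):

# Don't print the visitor's IP at the top of the page; all anonymous
# visitors then get byte-identical HTML, which the Squids can share.
$wgShowIPinHeader = false;

# Run behind Squid: emit cacheable headers for anonymous page views
# and send a purge to each Squid when a page is modified.
$wgUseSquid = true;
$wgSquidServers = array( '10.0.0.1', '10.0.0.2' );  # example addresses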
When the request arrives at the Apaches, it will be served: PHP drops in and assembles a bunch of stuff, in which adding your username is no problem. It uses:
- your username, permissions, preferences... which are stored in your session;
- the skin, from a PHP file;
- the content of the MediaWiki: messages, which have their own cache;
- the article content, which is cached;
- the diffs, which are cached too.
If you keep those groups untouched, there's no problem adding more dynamic things. If one of them is not in the cache (stored in memcached), it will need to be generated, for which you'll have to ask the db. Of all of them, parsing is the most expensive, and breaking its caching will send you to the hell of getting "Wikipedia has a problem" errors on each edit.
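The expensive piece looks roughly like this (a simplified sketch from memory, not the real ParserCache code -- the real key also folds in the parser options and the page's touched timestamp):

$key  = wfMemcKey( 'pcache', $article->getID() );
$html = $wgMemc->get( $key );
if ( $html === false ) {
	// Cache miss: the expensive path. A full parse of the wikitext,
	// which also has to ask the db for templates, links, etc.
	$html = $wgParser->parse( $text, $title, $options )->getText();
	$wgMemc->set( $key, $html, 86400 );
}
// Anything per-user in that output ({{USERNAME}} expanded to a real
// name) means the key can no longer be shared, and every view pays
// for a full parse.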
Have I missed something?