Right... So nothing is cached in Squid between logged-in users; each of
those gets a new parse and render of a page they go to in each session.
If I hit http://en.wikipedia.org/wiki/FooBar twice in one session and it
hasn't been updated, I get the Squid cache of it, but otherwise nothing.
And if User:FredJoeBob hits it, it generates a new render of the page,
which is cached for him but not for me.
However, the parsed article is cached until someone changes something in the
underlying article. And that's shared.
So the specific concern is that putting an additional level of parsing
between the memcached article text and the final output HTML would slow
things down? And that's where it would have to go for {{USERNAME}} to
expand out properly.
I am not a JavaScript guy, so I apologize in advance if this is a dumb
question, but... is it possible to make {{USERNAME}} some JavaScript which
expands it on the client side, so the server just provides that JS to the
browser and lets the browser figure it out? That would be the same JS code
for everyone, so the underlying parsed article would stay in memcached
unchanged...
I don't know if JS can carry the equivalent of a global variable within
the page, so I'm not sure whether you could set one in the already
per-user generated header stuff and then expand it with JS in the fixed
page-content part. I guess that's what I'm thinking of here. But I freely
admit that I don't know whether that's possible or not.
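For what it's worth, JS can do this: a page-level global set in the per-user header is visible to any script on the page. A minimal sketch of the idea, assuming the cached article body contains a literal {{USERNAME}} token and the per-user header sets a global with the logged-in name (MediaWiki does emit per-user JS globals such as wgUserName in the page head; the function name and fallback below are made up for illustration):

```javascript
// Sketch: expand every {{USERNAME}} token in a chunk of HTML with the
// current user's name. The name comes from a global set in the per-user
// generated header, so the cached article body stays identical for
// everyone and memcached/Squid caching is untouched.
function expandUsername(html, userName) {
  // Anonymous users get a fallback instead of a broken token.
  var name = userName || 'Anonymous';
  return html.replace(/\{\{USERNAME\}\}/g, name);
}

// In the browser this would run on page load, something like:
//   document.body.innerHTML =
//       expandUsername(document.body.innerHTML, window.wgUserName);
```

The key property is that only the tiny header script varies per user; the expensive parsed HTML stays byte-identical and cacheable.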
-george william herbert
On Nov 14, 2007 1:35 PM, Platonides <Platonides(a)gmail.com> wrote:
George Herbert wrote:
I'm confused; what exactly is in the cache for logged-in users? The HTML
I see coming out the end has the encoded username variable and username
in scripts and so forth.
I always assumed that logged-in users generated a new HTML render every
time they hit a page, and that we assumed logged-in users were a small
enough fraction of total page views that it wasn't a big deal. But I'm
not familiar with the Wikipedia caching in detail; I've run a lot of
MediaWiki servers but never bothered to cache anything (none of them had
enough hits to make it worthwhile trying).
So... what's the underlying mechanism?
Thanks...
There are a lot of caches. That's how Wikipedia survives having so many
visitors ;)
When you make a request to the Wikimedia wikis, it goes to a Squid
cluster: pmtpa, knams, or yaseo.
A cache miss at knams or yaseo passes the request to the pmtpa Squids.
A cache miss at pmtpa passes the request to the Apaches.
The Squids will only serve the page if they have served the same URL to
the same user before, where "the same user" means:
- you, if you're logged in (since the last time your cookie changed);
- another anonymous user, if you're not logged in.
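As a conceptual model only (not Squid's actual implementation), the serving rule above amounts to keying the cache on the URL plus the user's session cookie for logged-in users, and on the URL alone for anonymous users — a hypothetical sketch:

```javascript
// Conceptual model of the Squid serving rule described above (not real
// Squid code): a logged-in user only hits the cache for their own prior
// renders, while all anonymous users share one cached copy per URL.
function cacheKey(url, sessionCookie) {
  // sessionCookie is null/undefined for anonymous visitors.
  return sessionCookie ? url + '|' + sessionCookie : url + '|anon';
}
```

This is why logged-in traffic mostly falls through to the Apaches: each user's key is unique to them.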
Note that MediaWiki has a flag in LocalSettings to show your IP at the
top, as it does with logged-in users' names. By disabling it, all
anonymous users get the same HTML, and it can be cached on the Squids.
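The flag being referred to is presumably $wgShowIPinHeader (an assumption based on MediaWiki settings of that era), which would be disabled like so:

```php
# In LocalSettings.php: stop showing the visitor's IP in the
# personal-tools area, so all anonymous users receive identical
# HTML that the Squids can cache and share.
$wgShowIPinHeader = false;
```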
When a page is modified, the Squids are told not to remember that page
any more.
When the request arrives at the Apaches, it is served: PHP kicks in and
assembles a bunch of stuff, in which adding your username is no problem.
It uses:
- your username, permissions, preferences... which are stored in your
session;
- the skin, from a PHP file;
- the content of the MediaWiki: messages, which have their own cache;
- the article content, which is cached;
- the diffs, which are cached too.
If you keep those groups untouched, there's no problem adding more
dynamic things. If one of them is not in the cache (stored in memcached),
it will need to be generated, for which you'll need to ask the database.
Of all of them, parsing is the most expensive, and breaking its caching
will send you to the hell of getting "Wikipedia has a problem" errors on
every edit.
Have I missed something?
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
-george william herbert
george.herbert(a)gmail.com