Timwi wrote:
Tim Starling wrote:
Timwi wrote:
Jens Frank wrote:
On Tue, Jun 15, 2004 at 09:19:20AM +0200, Gerard.Meijssen wrote:
> LS, I have been trying for two days to change one record in
> nl:wiktionary. the article is [[Sjabloon:-noun-]]
When saving a template, all pages using the template are "touched" so
that they will be updated the next time they are rendered. Having
zillions of pages using that template, this takes a long time. Probably
too long.
Did whoever came up with that think about the performance implications? o.O
Of course I did. In MediaWiki 1.2 I deliberately avoided this kind of
cache purging, for performance reasons.
Jens Frank was talking about "touching". This sounded like a database
query to me. Now you are calling it "cache purging". I'm not sure what
that is, but if it refers to memcached, it would not require a database
query.
"Touch" is the name of a unix utility which changes the modification
time on a file, usually setting it to the present time. Hence "touching"
a file is equivalent to making some infinitesimal change to it.
In the cur table, there is a field called "cur_touched". This field is
used for cache invalidation. If links on the page change from broken to
unbroken or vice versa, cur_touched is set to the present. This field is
queried to determine whether the client cache or the parser cache is valid.
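A minimal sketch of that validity check, assuming timestamps are 14-digit
strings like the one in the parser cache comment, and that a cached copy
counts as valid when it is at least as new as cur_touched. The function
name is made up for illustration; this is not the actual MediaWiki code.

```python
def cache_entry_is_valid(cached_timestamp: str, cur_touched: str) -> bool:
    """Return True if the cached copy is at least as new as cur_touched.

    Assumption: both values are fixed-width YYYYMMDDHHMMSS strings, so
    plain string comparison orders them chronologically.
    """
    return cached_timestamp >= cur_touched
```

If a template save touches the page after the copy was cached, the check
fails and the page has to be rendered again.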
The squid cache does not check back with the database on each query;
instead it keeps its own records of which pages are valid and which are
invalid. So whenever a page changes, we need to not only update
cur_touched, but also to send a message to each squid server informing
it that the page has become invalid. This is called "purging".
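Roughly, the two steps look like the sketch below. The names are
hypothetical: cur_touched is modelled as a plain dict rather than a
database column, and send_purge stands in for whatever actually delivers
the message to a squid server.

```python
from typing import Callable, Dict, Iterable


def invalidate_page(
    title: str,
    now: str,
    cur_touched: Dict[str, str],
    squid_servers: Iterable[str],
    send_purge: Callable[[str, str], None],
) -> None:
    """Touch a page, then tell every squid server that it is invalid."""
    # Step 1: update the timestamp that the client cache and the parser
    # cache are validated against (a dict here, a table field in reality).
    cur_touched[title] = now
    # Step 2: squid never looks at cur_touched, so each server has to be
    # informed explicitly. send_purge is a hypothetical callback.
    for server in squid_servers:
        send_purge(server, title)
```

Doing this once per page is cheap. Doing it for every page that includes
a widely used template is where the time goes.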
I suppose I definitely need to know more about this. Where is the
"parser cache" -- is that in memcached or elsewhere?
The parser cache is a cache of the content HTML of a page, stored in
memcached. That is, it omits the sidebar, header and footer. It is
stored under an index which indicates what user preferences were used to
render it. You can tell when a page is served from the parser cache by
looking for an HTML comment at the end of the content area, e.g.
<!-- Saved in parser cache with key
enwiki:pcache:idhash:13173-1!1!1!monobook!1!1!0!1!0!1!0 and timestamp
20040622004805 -->
13173 is the main page.
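To pick such a key apart, something like the sketch below would do. It
assumes the layout seen in the example above (wiki name, the literal
":pcache:idhash:", page ID, a hyphen, then "!"-separated fields) and
deliberately does not try to say which preference each field stands for.
The function name is invented for illustration.

```python
from typing import List, Tuple


def split_parser_cache_key(key: str) -> Tuple[str, int, List[str]]:
    """Split a parser cache key into wiki name, page ID and option fields.

    Assumption: the key follows the layout of the example in this mail.
    The option fields are returned as opaque strings.
    """
    wiki, _, rest = key.partition(":pcache:idhash:")
    page_id, _, option_part = rest.partition("-")
    return wiki, int(page_id), option_part.split("!")


wiki, page_id, options = split_parser_cache_key(
    "enwiki:pcache:idhash:13173-1!1!1!monobook!1!1!0!1!0!1!0"
)
```

Two users whose preferences differ get different option fields, hence
different keys and separate cached copies of the same page.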
What does "touching" a page mean -- does it involve a database query?
Can cache purging be done without a database query? If not, can the
cache mechanism be changed (perhaps to use memcached) so database
queries can be avoided when all you want to do is invalidate a cache
entry?
You can't store anything in memcached unless you're prepared to have it
go missing. Memcached is appropriate for caching read operations, but
not for replacing write operations.
One alternative is to use a system where two cur_touched fields must be
checked to validate a page, instead of one. One for the page, one for
the template. This would only be efficient for widely included
templates. Templates included in only a few pages would have to be
handled the old way.
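A rough sketch of the two-field check, under the same assumption of
fixed-width timestamp strings. It generalises "one for the template" to
a list, since a page may include more than one widely used template;
all names are hypothetical.

```python
from typing import Iterable


def page_cache_is_valid(
    cached_at: str,
    page_touched: str,
    template_touched: Iterable[str],
) -> bool:
    """Validate a cached page against its own timestamp and its templates'.

    Assumption: every value is a YYYYMMDDHHMMSS string, so max() and >=
    on strings give chronological results.
    """
    newest_change = max([page_touched, *template_touched])
    return cached_at >= newest_change
```

With this scheme, saving the template updates a single timestamp
instead of touching every page that includes it; the cost moves to the
extra lookup on each validation.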
Another alternative would be to reduce service. When changing a
template, before attempting to invalidate all caches, it would check how
many pages are using the template. If it is greater than some critical
number, the invalidation could simply be skipped. After a few days, the
squid cache and the parser cache will expire, and users will see the
updated template.
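As a sketch, with a made-up function name and the critical number left
as a parameter because no value has been chosen:

```python
from typing import List, Sequence


def pages_to_invalidate(
    pages_using_template: Sequence[str],
    critical_number: int,
) -> List[str]:
    """Return the pages to touch and purge after a template save.

    If more pages use the template than critical_number allows, return
    an empty list and leave the update to normal cache expiry.
    """
    if len(pages_using_template) > critical_number:
        return []
    return list(pages_using_template)
```

The trade-off is visible in the empty list: the save returns quickly,
but readers keep seeing the old template until the caches expire.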
Please don't reply by saying "yeah, do that!"
-- Tim Starling