Good evening,
I am currently starting my diploma thesis with the Semantic MediaWiki group at the University of Karlsruhe. I will investigate caching issues in this context, and have therefore started by looking at MediaWiki / Wikipedia caching and database updates.
From what I gathered from the documentation, there are three types of caches:
1) Squid HTTP caches (invalidated via multicast HTCP purging). These caches rewrite the HTTP headers for client-side caching if necessary. If I understand correctly, the caching time depends on $wgSquidMaxage, and the client cache is invalidated once the user logs in or the page is edited. Is the Wikipedia setting for $wgSquidMaxage equal to the default value given in the documentation (18000)? And is this caching strategy enforced only in sendCacheControl(), or are there other functions I should look at?

2) The parser cache, which temporarily keeps already-parsed pages. Is this cache kept in memory or in a special table?

3) A memory cache named memcached that is able to cache database queries and distribute writes across the database architecture. Is this similar to MySQL query caching?
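If I understand the header rewriting in 1) correctly, the logic would look roughly like the Python sketch below. This is only my guess at the behaviour, not MediaWiki's actual sendCacheControl() code; the function name cache_headers and its parameters are invented for illustration, and the 18000 default is just the documented $wgSquidMaxage value.

```python
# Rough sketch (NOT MediaWiki's real code) of how caching headers might be
# chosen: anonymous views are cacheable by Squid for $wgSquidMaxage seconds,
# while logged-in views are personalized and must not be shared-cached.

def cache_headers(logged_in, squid_maxage=18000):
    if logged_in:
        # Personalized page: forbid caching in shared (Squid) caches.
        return {"Cache-Control": "private, must-revalidate, max-age=0"}
    # Anonymous page: Squid may keep it for squid_maxage seconds (s-maxage),
    # but browsers revalidate on every view (max-age=0) so purges take effect.
    return {"Cache-Control":
            "s-maxage=%d, must-revalidate, max-age=0" % squid_maxage}
```

Does the real code distinguish the cases along these lines?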
Another issue is database updates (these occur only if I edit and change a page, right? (assuming the parser cache is enabled)). What influence does caching have on these? I have to admit that I have not dived deeply enough into the code to understand where and how these updates happen.
I know that is a bunch of questions, but I would very much appreciate any help, be it answers or links to documentation I have not yet looked at.
Thanks in advance, Sebastian
Sebastian Doeweling wrote:
- Some memory cache named memcached that is able to cache database
queries and distribute writes across the database architecture. This is similar to MySQL query caching?
No, it's not. MySQL query caching is useless, as a table's query cache is invalidated on every write. But memcached (http://www.danga.com/memcached/) has nothing directly to do with databases; it's just a fault-tolerant distributed in-memory blob cache. You can stuff arbitrary key-value pairs in it, so one popular use of it turns out to be database record caching.
In addition to stock memcached, Wikimedia sites use Tugela cache, which is Domas' unholy hybrid of BDB and memcached (essentially replacing the memory backend in memcached with disk-backing via BDB). Tugela adheres to the memcached API, though it requires manual item expiration management.
Another issue are database updates (these occur only, if I edit and change a page, right?
Yes, generally.
(assuming parser cache is enabled)) - what influence does caching have on these?
A page edit invalidates that page's cache.
"Sebastian Doeweling" doeweling@zkm.de wrote in message news:4446ECC7.7090100@zkm.de
Good evening,
[..]
Another issue are database updates (these occur only, if I edit and change a page, right? (assuming parser cache is enabled)) - what
Hallo Sebastian
The generated HTML (and thus the cached HTML) is also influenced by the template inclusion mechanism:
http://meta.wikimedia.org/wiki/Help:Template
A wiki page can include (transclude) another wiki page this way. Thus there are wiki pages that, when changed, invalidate the cache of the wiki pages that include them, and this is transitive.
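To make the transitive invalidation concrete, here is a rough Python sketch. The page names and the includes mapping are invented for illustration, and this is not how MediaWiki actually stores the information (it keeps link tables in the database); it only shows the reverse-dependency walk.

```python
# Which pages must be re-rendered when a template changes?
# includes maps each page to the pages/templates it transcludes.

includes = {
    "ArticleA": {"Template:Infobox"},
    "Template:Infobox": {"Template:Colors"},
    "ArticleB": {"Template:Colors"},
}

def pages_to_invalidate(changed):
    """Return every page whose rendered HTML depends, directly or
    transitively, on the changed page."""
    # Invert the mapping: included page -> set of pages that include it.
    included_by = {}
    for page, incs in includes.items():
        for inc in incs:
            included_by.setdefault(inc, set()).add(page)
    # Walk the reverse dependencies transitively.
    invalid, stack = set(), [changed]
    while stack:
        current = stack.pop()
        for dependent in included_by.get(current, ()):
            if dependent not in invalid:
                invalid.add(dependent)
                stack.append(dependent)
    return invalid
```

So editing Template:Colors above would invalidate ArticleB directly, and ArticleA through Template:Infobox.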
Just to make sure you don't forget that.
BTW, the ultimate authority on template inclusion is Tim Starling; I'm only a Wikipedian who deals a lot with templates.
Adrian
wikitech-l@lists.wikimedia.org