Good evening,
I am currently starting my diploma thesis with the Semantic MediaWiki group at the University of Karlsruhe. I will investigate caching issues in this context, and have therefore started by looking at MediaWiki / Wikipedia caching and database updates.
From what I gathered from the documentation, there are three types of caches:
1) Squid HTTP caches (invalidated via multicast HTCP purging). These caches rewrite the HTTP headers for client-side caching if necessary. If I understand correctly, the caching time depends on $wgSquidMaxage, and the client cache is invalidated once the user logs in or the page is edited. Is the Wikipedia setting for $wgSquidMaxage equal to the default value given in the documentation (18000)? And is this caching strategy enforced only in sendCacheControl(), or are there other functions I should look at?

2) The parser cache, which temporarily keeps already-parsed pages. Is this cache kept in memory or in a special table?

3) A memory cache named memcached that is able to cache database queries and distribute writes across the database architecture. Is this similar to MySQL query caching?
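If I understand the header rewriting in 1) correctly, the logic would look roughly like the Python sketch below. This is only my guess at the behaviour, not MediaWiki's actual sendCacheControl() code; the function name cache_headers and its parameters are invented for illustration, and the 18000 default is just the documented $wgSquidMaxage value.

```python
# Rough sketch (NOT MediaWiki's real code) of how caching headers might be
# chosen: anonymous views are cacheable by Squid for $wgSquidMaxage seconds,
# while logged-in views are personalized and must not be shared-cached.

def cache_headers(logged_in, squid_maxage=18000):
    if logged_in:
        # Personalized page: forbid caching in shared (Squid) caches.
        return {"Cache-Control": "private, must-revalidate, max-age=0"}
    # Anonymous page: Squid may keep it for squid_maxage seconds (s-maxage),
    # but browsers revalidate on every view (max-age=0) so purges take effect.
    return {"Cache-Control":
            "s-maxage=%d, must-revalidate, max-age=0" % squid_maxage}
```

Does the real code distinguish the cases along these lines?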
Another issue is database updates (these occur only if I edit and change a page, right? (assuming the parser cache is enabled)). What influence does caching have on these? I have to admit that I have not dived deeply enough into the code to understand where and how these updates happen.
I know that is a bunch of questions, but I would very much appreciate any help, be it answers or links to documentation I have not yet looked at.
Thanks in advance, Sebastian
Sebastian Doeweling wrote:
- Some memory cache named memcached that is able to cache database
queries and distribute writes across the database architecture. This is similar to MySQL query caching?
No, it's not. MySQL query caching is useless, as a table's query cache is invalidated on every write. But memcached (http://www.danga.com/memcached/) has nothing directly to do with databases; it's just a fault-tolerant distributed in-memory blob cache. You can stuff arbitrary key-value pairs in it, so one popular use of it turns out to be database record caching.
In addition to stock memcached, Wikimedia sites use Tugela cache, which is Domas' unholy hybrid of BDB and memcached (essentially replacing the memory backend in memcached with disk-backing via BDB). Tugela adheres to the memcached API, though it requires manual item expiration management.
Another issue are database updates (these occur only, if I edit and change a page, right?
Yes, generally.
(assuming parser cache is enabled)) - what influence does caching have on these?
A page edit invalidates that page's cache.
"Sebastian Doeweling" doeweling@zkm.de wrote in message news:4446ECC7.7090100@zkm.de
Good evening,
[..]
Another issue are database updates (these occur only, if I edit and change a page, right? (assuming parser cache is enabled)) - what
Hallo Sebastian
The generated HTML (and thus the cached HTML) is also influenced by the template inclusion mechanism:
http://meta.wikimedia.org/wiki/Help:Template
A wiki page can include (transclude) another wiki page this way. Thus there are wiki pages that, when changed, invalidate the cache of the wiki pages that include them, and this is transitive.
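To make the transitive invalidation concrete, here is a rough Python sketch. The page names and the includes mapping are invented for illustration, and this is not how MediaWiki actually stores the information (it keeps link tables in the database); it only shows the reverse-dependency walk.

```python
# Which pages must be re-rendered when a template changes?
# includes maps each page to the pages/templates it transcludes.

includes = {
    "ArticleA": {"Template:Infobox"},
    "Template:Infobox": {"Template:Colors"},
    "ArticleB": {"Template:Colors"},
}

def pages_to_invalidate(changed):
    """Return every page whose rendered HTML depends, directly or
    transitively, on the changed page."""
    # Invert the mapping: included page -> set of pages that include it.
    included_by = {}
    for page, incs in includes.items():
        for inc in incs:
            included_by.setdefault(inc, set()).add(page)
    # Walk the reverse dependencies transitively.
    invalid, stack = set(), [changed]
    while stack:
        current = stack.pop()
        for dependent in included_by.get(current, ()):
            if dependent not in invalid:
                invalid.add(dependent)
                stack.append(dependent)
    return invalid
```

So editing Template:Colors above would invalidate ArticleB directly, and ArticleA through Template:Infobox.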
Just to make sure you don't forget that.
BTW, the ultimate authority on template inclusion is Tim Starling; I'm only a Wikipedian who deals a lot with templates.
Adrian
wikitech-l@lists.wikimedia.org