I've snuck some quick statistics calls into Article.php to count the relative frequency of cache hits in page views.
In a few minutes on en:
** en **
Page views:    5420
Cache hits:    3322 (61%)
Cache misses:    98 (1%)
Client-side:    522 (9%)
Uncacheable:   1478 (27%)
Not half bad. Cache hits are views where the wiki can pull a complete pre-rendered page from the file cache and send it straight out. Cache misses are views that could be cached, but the page has to be (re)rendered this time. Client-side hits are where the wiki can send a '304 Not Modified' response and the client uses a copy it has previously cached locally. Uncacheable views are old revision views, diffs, redirects, and views by logged-in users that aren't client-side cached.
These figures cover page views only; special pages, editing, logins, etc. aren't counted.
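Every page view lands in exactly one of the four buckets, so the shares add up to the total. A minimal sketch of such a tally in Python, not the actual Article.php code (the bucket names and the `record`/`summarize` helpers are made up for illustration), fed with the en numbers above and printing one decimal place:

```python
from collections import Counter

# Hypothetical bucket names for the four outcomes; the real counters in
# Article.php aren't shown in this message.
BUCKETS = ("hit", "miss", "client", "uncacheable")

def record(stats, bucket):
    """Count one page view in exactly one bucket."""
    if bucket not in BUCKETS:
        raise ValueError(f"unknown bucket: {bucket}")
    stats[bucket] += 1

def summarize(stats):
    """Return (total views, {bucket: percent of total})."""
    total = sum(stats[b] for b in BUCKETS)
    if total == 0:
        return 0, {b: 0.0 for b in BUCKETS}
    return total, {b: 100.0 * stats[b] / total for b in BUCKETS}

# The en figures from above, loaded directly instead of via record().
en = Counter(hit=3322, miss=98, client=522, uncacheable=1478)
total, shares = summarize(en)
print(f"Page views: {total}")
for b in BUCKETS:
    print(f"{b:12} {en[b]:7d} ({shares[b]:.1f}%)")
```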
I'll let it run and see if the ratios hold up during the day and with the various languages.
-- brion vibber (brion @ pobox.com)
On Fri, 12 Dec 2003, Brion Vibber wrote:
> Uncacheable: 1478 (27%)
> Uncacheable hits are old revision views, diffs, redirects, and views by logged-in users that aren't client-side cached.
Since this seems to be a significant fraction of the load, is there any reason why old revision views and diffs aren't cached? They are more or less never going to change, unless rolled back or something.
Ciao, Alfio
On Dec 13, 2003, at 03:54, Alfio Puglisi wrote:
> ...is there any reason why old revision views and diffs aren't cached?
They're not a significant fraction at all; a quick grep of logs indicates about 2% of page views are diffs or old revisions. They are allowed to be cached client-side, though that only catches a small fraction of the small fraction.
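The 2% figure came from grepping the logs. A rough Python equivalent, assuming a log reduced to one request URL per line and that diffs and old revisions are exactly the requests carrying a `diff=` or `oldid=` query parameter (the function names and sample URLs are invented for illustration):

```python
from urllib.parse import urlsplit, parse_qs

def is_old_or_diff(url):
    """True if the URL requests a diff or an old revision.

    Assumes MediaWiki-style URLs where these views carry a diff= or
    oldid= query parameter.
    """
    params = parse_qs(urlsplit(url).query)
    return "diff" in params or "oldid" in params

def old_or_diff_share(urls):
    """Percentage of the given page-view URLs that are diffs or old revisions."""
    urls = list(urls)
    if not urls:
        return 0.0
    matches = sum(1 for u in urls if is_old_or_diff(u))
    return 100.0 * matches / len(urls)

# Made-up sample request URLs, one per log line.
sample = [
    "/wiki/Main_Page",
    "/w/wiki.phtml?title=Paris&diff=1234&oldid=1200",
    "/wiki/Paris",
    "/w/wiki.phtml?title=Paris&oldid=1100",
]
print(old_or_diff_share(sample))  # 50.0
```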
Caching old revisions and diffs doesn't make a lot of sense on the server either, since they're not likely to be revisited often. They'd have to be explicitly expired.
Also I suspect the majority of old and diff views are by logged-in wikipediholics, which our current server-side cache doesn't handle.
Stats after most of a day:
** all wikis **
Page views:    863755
Cache hits:    537668 (62%)
Cache misses:   18113 (2%)
Client-side:   100648 (12%)
Uncacheable:   207326 (24%)
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
> ...views by logged-in users that aren't client-side cached.
I don't understand why you don't cache those.
I assume you're currently file-caching complete rendered not-logged-in pages, right? Well, I don't understand why you have to do that, because you're artificially limiting yourself to caching only non-logged-in page views that way.
Admittedly, to change that now is a bit of work.
Greetings, Timwi
Timwi-
> I assume you're currently file-caching complete rendered not-logged-in pages, right? Well, I don't understand why you have to do that, because you're artificially limiting yourself to caching only non-logged-in page views that way.
There's a lot of dynamic stuff depending on user prefs, from numbered headings to inline javascript to TOCs etc. You would effectively have to move much of the parsing to a post-rendering stage, which would be a major PITA to code and complicate the already ugly parser beyond recognition. About the same could be gained with much less work by caching DB-intensive operations, such as link checking, using memcached. There's no reason why we shouldn't have an index of all page titles in memory.
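The in-memory title index amounts to answering "does this page exist?" for every link from memory rather than with one database query per link. A minimal sketch, with an ordinary Python set standing in for the shared memcached store mentioned above; `TitleIndex` and its methods are hypothetical names, and keeping it in sync on page creation and deletion is assumed to be the caller's job:

```python
class TitleIndex:
    """Hypothetical in-memory index of existing page titles.

    A plain set stands in for a shared cache such as memcached.
    """

    def __init__(self, titles=()):
        self._titles = set(titles)

    def add(self, title):
        # Call when a page is created so the index matches the database.
        self._titles.add(title)

    def remove(self, title):
        # Call when a page is deleted; discard() ignores unknown titles.
        self._titles.discard(title)

    def classify_links(self, targets):
        """Split link targets into (existing, missing) without a DB query."""
        existing = [t for t in targets if t in self._titles]
        missing = [t for t in targets if t not in self._titles]
        return existing, missing

index = TitleIndex(["Main_Page", "Paris"])
existing, missing = index.classify_links(["Paris", "Atlantis"])
print(existing, missing)  # ['Paris'] ['Atlantis']
```

The parser would then render existing targets as normal links and missing ones as broken (edit) links, with no per-link lookup against the database.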
Regards,
Erik
Erik Moeller wrote:
> There's a lot of dynamic stuff depending on user prefs, from numbered headings to inline javascript to TOCs etc.
Perhaps we could append a hash of the user preferences to the cache filename. It would use lots of hard drive space, but it's simple to implement and would add little CPU load, since there's no need to reprocess anything. The idea would be to reduce page view time for people with default, or very slightly altered, preferences.
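A minimal Python sketch of the preference-hash idea, assuming preferences are a flat dict of option names to values. The option names, `DEFAULT_PREFS`, `cache_filename` and the choice of MD5 are all illustrative, not anything in the current code:

```python
import hashlib
from urllib.parse import quote

# Hypothetical default preferences; the real option names and defaults
# live in the wiki's user settings code.
DEFAULT_PREFS = {"numberheadings": False, "showtoc": True, "skin": "standard"}

def cache_filename(title, prefs):
    """File-cache name for `title` as rendered with `prefs`.

    Only settings that differ from the defaults go into the hash, so
    users with default prefs share the anonymous readers' file.
    """
    base = quote(title, safe="")  # e.g. "/" becomes "%2F"
    changed = {k: v for k, v in prefs.items() if DEFAULT_PREFS.get(k) != v}
    if not changed:
        return f"{base}.html"
    # Sort keys so the same settings always produce the same hash.
    canonical = "&".join(f"{k}={changed[k]}" for k in sorted(changed))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"{base}.{digest}.html"

print(cache_filename("Paris", {}))                        # Paris.html
print(cache_filename("Paris", {"showtoc": True}))         # Paris.html
print(cache_filename("Paris", {"numberheadings": True}))  # Paris.<md5 hex>.html
```

Dropping default-valued settings before hashing is what lets default-pref users land on the same file as anonymous readers; sorting the keys keeps the hash stable regardless of the order the preferences were stored in.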
-- Tim Starling.
wikitech-l@lists.wikimedia.org