Jim accidentally sent this just to me; I'm sending it back to the list:
On Wed, 2002-04-10 at 18:27, Jimmy Wales wrote:
Brion L. VIBBER wrote:
My best guess is that the parsing and lookups on regular pages are currently the main load, not editing or exotic database queries -- is this right?
Not a clue. Initially, the database certainly was the main load, but I haven't heard any newer figures. Jimbo?
I'll reset the slow-query log and make a new version available after a few hours of data collection.
We used to cache rendered articles, but Jimbo disabled this feature some time ago, claiming he was unable to find a performance advantage. (See mailing list archives circa February 13.)
But, I'm willing to try it again.
Personally, I've always found that claim suspicious; caching is definitely faster on my test machine, and is going to be a particularly big help with, say, long pages full of HTML tables! But then, my test machine has a much, much lower load to deal with than the real Wikipedia. :) Nonetheless, if caching really isn't helping, that's because something in it isn't working right. The problem should be found and fixed, and caching reenabled.
I would say that I agree with that.
Here's a question for everyone.
Let's say we have some portion of the page pre-calculated and cached. Is it faster to keep that cached text *in the database*, or *on the hard drive*?
I'm very strongly biased towards thinking that keeping it on the hard drive is faster, and by a significant margin, but only because I've never tested it and because I know (from long experience at Bomis) that opening up a text file on disk and spitting it out can be *really* fast, if the machine has enough RAM that the filesystem can cache lots of popular files in memory.
But, everything I read about MySQL talks about how screamingly fast it allegedly is, so...
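One way to stop guessing would be to time both options on the same text. The following is only a rough sketch of such a test, not a measurement of the real site: it uses Python's built-in sqlite3 module as a stand-in for MySQL (so it leaves out the client/server round trip a real MySQL query would pay), and the table name page_cache, the title "Main_Page" and the helper function names are all made up for illustration.

```python
import os
import sqlite3
import tempfile
import time


def build_stores(workdir, title, html):
    """Write the same rendered HTML to a flat file and to a database row."""
    file_path = os.path.join(workdir, title + ".html")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(html)

    # Assumption: SQLite stands in for MySQL; the schema is hypothetical.
    conn = sqlite3.connect(os.path.join(workdir, "cache.db"))
    conn.execute("CREATE TABLE page_cache (title TEXT PRIMARY KEY, html TEXT)")
    conn.execute("INSERT INTO page_cache VALUES (?, ?)", (title, html))
    conn.commit()
    return file_path, conn


def read_from_file(file_path):
    """Fetch the cached HTML from the filesystem."""
    with open(file_path, encoding="utf-8") as f:
        return f.read()


def read_from_db(conn, title):
    """Fetch the cached HTML from the database, or None if absent."""
    row = conn.execute(
        "SELECT html FROM page_cache WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


def time_reads(fn, repeats=1000):
    """Return the wall-clock seconds taken to call fn() repeats times."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


if __name__ == "__main__":
    # A long page full of HTML tables, the case mentioned earlier in the thread.
    html = "<table>" + "<tr><td>cell</td></tr>" * 2000 + "</table>"
    with tempfile.TemporaryDirectory() as workdir:
        file_path, conn = build_stores(workdir, "Main_Page", html)
        t_file = time_reads(lambda: read_from_file(file_path))
        t_db = time_reads(lambda: read_from_db(conn, "Main_Page"))
        conn.close()
        print(f"file: {t_file:.4f}s  db: {t_db:.4f}s")
```

Repeated reads of one file mostly measure the filesystem cache, which is exactly the "popular files in memory" case; a fairer test would spread reads over many pages and run under realistic load.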
--Jimbo
On Wed, 2002-04-10 at 18:27, Jimmy Wales wrote:
Here's a question for everyone.
Let's say we have some portion of the page pre-calculated and cached. Is it faster to keep that cached text *in the database*, or *on the hard drive*?
I'm very strongly biased towards thinking that keeping it on the hard drive is faster, and by a significant margin, but only because I've never tested it and because I know (from long experience at Bomis) that opening up a text file on disk and spitting it out can be *really* fast, if the machine has enough RAM that the filesystem can cache lots of popular files in memory.
That's a good question, and one which I haven't made any attempt to test. As it is, we'll already be digging into the database to check things like the user settings, page view count, last edited date, and language links and meta-tag keywords (these last two gleaned from the list of links during parsing, and thus left out altogether when using the existing cache code). So it's probably not significantly slower to grab the stored HTMLized article while we're there.
On the other hand, some of this stuff (except for the page view count and user settings) could also be stored in a cache file and plunked ready-made into the output along with the HTML. User settings perhaps could be stored in a session cookie, refreshed only when the user first visits/logs in/changes preferences, saving a little extra on database access as well.
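To make the cache-file idea a bit more concrete, here is a minimal sketch of what such a file might hold. It is not the existing cache code: the JSON layout, the one-file-per-title naming and the function names are assumptions for illustration. The rendered HTML is stored together with the data gleaned during parsing (language links, keywords) and the last edited date, while the page view count and user settings are deliberately left out because they change independently of the article text.

```python
import json
import os
import tempfile


def cache_path(cache_dir, title):
    # Hypothetical naming scheme: one JSON file per article title.
    return os.path.join(cache_dir, title.replace("/", "_") + ".json")


def store_rendered(cache_dir, title, html, language_links, keywords, last_edited):
    """Save the rendered HTML plus the metadata gathered while parsing."""
    entry = {
        "html": html,
        "language_links": language_links,
        "keywords": keywords,
        "last_edited": last_edited,
    }
    path = cache_path(cache_dir, title)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(entry, f)
    # Rename into place so a reader never sees a half-written file.
    os.replace(tmp, path)
    return path


def load_rendered(cache_dir, title):
    """Return the cached entry as a dict, or None if nothing is cached."""
    try:
        with open(cache_path(cache_dir, title), encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def invalidate(cache_dir, title):
    """Drop the cached entry; meant to be called whenever the page is edited."""
    try:
        os.remove(cache_path(cache_dir, title))
    except FileNotFoundError:
        pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        store_rendered(cache_dir, "Main_Page", "<p>Welcome</p>",
                       ["de:Hauptseite"], ["wiki", "encyclopedia"],
                       "2002-04-10T18:27:00")
        print(load_rendered(cache_dir, "Main_Page")["language_links"])
        invalidate(cache_dir, "Main_Page")
        print(load_rendered(cache_dir, "Main_Page"))
```

Deleting the file on every edit keeps a cache hit free of any database lookup to check freshness; the view counter would still need its own update on each hit.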
Worth it? No idea. But, hey, it's a suggestion.
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org