I've finally gone ahead and hacked up that preliminary page caching I've been talking about; see new changes to Article.php & co. As an emergency measure I've put it up on larousse/www.wikipedia.org with only minimal testing. So far it's working great -- system load is way down, response time seems good.
Presently it operates only on regular page views by users who are not logged in. I've tweaked the header in the corner so it no longer shows the IP address, so every anon's page will appear the same. (If someone has added to an anon's talk page, this is detected and the cache is disabled for that visitor, so the 'You have new messages' link will show and take them to the talk page.)
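A rough sketch of that eligibility check, written in Python for illustration rather than the PHP the wiki actually runs on. The `Request` fields and the `is_cacheable` name are made up here, and the examples of "extra parameters" are assumptions about what counts as a non-regular page view:

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical summary of what the page-view code knows about a request."""
    logged_in: bool
    action: str             # "view", "edit", "history", ...
    has_new_messages: bool  # someone wrote on this visitor's talk page
    has_extra_params: bool  # assumed examples: old revisions, diffs, printable view


def is_cacheable(req: Request) -> bool:
    """Return True only for plain page views by anonymous visitors."""
    if req.logged_in:
        return False  # logged-in pages differ per user
    if req.action != "view":
        return False  # only regular page views are cached
    if req.has_new_messages:
        return False  # must render the 'You have new messages' link
    if req.has_extra_params:
        return False  # any variant of the page view bypasses the cache
    return True
```

Erring on the side of returning False is the safer default: a missed cache hit only costs a render, while a wrongly cached variant gets served to everyone.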
For pages that are determined cacheable, we check a cache directory for a file: if it exists and is not obsoleted by the 'last touched' timestamp already established for dealing with browser caching, we just load it up and pass it straight through. If there's no file or it's obsolete, we install an output buffer handler, and at the end we catch the whole page output and save it to the file.
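The lookup-or-regenerate flow described above might look roughly like this. It is a Python sketch, not the actual Article.php code: `serve_page`, `cache_is_fresh` and the `render` callback are hypothetical names, and the write-to-temp-then-rename step is an extra precaution added here, not something the original change is known to do.

```python
import os
import tempfile


def cache_is_fresh(path: str, touched: float) -> bool:
    """A cache file is fresh if it exists and is no older than `touched`."""
    try:
        return os.path.getmtime(path) >= touched
    except OSError:
        return False  # missing or unreadable file counts as a miss


def serve_page(path: str, touched: float, render) -> str:
    """Return the cached page if fresh; otherwise render, save, and return it."""
    if cache_is_fresh(path, touched):
        with open(path, encoding="utf-8") as f:
            return f.read()  # pass the stored page straight through

    html = render()  # stands in for capturing the buffered page output

    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file and rename it into place, so a concurrent
    # reader never sees a half-written cache file.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp, path)
    return html
```

Here `touched` plays the role of the 'last touched' timestamp: bumping it past the file's modification time is enough to make the next request regenerate the page, and the old file is simply overwritten.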
Caveats:
- Invalidation of cached pages is controlled by the same mechanism that invalidates browser caches, and will be subject to some of the same bugs there. Problem areas may include undeletion, the talk/article page links, and anywhere the link tables are broken. Some redirects may be funny, but hopefully not. :)
- I'm pretty sure I excluded all the non-cacheable page view variants. I might have missed something, in which case bad pages could creep into the cache.
- There's a site-wide cache invalidation date settable in LocalSettings.php. I haven't actually tested it :) and there should probably be a sysopable or developerable clear-all-caches special page. This also needs to be worked in so that it affects the browser cache as well.
- It should also be possible to explicitly clear the cache of a page and force it to regenerate in case it's screwed up. Perhaps a little button or something.
- Some pages, like the main page, should be invalidated periodically or else never cached, because they contain special variables (time, article count) which may change.
- This only affects non-logged-in users so far. But that makes up the greater part of our traffic, so that's okay for now. It makes the server faster for the rest of us. :)
- The cache directory is divided up like the upload directory is; so there should be 4096 separate dirs. Should be plenty for keeping ext3 from going mad and killing us all for a while yet.
Other notes:
- Hypothetically we could fall back on cached pages if unable to contact the database.
- Cache files are not deleted on invalidation; they're just assumed obsolete, and replaced when needed.
- There's a fun new bug where logouts (or perhaps timeouts) leave a session in a funny state where the interface works as not-logged-in, but edits are saved with the formerly used user name (but still with 0 as the user id, so contribs doesn't work).
See our now much happier servers:
[brion@larousse w]$ uptime
 15:26:42 up 14:57,  3 users,  load average: 7.36, 9.50, 7.76
[brion@larousse w]$ free
             total       used       free     shared    buffers     cached
Mem:       1030952     996824      34128          0     176848     554108
-/+ buffers/cache:     265868     765084
Swap:      1020088      72416     947672
[brion@larousse w]$ vmstat 1 15
   procs                        memory     swap        io    system         cpu
 r  b  w   swpd   free    buff   cache  si   so   bi   bo   in   cs  us  sy  id
 1  0  0  72416  34104  176848  554248   0   10    8   88  393  210  73   3  24
 0  0  0  72428  34020  176848  554300   0  172    0  172  761  394  60   6  34
 0  0  0  72428  34636  176848  554312   0    0    0  616  380  187  30   1  69
 0  0  0  72428  34644  176848  554356   0  156    0  156  496  338  43   4  53
 0  0  0  72416  34480  176852  554416   0    0   28    0  512  308  64   2  34
 1  0  0  72420  34472  176852  554456   0  200    0  200  479  274  48   2  50
 2  0  2  72420  34472  176852  554464   0    0    0  208  527  341  82   6  12
 3  0  0  72392  34712  176852  554512   0  204    0  536  552  420  55   4  41
 1  0  0  72392  34720  176852  554548   0    0    0    0  436  287  42   0  58
 1  0  0  72372  34732  176852  554564   0  208    0  208  536  315  32   8  60
 2  0  0  72368  34568  176852  554612   0    0   20    0  521  320  53   3  45
 1  0  0  72380  33980  176852  554620   0  224    0  840  491  318  29   6  65
 1  0  0  72384  33716  176856  554688   0  108   12  108  487  302  55   1  44
 2  0  0  72384  33856  176856  554712   0    0    0    0  410  237  38   0  62
 0  0  0  72384  33836  176856  554724   0  192    0  192  299  163  14   2  84
[brion@pliny brion]$ uptime
  3:28pm  up 4 days,  5:28,  1 user,  load average: 2.46, 3.49, 3.13
[brion@pliny brion]$ free
             total       used       free     shared    buffers     cached
Mem:       2068912    1973376      95536          0      35360    1155584
-/+ buffers/cache:     782432    1286480
Swap:      2047992     436568    1611424
[brion@pliny brion]$ vmstat 1 15
   procs                        memory     swap        io    system         cpu
 r  b  w   swpd   free    buff   cache  si   so   bi   bo   in   cs  us  sy  id
 0  0  0 436568  95020   35392 1155900  12    8   41    7   18   43  47   5  47
 3  2  0 436568  90744   35444 1156460   4    0  572  728  477  730  36   5  59
 2  4  1 436568  89072   35508 1157508   0    0 1084 1012  754  866  38   7  55
 4  0  0 436568  88900   35516 1157696   0    0  180   60  515  656  37   3  60
 5  0  0 436568  84020   35540 1157808   0    0  104  444  477  815  51   6  43
 2  0  0 436568  80556   35544 1157816   0    0    0  204  401  596  47   3  50
 1  0  0 436568  79796   35556 1157860   0    0   56  124  394  378  19   2  78
 2  0  0 436568  79756   35560 1157900   0    0   32  180  513  655  23   7  70
 0  1  0 436568  79740   35604 1157924   0    0   24 1165  377  357  34  10  56
 1  0  0 436568  76008   35612 1157996   0    0   72   16  383  366   7   2  90
 3  0  0 436568  67984   35616 1158148   0    0  152    0  380  391  14   3  83
 0  1  0 436568  64404   35632 1158560   0    0  416  132  401  504  26   4  70
 2  1  0 436568  67392   35692 1159556   0    0  992  856  538  639  13   6  82
 1  5  3 436568  66912   35704 1159808   0    0  240 1782  573  737  59   6  35
 0  1  1 436568  66880   35732 1159928   0    0  116 2194  707  732  43   9  48
Pliny's got room to expand, and there's still optimization to be done on Larousse's end. So things are looking good!
-- brion vibber (brion @ pobox.com)
On Fri, 2003-05-16 at 09:01, Brion Vibber wrote:
> [brion@larousse w]$ uptime
>  15:26:42 up 14:57,  3 users,  load average: 7.36, 9.50, 7.76
Compare after a few hours:
[brion@larousse w]$ uptime
 20:17:31 up 19:48,  1 user,  load average: 1.76, 1.94, 2.00
:)))))))))
I've cleared the cache (via the now-working $wgCacheEpoch setting) to a) test it and b) clear out cached pages that had been rendered without the external-images-off setting; this pops load back up a bit:
[brion@larousse includes]$ uptime
 20:50:58 up 20:22,  2 users,  load average: 8.04, 6.99, 4.56
[brion@larousse includes]$ uptime
 20:52:57 up 20:24,  2 users,  load average: 4.72, 6.15, 4.55
but still nice.
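One way to picture how a site-wide epoch like $wgCacheEpoch combines with the per-page 'last touched' time: whichever is later wins. This is a Python sketch of the idea only; the function names are invented and the real check may be structured differently.

```python
def effective_invalidation_time(page_touched: float, cache_epoch: float) -> float:
    """The cache must be newer than both the page's own 'touched' time
    and the site-wide epoch, so the later of the two is what matters."""
    return max(page_touched, cache_epoch)


def is_stale(file_mtime: float, page_touched: float, cache_epoch: float) -> bool:
    """True if a cache file written at `file_mtime` must be regenerated."""
    return file_mtime < effective_invalidation_time(page_touched, cache_epoch)
```

Setting the epoch to "now" makes every existing cache file stale at once without deleting anything, which is why load jumps briefly while pages are re-rendered on demand.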
Also most major TeX should be rerendered by now, so we should have fewer spikes on that account.
-- brion vibber (brion @ pobox.com)
On Fri, 2003-05-16 at 09:01, Brion Vibber wrote:
> - The cache directory is divided up like the upload directory is; so
> there should be 4096 separate dirs. Should be plenty for keeping ext3 from
> going mad and killing us all for a while yet.
Er, my math sucks after no sleep, apparently. That's *256* dirs, not 4096. Each presently has ~90-150 files (1.2-2 megabytes) stored in it. Total file cache stacks up to 30,468 pages (about 450 megs) right now.
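For illustration, a two-level layout that yields exactly 256 leaf directories could be built like this in Python. The use of MD5, the `<h>/<hh>/` directory naming, the `.html` suffix and the percent-encoded filename are all assumptions made for the sketch, not details confirmed in this thread.

```python
import hashlib
import os
from urllib.parse import quote


def cache_path(cache_root: str, title: str) -> str:
    """Map a page title to a file under cache_root/<h>/<hh>/.

    <h> is the first hex digit of the title's MD5 and <hh> the first two,
    giving 16 top-level directories and 16 * 16 = 256 leaf directories.
    """
    digest = hashlib.md5(title.encode("utf-8")).hexdigest()
    # Percent-encode the title so slashes and other odd characters
    # cannot create extra path components.
    filename = quote(title, safe="") + ".html"
    return os.path.join(cache_root, digest[0], digest[:2], filename)
```

With 30,468 cached pages spread over 256 directories, an even hash would put about 119 files in each, which matches the ~90-150 range reported above.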
A very silly thought I had: we could also gzip the cache files, and send the pre-gzipped versions to browsers that support it to decrease bandwidth usage. Maybe I'm going overboard here though. :)
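A sketch of how that might work, in Python: the cache stores gzip bytes, and the response either passes them through untouched or decompresses for clients that did not ask for gzip. `accepts_gzip` and `build_response` are hypothetical helpers, and the header parsing is deliberately simplistic (it ignores the `*` wildcard, for instance).

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Rough check of an Accept-Encoding header value for gzip support."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        q = params.replace(" ", "").lower()
        if q.startswith("q="):
            try:
                return float(q[2:]) > 0  # "gzip;q=0" means "do not send gzip"
            except ValueError:
                return False  # malformed weight: play safe, send plain
        return True
    return False


def build_response(cached_gz: bytes, accept_encoding: str):
    """Return (headers, body) for a page whose cache file holds gzip data."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return headers, cached_gz  # no recompression per request
    return headers, gzip.decompress(cached_gz)
```

The trade-off is that clients without gzip support cost a decompression per request, while everyone else gets the stored bytes as-is.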
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org