I've done some work at converting the Wikipedia to Postgres, but am not there yet. So, let's put that aside for now.
It seems that the wiki "source" is "interpreted" into html every single time someone accesses a link. That seems like a lot of overhead.
Given that for every time a change is made to the wiki source to a page, several people "view" it, why not just regenerate the html only when changes are made, and store it? It would take more storage space, but should be MUCH faster. And if storage is an issue, I can donate some hard drives...
The savings on the Recent Changes page alone should work wonders.
Jonathan
(moving to the wikitech-l list; see sign-up and archive page at http://www.wikipedia.org/mailman/listinfo/wikitech-l )
Jonathan Walther wrote:
> I've done some work at converting the Wikipedia to Postgres, but am not there yet. So, let's put that aside for now.
Great! I did get postgresql installed on my machine, but got bogged down in details of converting the table definitions and various interface behaviors. Someone with prior experience working with postgres would be a big help there.
> It seems that the wiki "source" is "interpreted" into html every single time someone accesses a link. That seems like a lot of overhead. Given that for every time a change is made to the wiki source to a page, several people "view" it, why not just regenerate the html only when changes are made, and store it? It would take more storage space, but should be MUCH faster. And if storage is an issue, I can donate some hard drives...
We used to cache in the phase II days on the old server. This was removed for two reasons: 1) Wiki->HTML rendering is still pretty darn fast, particularly with our new dedicated server; database contention seems to be our main problem during high-load periods. 2) We had problems keeping the cache consistent with the old code.
On number 2, I would certainly welcome an improved cache subsystem that's designed right from the ground up. The old one was hacked in as a "crap! the system's unusably slow, let's hack in some improved code" measure.
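For reference, the "regenerate only when changes are made" idea can be sketched in a few lines. This is a minimal illustration, not the old phase II cache or any proposed design: `RenderCache`, `toy_render`, and the in-memory dictionaries are all hypothetical names chosen for the example, and the sketch ignores anything a rendered page depends on besides its own source.

```python
import html


def toy_render(wikitext):
    """Hypothetical stand-in for the real wiki-to-HTML renderer."""
    return "<p>" + html.escape(wikitext) + "</p>"


class RenderCache:
    """Keeps rendered HTML per page and re-renders only after a save."""

    def __init__(self, render):
        self._render = render   # wikitext -> HTML function (assumed)
        self._source = {}       # title -> current wikitext
        self._html = {}         # title -> cached HTML, absent if stale
        self.render_count = 0   # how many times the renderer actually ran

    def save(self, title, wikitext):
        """Store new source and invalidate the cached HTML for this page."""
        self._source[title] = wikitext
        self._html.pop(title, None)

    def view(self, title):
        """Return cached HTML, rendering first if the cache entry is stale."""
        if title not in self._html:
            self._html[title] = self._render(self._source[title])
            self.render_count += 1
        return self._html[title]
```

Invalidating on save and rendering lazily on the next view keeps the write path cheap. Keeping such a cache consistent is the hard part, as noted above, and this sketch does not attempt it beyond the single-page case.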
On number 1, note that LinkCache::addLink() does a brief query on the cur table for every link when rendering a wikipage. These could probably be consolidated somehow or other. (Note that this does not apply to Recentchanges, which loads everything in a big chunk.)
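One way the per-link lookups might be consolidated is to collect every link title on the page first and check them all with a single `IN (...)` query. The sketch below is an assumption-laden illustration, not the actual `LinkCache` code: it uses an in-memory SQLite database, a simplified `cur` table with only `cur_id` and `cur_title`, and hypothetical helper names (`make_demo_db`, `existing_titles`).

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the cur table (simplified, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO cur (cur_title) VALUES (?)",
        [("Main_Page",), ("PostgreSQL",), ("Recent_changes",)],
    )
    return conn


def existing_titles(conn, titles, chunk_size=500):
    """Return the subset of titles that exist, using one query per chunk
    of titles instead of one query per link."""
    unique = list(dict.fromkeys(titles))  # drop repeated links, keep order
    found = set()
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # e.g. "?,?,?"
        rows = conn.execute(
            f"SELECT cur_title FROM cur WHERE cur_title IN ({placeholders})",
            chunk,
        )
        found.update(row[0] for row in rows)
    return found
```

A page with a few dozen links would then cost one round trip to the database rather than a few dozen. The chunking is only there to stay under the database's limit on bound parameters.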
> The savings on the Recent Changes page alone should work wonders.
On the English wikipedia, Recentchanges is loaded at default options about 3000 times per day; the number of edits per day is a similar figure, and every edit means the page has to change to reflect it. Caching the rendered display wouldn't seem to save significantly over rerendering it on each view.
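The arithmetic behind this can be checked with a small counting model. It is a sketch under stated assumptions: every edit invalidates the cached Recentchanges page, every view needs a fresh copy, and `renders_needed` is a hypothetical helper, not part of the wiki code.

```python
def renders_needed(events):
    """Count page renders for a sequence of 'edit' and 'view' events.

    Returns (renders_without_cache, renders_with_cache).
    """
    without_cache = 0
    with_cache = 0
    fresh = False  # is there a cached copy that reflects the latest edit?
    for event in events:
        if event == "edit":
            fresh = False          # any edit makes the cached page stale
        elif event == "view":
            without_cache += 1     # no cache: render on every view
            if not fresh:
                with_cache += 1    # cache miss: render once, then reuse
                fresh = True
        else:
            raise ValueError(f"unknown event: {event!r}")
    return without_cache, with_cache
```

If edits and views simply alternate, as in `["edit", "view"] * 3000`, the cache never serves a single hit. A cache only pays off when many views fall between consecutive edits, for example `["edit"] + ["view"] * 10` needs one render instead of ten. With roughly one view per edit the real figure depends on how the two interleave.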
-- brion vibber (brion @ pobox.com)
wikipedia-l@lists.wikimedia.org