On Fri, 18 Mar 2005 22:33:19 -0800, Brion Vibber <brion@pobox.com> wrote:
Just a reminder of work in progress and general background, for those who might be commenting without being aware of present work...
First, in MediaWiki 1.5 we've made a major schema change, intended to reduce the number of row updates that have to be made and to slim down the amount of data pulled per row when scanning metadata that isn't bulk text.
Specifically, the 'cur' and 'old' tables are being split into 'page', 'revision', and 'text'. Lists of pages won't be trudging through large page text fields, and operations like renames of heavily-edited pages won't have to touch 15000 records. This will also give us the potential to move the bulk text to a separate replicated object store to keep the core metadata DBs relatively small and limber (and cacheable).
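To make the split concrete, here's a rough sketch of the new layout (heavily simplified; the real 1.5 schema has more columns and indexes, and the details here shouldn't be taken as final):

  CREATE TABLE page (
    page_id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    page_namespace INT NOT NULL,
    page_title     VARCHAR(255) BINARY NOT NULL,
    page_latest    INT UNSIGNED NOT NULL,  -- rev_id of the current revision
    PRIMARY KEY (page_id),
    UNIQUE KEY name_title (page_namespace, page_title)
  );

  CREATE TABLE revision (
    rev_id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    rev_page      INT UNSIGNED NOT NULL,   -- key to page.page_id
    rev_text_id   INT UNSIGNED NOT NULL,   -- key to text.old_id
    rev_comment   TINYBLOB NOT NULL,
    rev_user      INT UNSIGNED NOT NULL,
    rev_timestamp CHAR(14) BINARY NOT NULL,
    PRIMARY KEY (rev_id),
    KEY page_timestamp (rev_page, rev_timestamp)
  );

  CREATE TABLE text (
    old_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    old_text  MEDIUMBLOB NOT NULL,
    old_flags TINYBLOB NOT NULL,  -- e.g. compression markers
    PRIMARY KEY (old_id)
  );

A page rename then touches one row in 'page' instead of one row per revision, and metadata scans never have to drag the bulk text along.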
Talk to icee in #mediawiki if you're interested in the object store; he's working on a prototype for us to use for image uploads and potentially bulk text storage.
Second, remember that each wiki's database is independent. It's very likely that at some point we'll want to split out some of the larger wikis to separate master servers; aside from localizing disk and cache utilization, this could provide some fault isolation in that a failure in one master would not affect the wikis running off the other master.
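Since each wiki already lives in its own database, a split like that is mostly a matter of moving whole databases between masters; roughly (database names here are just illustrative):

  -- Each wiki is a separate database on the master:
  CREATE DATABASE enwiki;
  CREATE DATABASE dewiki;

  -- Moving one wiki to a new master is then a dump-and-reload of
  -- that single database, with no shared tables to untangle:
  --   mysqldump dewiki | mysql -h new-master dewiki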
Third, we're expecting to have at least two new data centers soon, in Europe and the US. Initially these will probably be squid proxies, since that's easy for us to do (we currently have a small offsite squid farm in France in addition to the squids in the main cluster in Florida), but local web server boxen pulling from locally slaved databases, at least for read-only requests, is something we're likely to see to move more of the load off the central location.
Finally, people constantly bring up the 'PostgreSQL cures cancer' bugaboo. 1.4 has experimental PostgreSQL support, which I'd like to see become a first-class supported configuration for the 1.5 release. This is only going to happen, though, if people pitch in to help with testing and bug fixing, and of course run some benchmarks and failure-mode tests against MySQL! If you ever want Wikimedia to consider switching, the software to make it work needs to exist, and it needs to be demonstrated as a legitimate improvement with a feasible conversion path.
Domas is the PostgreSQL partisan on our team and wrote the existing PostgreSQL support. If you'd like to help, you should probably track him down; in #mediawiki you'll usually find him as 'dammit'.
Thank you very much for this update. I just have some questions.
Where will deleted articles live? In tables like archive_page, archive_revision, and archive_text? Or will it stay the same as in the current version?
At what stage (beta, RC, etc.) of 1.5 will you start using it on Wikimedia sites? Will you wait for 1.4.0 first?
Is it possible to search Bugzilla for the new features? Can you provide a link or some instructions?
Will you bring up http://test.wikipedia.org again soon?
Is the HEAD branch in CVS the one that will become REL1_5? (I don't understand CVS very well.)
Is that branch currently usable without major developer knowledge? Might I find anything really cool if I tried to use it?