On Fri, 18 Mar 2005 22:33:19 -0800, Brion Vibber <brion(a)pobox.com> wrote:
Just a reminder of work in progress and general background, for those
who might be commenting without being aware of present work...
First, in MediaWiki 1.5 we've made a major schema change, intended to
reduce the number of changes to data rows that have to be made and to
slim down the amount of data that has to be pulled per-row when scanning
non-bulk-text metadata.
Specifically, the 'cur' and 'old' tables are being split into 'page',
'revision', and 'text'. Lists of pages won't be trudging through large
page text fields, and operations like renames of heavily-edited pages
won't have to touch 15000 records. This will also give us the potential
to move the bulk text to a separate replicated object store to keep the
core metadata DBs relatively small and limber (and cacheable).
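A minimal sketch of the idea behind the split, using a throwaway SQLite database. The column set here is deliberately simplified and does not match the real 1.5 schema; the point is just that page metadata, per-revision metadata, and bulk text live in separate tables, so a rename touches one row instead of one row per revision.

```python
import sqlite3

# Hypothetical, simplified version of the 1.5 split: 'page' holds
# per-page metadata, 'revision' holds per-edit metadata, and 'text'
# holds the bulk wikitext. Column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page     (page_id INTEGER PRIMARY KEY, page_title TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                       rev_text_id INTEGER, rev_timestamp TEXT);
CREATE TABLE text     (old_id INTEGER PRIMARY KEY, old_text TEXT);
""")

# One page with two revisions, each pointing at its bulk text row.
conn.execute("INSERT INTO page VALUES (1, 'Sandbox')")
conn.execute("INSERT INTO text VALUES (10, 'first draft')")
conn.execute("INSERT INTO text VALUES (11, 'second draft')")
conn.execute("INSERT INTO revision VALUES (100, 1, 10, '20050318000000')")
conn.execute("INSERT INTO revision VALUES (101, 1, 11, '20050318120000')")

# Renaming the page updates exactly one row, no matter how many
# revisions exist -- under a cur/old-style layout, the title would be
# duplicated into every revision row and all of them would need updating.
cur = conn.execute("UPDATE page SET page_title = 'Sandbox2' WHERE page_id = 1")
print(cur.rowcount)  # 1

# Listing pages never touches the wide text column at all.
titles = [row[0] for row in conn.execute("SELECT page_title FROM page")]
print(titles)  # ['Sandbox2']
```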
Talk to icee in #mediawiki if interested in the object store; he's
working on a prototype for us, to use for image uploads and potentially
bulk text storage.
Second, remember that each wiki's database is independent. It's very
likely that at some point we'll want to split out some of the larger
wikis to separate master servers; aside from localizing disk and cache
utilization, this could provide some fault isolation in that a failure
in one master would not affect the wikis running off the other master.
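Because each wiki's database is independent, routing a wiki to its master is just a lookup. A sketch of that, with invented host names and an invented mapping (the wiki database names are real projects, everything else here is illustrative):

```python
# Hypothetical wiki -> master-server mapping. Host names are made up;
# the point is only that large wikis can get their own master while
# smaller wikis share one, and a failure on one master leaves wikis
# assigned to the other master untouched.
WIKI_MASTERS = {
    "enwiki": "db-master-1",   # large wiki, own master
    "dewiki": "db-master-2",
    "frwiki": "db-master-2",   # smaller wikis can share a master
}

def master_for(wiki: str) -> str:
    """Return the master server handling this wiki's database."""
    return WIKI_MASTERS[wiki]

print(master_for("enwiki"))  # db-master-1
```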
Third, we're expecting to have at least two new additional data centers
soon in Europe and the US. Initially these are probably going to be
squid proxies since that's easy for us to do (we have a small offsite
squid farm in France currently in addition to the squids in the main
cluster in Florida) but local web server boxen pulling from local slaved
databases at least for read-only requests is something we're likely to
see, to move more of the load off of the central location.
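The read/write split described above could be sketched like this — all host and region names are invented for illustration, not actual cluster configuration:

```python
# Hypothetical routing for remote data centers: read-only requests go
# to a database slave in the nearest location, while writes always
# funnel back to the central master. Names are illustrative only.
MASTER = "db-master.florida"
LOCAL_SLAVES = {
    "europe": "db-slave.europe",
    "us-east": "db-slave.us-east",
    "florida": "db-slave.florida",
}

def pick_server(region: str, is_write: bool) -> str:
    if is_write:
        return MASTER                             # writes hit the master
    return LOCAL_SLAVES.get(region, MASTER)       # reads stay local

print(pick_server("europe", is_write=False))  # db-slave.europe
print(pick_server("europe", is_write=True))   # db-master.florida
```

Serving reads locally moves the bulk of the load off the central location, since read requests vastly outnumber writes on a wiki.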
Finally, people constantly bring up the 'PostgreSQL cures cancer'
bugaboo. 1.4 has experimental PostgreSQL support, which I'd like to see
as a first-class supported configuration for the 1.5 release. This is
only going to happen, though, if people pitch in to help with testing,
bug fixing, and of course benchmarks and failure-mode tests compared
to MySQL! If you ever want Wikimedia to consider switching, the
software needs to be available to make it work and it needs to be
demonstrated as a legitimate improvement with a feasible conversion.
Domas is the PostgreSQL partisan on our team and wrote the existing
PostgreSQL support. If you'd like to help you should probably track him
down; in #mediawiki you'll usually find him as 'dammit'.
Thank you very much for this update. I just have some questions.
Where will deleted articles be stored? In tables such as archive_page,
archive_revision, and archive_text? Or will it work just like the
current version?
At what stage (beta, rc, etc) of 1.5 will you be using it on Wikimedia
sites? Will you wait for 1.4.0 first?
Is a Bugzilla search for the new features possible? Can you provide a
link or some instructions?
Will you bring up http://test.wikipedia.org again soon?
Is the HEAD branch in CVS the one which will become REL1_5? (I don't
understand CVS very well.)
Is that branch currently usable without major developer knowledge?
Might I find anything really cool if I tried to use it?