[Mediawiki-l] Multi-server distributed MediaWiki

Kelly Jones kelly.terry.jones at gmail.com
Sun Jan 7 04:56:53 UTC 2007


On 1/4/07, Sy Ali <sy1234 at gmail.com> wrote:
> On 1/2/07, Kelly Jones <kelly.terry.jones at gmail.com> wrote:
> > The mirror servers should be as independent as possible:
> > geographically diverse, on different backbones, each using their own
> > MySQL server, owned by different people, etc. Some of the mirror
> > servers may be free, others may be paid hosting, others may be
> > dedicated servers, etc.
>
> You're looking for the holy grail of hosting.

Here's my grail-shaped beacon for small-medium MediaWikis:

All the mirrors publish sha1sums of the most recent versions of all
their pages. Since MySQL has SHA1() built in, this should be doable
(maybe future versions of MediaWiki will store the sha1sum as a field
in the 'text' table, making this even faster).
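
A minimal sketch of the per-page digest in Python, assuming a mirror can
hand over the latest text of each page as a title-to-text mapping. The
names page_sha1 and build_manifest are made up for illustration; they
are not part of MediaWiki.

```python
import hashlib


def page_sha1(text: str) -> str:
    """Return the hex SHA-1 of a page's wikitext, encoded as UTF-8."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_manifest(pages: dict[str, str]) -> dict[str, str]:
    """Map each page title to the SHA-1 of its latest text."""
    return {title: page_sha1(text) for title, text in pages.items()}


# Hypothetical sample pages, only to show the published format.
manifest = build_manifest({"Main_Page": "Welcome!\n", "Help": "Ask on the list.\n"})
for title, digest in sorted(manifest.items()):
    print(digest, title)
```

All mirrors would have to agree on the encoding and on trailing
whitespace, or identical pages will hash differently.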

A central server pulls the sha1sums from all mirrors hourly (or at
whatever interval) and finds pages that aren't identical on all
mirrors (including newly-created pages).
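
The comparison step could look roughly like this, assuming the central
server has already fetched one title-to-sha1 manifest per mirror (the
function name and the manifest shape are assumptions):

```python
def divergent_pages(manifests: dict[str, dict[str, str]]) -> set[str]:
    """Return titles whose SHA-1 is not identical on every mirror.

    A page missing from any mirror (for example, one newly created
    elsewhere) counts as divergent.
    """
    all_titles: set[str] = set()
    for manifest in manifests.values():
        all_titles.update(manifest)

    differing = set()
    for title in all_titles:
        # A missing page shows up as None, which differs from any digest.
        digests = {manifest.get(title) for manifest in manifests.values()}
        if len(digests) > 1:
            differing.add(title)
    return differing
```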

The central server runs the Unix 'merge' command (several times if
needed) to create the 'new' version of the page, which may or may not
match the version on some of the servers. Irreconcilable differences
are handled by the WikiSysop (or Drew Barrymore <G>).
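
A hedged sketch of the merge step, shelling out to the RCS 'merge'
command from Python. 'merge' is a three-way merge, so it needs a common
ancestor; here I assume the central server kept the last version it
pushed and uses that as the base. The function name and the injectable
run parameter are illustrative only.

```python
import subprocess
import tempfile
from pathlib import Path


def three_way_merge(base: str, mine: str, theirs: str, run=subprocess.run):
    """Merge two edited versions of a page against a common ancestor.

    Returns (merged_text, has_conflicts). Uses `merge -p`, which prints
    the result to stdout and exits 0 (clean), 1 (conflicts) or
    2 (trouble).
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for name, text in (("mine", mine), ("base", base), ("theirs", theirs)):
            paths[name] = Path(tmp) / name
            paths[name].write_text(text, encoding="utf-8")
        # merge folds the changes leading from base to theirs into mine.
        cmd = ["merge", "-p", str(paths["mine"]), str(paths["base"]), str(paths["theirs"])]
        result = run(cmd, capture_output=True, text=True)
    if result.returncode not in (0, 1):
        raise RuntimeError(f"merge failed: {result.stderr}")
    return result.stdout, result.returncode == 1
```

With more than two differing mirrors, the central server would call
this repeatedly, feeding each result back in as 'mine'. Anything that
comes back with has_conflicts set goes to the WikiSysop.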

The central server pushes the new version to all servers that don't
already have it.
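
Picking the push targets is then a simple filter. A small sketch, again
assuming per-mirror manifests keyed by page title (the names are
hypothetical):

```python
def mirrors_needing_push(
    manifests: dict[str, dict[str, str]], title: str, new_sha1: str
) -> list[str]:
    """List mirrors whose copy of `title` is missing or differs from new_sha1."""
    return sorted(
        name
        for name, manifest in manifests.items()
        if manifest.get(title) != new_sha1
    )
```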

For larger MediaWikis, perhaps only publish the sha1sums of pages
changed in the last 4 hours (if the central server checks hourly, this
gives plenty of overlap/redundancy).
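
The 4-hour window amounts to a timestamp filter. A sketch, assuming the
mirror knows the last-edit time of each page as a timezone-aware
datetime (the function name and parameters are made up):

```python
from datetime import datetime, timedelta, timezone


def recently_changed(
    last_edit: dict[str, datetime], now: datetime, window_hours: int = 4
) -> list[str]:
    """Return titles edited within the last `window_hours` hours."""
    cutoff = now - timedelta(hours=window_hours)
    return sorted(title for title, ts in last_edit.items() if ts >= cutoff)


# Hypothetical example: one fresh edit, one stale page.
now = datetime(2007, 1, 7, 5, 0, tzinfo=timezone.utc)
edits = {
    "Fresh": now - timedelta(hours=1),
    "Stale": now - timedelta(hours=9),
}
print(recently_changed(edits, now))
```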

The *only* change a mirror/mirrored MediaWiki has to make is to
install a PHP script that runs a MySQL query to report the sha1sums
of the latest versions of all its pages. The central server handles
everything else. On the mirror side, this works almost out of the box.
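
Roughly what that query might look like, shown in Python against an
in-memory SQLite stand-in so it can be run anywhere. The tables are a
cut-down guess at MediaWiki's page/revision/text layout, and SQLite
needs SHA1() registered by hand where MySQL has it built in. On a real
wiki, old_text may be compressed or stored externally, in which case
hashing the raw column would not hash the page text.

```python
import hashlib
import sqlite3

# Latest revision of each page, joined through to its text row.
QUERY = """
SELECT page_title, SHA1(old_text) AS digest
FROM page
JOIN revision ON rev_id = page_latest
JOIN text ON old_id = rev_text_id
ORDER BY page_title
"""


def sha1_hex(value):
    """SQLite helper standing in for MySQL's built-in SHA1()."""
    data = value.encode("utf-8") if isinstance(value, str) else value
    return hashlib.sha1(data).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("SHA1", 1, sha1_hex)
conn.executescript("""
CREATE TABLE page (page_id INTEGER, page_title TEXT, page_latest INTEGER);
CREATE TABLE revision (rev_id INTEGER, rev_page INTEGER, rev_text_id INTEGER);
CREATE TABLE text (old_id INTEGER, old_text TEXT);
INSERT INTO page VALUES (1, 'Main_Page', 11);
INSERT INTO revision VALUES (10, 1, 100);
INSERT INTO revision VALUES (11, 1, 101);
INSERT INTO text VALUES (100, 'old');
INSERT INTO text VALUES (101, 'abc');
""")

rows = conn.execute(QUERY).fetchall()
for title, digest in rows:
    print(digest, title)
```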

Disaster recovery: if server X dies, just copy the db from server Y.

I don't really care about load-balancing or anything like that. Just
tell end users: go to mirror1.mywiki.com; if that fails, go to
mirror2.mywiki.com, and so on. Or create a metapage that JUST lists all
the mirror URLs for your MediaWiki and tell users to try them in
order. The nice thing is that you can edit from any mirror, not just
the original. Of course, the DNS/metapage for mywiki.com has to be
more reliable than any of the mirrors.
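
If you wanted to script the "try them in order" part rather than leave
it to users, a sketch could look like this. The probe is injectable so
the logic can be checked without a network; the default probe and the
function name are assumptions, not anything MediaWiki provides.

```python
import urllib.error
import urllib.request


def first_reachable(urls, probe=None, timeout=5.0):
    """Return the first URL whose probe succeeds, or None if all fail."""

    def default_probe(url):
        # Treat any network error or non-200 response as "mirror down".
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    check = probe or default_probe
    for url in urls:
        if check(url):
            return url
    return None
```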

Not sure I even care about saving money (though the pseudo-anonymity
of free hosting is nice); creating a semi-robust mirror-able MediaWiki
has a philosophical interest as well.

This has lots of problems (some listed below), but might be a good start?

Problems (which can be resolved long-term with some work + a more
complex process):

Page version numbers will be different on the mirrors.

Not all previous edits will be available on all mirrors (if something
gets edited several times in an hour).

For large sites, there'll be a large number of irreconcilable differences.

Reverting an edit that just adds a small piece of text will be
impossible (Unix merge will always favor the version with the added
text).

The comments when pages are changed will be mostly lost.

Many changes will appear to come from the central server, not the IP
address of the person who actually edited.

At least 317 other problems I haven't thought about <G>

-- 
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.


