On 1/4/07, Sy Ali <sy1234@gmail.com> wrote:
> On 1/2/07, Kelly Jones <kelly.terry.jones@gmail.com> wrote:
> > The mirror servers should be as independent as possible: geographically diverse, on different backbones, each using their own MySQL server, owned by different people, etc. Some of the mirror servers may be free, others may be paid hosting, others may be dedicated servers, etc.
> You're looking for the holy grail of hosting.
Here's my grail-shaped beacon for small-medium MediaWikis:
All the mirrors publish sha1sums of the most recent versions of all their pages. Since MySQL has a built-in SHA1() function, this should be doable (maybe future versions of MediaWiki will store the sha1sum as a field in the 'text' table, making this even faster).
A central server pulls the sha1sums from all mirrors hourly (or whatever) and finds pages that aren't identical on all mirrors (including newly-created pages).
The central server runs the Unix 'merge' command (several times if needed) to create the 'new' version of the page, which may or may not match the version on some of the servers. Irreconcilable differences are handled by the WikiSysop (or Drew Barrymore <G>).
The central server pushes the new version to all servers that don't already have it.
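To make the compare-and-push steps above concrete, here is a rough Python sketch of what the central server might do once it has fetched every mirror's report. The function names, the mirror names, and the report shape (mirror name mapped to page title mapped to sha1 hex string) are all made up for illustration; the merge step itself is left out.

```python
from collections import defaultdict


def find_divergent_pages(reports):
    """Find pages that are not identical on all mirrors.

    reports: {mirror_name: {page_title: sha1_hex}} (assumed shape).
    Returns {page_title: {sha1_hex_or_None: [mirror_names]}}, where the
    None key collects mirrors that do not have the page at all
    (this is how newly-created pages show up).
    """
    all_titles = set()
    for digests in reports.values():
        all_titles.update(digests)

    divergent = {}
    for title in sorted(all_titles):
        groups = defaultdict(list)
        for mirror, digests in reports.items():
            groups[digests.get(title)].append(mirror)
        if len(groups) > 1:  # more than one distinct version exists
            divergent[title] = dict(groups)
    return divergent


def push_targets(reports, title, merged_sha1):
    """Mirrors that don't already hold the merged version of a page."""
    return sorted(
        mirror for mirror, digests in reports.items()
        if digests.get(title) != merged_sha1
    )
```

With three mirrors where only one has a new page, `find_divergent_pages` would list that page with the other two mirrors grouped under `None`, and `push_targets` would name those two as the servers to update.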
For larger MediaWikis, perhaps only publish the sha1sums of pages changed in the last 4 hours (if the central server checks hourly, this gives plenty of overlap/redundancy).
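A possible way to do the 4-hour cut, sketched in Python. It assumes each mirror can hand over a last-changed timestamp per page in MediaWiki's usual YYYYMMDDHHMMSS form; the function name and the input shape are invented for this example.

```python
from datetime import datetime, timedelta

# MediaWiki stores timestamps as 14-digit YYYYMMDDHHMMSS strings.
MW_TIMESTAMP = "%Y%m%d%H%M%S"


def recently_changed(pages, now, window_hours=4):
    """Keep only pages changed within the last `window_hours`.

    pages: {page_title: (sha1_hex, last_changed_timestamp)} (assumed shape).
    now:   datetime to measure the window from.
    Returns {page_title: sha1_hex} for pages inside the window.
    """
    cutoff = now - timedelta(hours=window_hours)
    return {
        title: sha1
        for title, (sha1, changed) in pages.items()
        if datetime.strptime(changed, MW_TIMESTAMP) >= cutoff
    }
```

With an hourly check and a 4-hour window, each change gets reported on roughly four consecutive pulls, so a missed pull or two shouldn't lose anything.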
The *only* change a mirror/mirrored MediaWiki has to make is to install a PHP script that runs a MySQL query to report the sha1sums of the latest versions of all its pages. The central server handles everything else. This works almost out-of-the-box.
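The real mirror-side script would be PHP with the hashing done by MySQL's SHA1() in the query. Just to show the shape of the report, here is a small Python sketch that hashes (title, text) pairs and emits sha1sum-style lines, plus the parser the central server could use. The line format and both function names are assumptions, not anything MediaWiki provides.

```python
import hashlib


def sha1_report(latest_pages):
    """Build a sha1sum-style report: one 'digest  title' line per page.

    latest_pages: iterable of (page_title, wikitext) for the latest
    revision of each page (assumed to be fetched from the database).
    """
    lines = []
    for title, text in sorted(latest_pages):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        lines.append(f"{digest}  {title}")
    return "\n".join(lines)


def parse_report(report):
    """Central-server side: turn a report back into {title: digest}."""
    digests = {}
    for line in report.splitlines():
        digest, title = line.split("  ", 1)
        digests[title] = digest
    return digests
```

One caveat with hashing in SQL: if a mirror stores page text compressed, the hash of the stored bytes won't match a mirror that stores it plain, so the hash has to be taken over the uncompressed text.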
Disaster recovery: if server X dies, just copy the db from server Y.
I don't really care about load-balancing or anything like that. Just tell end users: go to mirror1.mywiki.com-- if that fails, go to mirror2.mywiki.com, and so on. Or create a metapage that JUST lists all the mirror URLs for your MediaWiki and tell users to try them in order. The nice thing is that you can edit from any mirror, not just the original. Of course, the DNS/metapage for mywiki.com has to be more reliable than any of the mirrors.
Not sure I even care about saving money (though the pseudo-anonymity of free hosting is nice)-- creating a semi-robust mirror-able MediaWiki has a philosophical interest as well.
This has lots of problems (some listed below), but might be a good start?
Problems (which can be resolved long-term with some work + a more complex process):
Page version numbers will be different on the mirrors.
Not all previous edits will be available on all mirrors (if something gets edited several times in an hour)
For large sites, there'll be a large number of irreconcilable differences
Reverting an edit that just adds a small piece of text will be impossible (Unix merge will always favor the version with the added text)
The comments when pages are changed will be mostly lost
Many changes will appear to come from the central server, not the IP address of the person who actually edited
At least 317 other problems I haven't thought about <G>
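The revert problem in the list above is easy to see in a toy example. The sketch below is NOT the Unix merge command; it is a naive two-way "union" merge built on Python's difflib, standing in for what happens when the central server has no common ancestor to compare against. The function name and the conflict handling are made up for illustration.

```python
import difflib


def union_merge(a_lines, b_lines):
    """Merge two versions with no known common ancestor.

    Lines present on only one side are kept (there is no way to tell
    an addition on one side from a deletion on the other). Spans that
    differ on both sides are reported as conflicts, keeping side a.
    Returns (merged_lines, conflicts).
    """
    merged, conflicts = [], []
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(a_lines[a1:a2])
        elif tag == "delete":    # lines only in a
            merged.extend(a_lines[a1:a2])
        elif tag == "insert":    # lines only in b
            merged.extend(b_lines[b1:b2])
        else:                    # "replace": both sides changed
            conflicts.append((a_lines[a1:a2], b_lines[b1:b2]))
            merged.extend(a_lines[a1:a2])
    return merged, conflicts


# Mirror 1 still has an added line; on mirror 2 someone reverted it.
with_text = ["intro", "spam link", "outro"]
reverted = ["intro", "outro"]
merged, conflicts = union_merge(with_text, reverted)
```

Here `merged` still contains "spam link" whichever way round the two versions are passed, so the revert is silently undone. If the central server remembered the last version it pushed and used that as the ancestor in a real three-way merge, this particular case could probably be resolved, at the cost of the central server keeping state.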