It's "easy" to mirror a MediaWiki from one primary server to a number of secondary servers, but is it possible to have multiple primary servers?
Example: 10 servers and users can make changes on ANY of the 10 servers. Every night, the servers rsync to each other as follows:
1. If server X's version hasn't changed all day and server Y's version HAS changed, server X accepts server Y's version.
2. If both server X's and server Y's versions have changed, automatic CVS-style merging is used to resolve the changes.
3. If CVS-style merging yields a conflict, the site maintainer is notified and must merge the two files manually (I'm thinking of creating a small site, so this shouldn't be too painful).
I realize the rules above only work for 2 servers -- is there a clever version of this for n servers (n>2)?
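For two servers, the whole nightly pass might look something like this rough PHP sketch. It assumes each page has been dumped to a flat file, that a copy from the last successful sync is kept as a common ancestor, and that GNU diff3 is installed; the filenames and the maintainer address are made up:

<?php
// Nightly sync of one page between two mirrors (rules 1-3 above).
// $local = our copy, $remote = the other server's copy (already
// rsync'ed over), $base = the common ancestor from the last sync.
function sync_page( $local, $remote, $base ) {
    $localChanged  = md5_file( $local )  !== md5_file( $base );
    $remoteChanged = md5_file( $remote ) !== md5_file( $base );

    if ( !$localChanged && $remoteChanged ) {
        // Rule 1: only they changed -- accept their version.
        copy( $remote, $local );
    } elseif ( $localChanged && $remoteChanged ) {
        // Rule 2: both changed -- CVS-style three-way merge.
        $rc = 0;
        system( 'diff3 -m ' . escapeshellarg( $local ) . ' '
                            . escapeshellarg( $base ) . ' '
                            . escapeshellarg( $remote )
                            . ' > ' . escapeshellarg( "$local.merged" ), $rc );
        if ( $rc === 0 ) {
            rename( "$local.merged", $local );
        } else {
            // Rule 3: conflict -- leave it to the maintainer.
            mail( 'maintainer@example.com', 'Merge conflict', $local );
        }
    }
    copy( $local, $base ); // new common ancestor for tomorrow
}
?>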
On 12/31/06, Kelly Jones kelly.terry.jones@gmail.com wrote:
It's "easy" to mirror a MediaWiki from one primary server to a number of secondary servers, but is it possible to have multiple primary servers?
Example: 10 servers and users can make changes on ANY of the 10 servers. Every night, the servers rsync to each other as follows:
- If server X's version hasn't changed all day and server Y's version
HAS changed, server X accepts server Y's version.
- If both server X's and server Y's versions have changed, automatic
CVS style merging is used to resolve the changes.
- If CVS style merging yields a conflict, the site maintainer is
notified and must merge the two files manually (I'm thinking of a creating a small site, so this shouldn't be too painful)
I realize the rules above only work for 2 servers -- is there a clever version of this for n servers (n>2)?
What are you trying to accomplish by doing that?
The way the data is laid out in the databases, it's a little harder than that.
How well do you understand clustering theory?
Hi,
Maybe you can try having a look at the mysql-cluster feature ...
But I guess you would have to adapt/change the MediaWiki source code in order to make its SQL requests mysql-cluster compliant ... :)
It would be a hard job ... but a fun one.
Best regards
Arnaud.
On Jan 2, '07, at 01:31, George Herbert wrote:
[...] How well do you understand clustering theory?
Hello Kelly Jones,
It seems to me that what you are trying to accomplish is already part of MediaWiki -- just have all of the ten MediaWikis use the same server, or set up each section of articles on a different interwiki-linked database. This would remove the need to rsync between each of the servers, and it would remove the need for you to manually change anything. The conflicts would not happen, because everything would be done in real time.
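For instance (a minimal sketch, assuming "the same server" here means one shared database host; the hostname and credentials are made up), every mirror's LocalSettings.php would simply point at the same database:

<?php
# Identical in every mirror's LocalSettings.php -- all front ends
# read and write one database, so edits are visible everywhere at once.
$wgDBtype     = 'mysql';
$wgDBserver   = 'db.mywiki.example.org'; // the one shared MySQL host
$wgDBname     = 'wikidb';
$wgDBuser     = 'wikiuser';
$wgDBpassword = 'secret';
?>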
What are you trying to do?
Kasimir
On 1/1/07, KlinT klint@klintcentral.net wrote:
Maybe you can try having a look at the mysql-cluster feature ... [...]
I'm trying to run mediawiki on a free server, but I'm worried about bandwidth limits.
I'd like to create mirror sites or let other people create mirror sites (the content will be 100% open source) and have a "master site" that "randomly" redirects people to a mirror.
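The "master site" could be as dumb as a one-page PHP redirector -- a sketch, with placeholder mirror URLs:

<?php
// Hypothetical front door: bounce each visitor to a random mirror.
$mirrors = array(
    'http://mirror1.mywiki.com',
    'http://mirror2.mywiki.com',
);
header( 'Location: ' . $mirrors[ array_rand( $mirrors ) ], true, 302 );
?>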
The mirror sites should not be "read-only". People should be able to edit content on the mirror sites and have their changes pushed to the other mirror sites.
The mirror servers should be as independent as possible: geographically diverse, on different backbones, each using their own MySQL server, owned by different people, etc. Some of the mirror servers may be free, others may be paid hosting, others may be dedicated servers, etc.
The only thing they'd have in common is they'd all be running mediawiki and some cron script that merges changed data in from all the mirrors.
Thoughts?
Kelly Jones wrote:
I'm trying to run mediawiki on a free server, but I'm worried about bandwidth limits.
Ah, you want something for nothing. :)
-- brion vibber (brion @ pobox.com)
Kelly Jones wrote:
I'm trying to run mediawiki on a free server, but I'm worried about bandwidth limits.
Hosting is cheap. Unless you are working on a huge project (and then you'll need money, money makers, and/or sponsors, but that's the way it should be) it will cost you only a few bucks a month, tops.
On 1/2/07, Kelly Jones kelly.terry.jones@gmail.com wrote:
The mirror servers should be as independent as possible: geographically diverse, on different backbones, each using their own MySQL server, owned by different people, etc. Some of the mirror servers may be free, others may be paid hosting, others may be dedicated servers, etc.
You're looking for the holy grail of hosting.
Some of this has been implemented via ideas like mysql-cluster (mentioned earlier) but I gather that mediawiki itself would need to be modified to support that technology.
Load-balancing a webserver is already something that's fairly well known.
Basically you'd have to have some front-end computer which understands the bandwidth usage of all of its attached hosts, and it would intelligently balance the load over to those mirrors which have bandwidth left.
And then those secondary computers would each have an installation with some kind of synchronized database cache. This would require a bit of MySQL magic, but MySQL does support it. MediaWiki doesn't, though.. so edit collisions and such would be impossible to sort out if you have cached reads from, or slow writes to, the mirror sites..
But here's the problem.. what bandwidth is being saved when the same copy of a page has to get synced to all of the other mirrors? Oh, you could modify MediaWiki to only update its local copy when a page edit request comes in and an old copy is stored locally.. but where is the disaster recovery when one of your mirrors goes down?
This is a complex issue.. and probably not worth the spare change which basic hosting would cost. If the site is wildly popular.. then become self-supporting via memberships, ads, donations, etc..
On 1/4/07, Sy Ali sy1234@gmail.com wrote:
You're looking for the holy grail of hosting.
Here's my grail-shaped beacon for small-medium MediaWikis:
All the mirrors publish sha1sums of the most recent versions of all their pages. Since MySQL has SHA1() built in, this should be doable (maybe future versions of MediaWiki will store the sha1sum as a field in the 'text' table, making this even faster).
A central server pulls the sha1s from all the mirrors hourly (or whatever) and finds pages that aren't identical on all mirrors (including newly created pages).
The central server runs the Unix 'merge' command (several times if needed) to create the 'new' version of the page, which may or may not match the version on some of the servers. Irreconcilable differences are handled by the WikiSysop (or Drew Barrymore <G>).
The central server pushes the new version to all servers that don't already have it.
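The central server's hourly pass might look like this rough PHP sketch -- the report URLs and the title<TAB>sha1 report format are assumptions, and the merge/push step is only stubbed out:

<?php
// Hourly pass on the central server: pull every mirror's sha1 report,
// find pages whose hashes disagree, and queue them for merging.
$mirrors = array(
    'http://mirror1.mywiki.com/sha1-report.php',
    'http://mirror2.mywiki.com/sha1-report.php',
);

$hashes = array(); // page title => ( mirror url => sha1 )
foreach ( $mirrors as $url ) {
    $lines = file( $url, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
    foreach ( $lines as $line ) {
        list( $title, $sha1 ) = explode( "\t", $line, 2 );
        $hashes[$title][$url] = $sha1;
    }
}

foreach ( $hashes as $title => $perMirror ) {
    if ( count( array_unique( $perMirror ) ) > 1
        || count( $perMirror ) < count( $mirrors ) ) {
        // Differs somewhere, or missing on some mirror: fetch each
        // copy, run merge(1)/diff3 against a stored base version,
        // and push the winner back out -- or page the WikiSysop.
        echo "needs merge: $title\n";
    }
}
?>

(Reading the reports with file() over HTTP needs allow_url_fopen turned on.)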
For larger MediaWikis, perhaps only publish the sha1s of pages changed in the last 4 hours (if the central server checks hourly, this gives plenty of overlap/redundancy).
The *only* change a mirror/mirrored MediaWiki has to make is to install a PHP script that runs a MySQL query to report the sha1sums of the latest versions of all its pages. The central server handles everything else. This works almost out-of-the-box.
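Something like this sketch would do for that script (it assumes the MediaWiki 1.5+ tables -- page, revision, text -- and uncompressed text rows; old_flags can mark a row as gzipped, which this ignores, and the credentials are made up):

<?php
// sha1-report.php: print "namespace:title<TAB>sha1" for the latest
// revision of every page, straight from MediaWiki's own tables.
$db  = new PDO( 'mysql:host=localhost;dbname=wikidb', 'wikiuser', 'secret' );
$sql = "SELECT page_namespace, page_title, SHA1(old_text) AS sha1
          FROM page
          JOIN revision ON rev_id = page_latest
          JOIN text     ON old_id = rev_text_id";
header( 'Content-Type: text/plain' );
foreach ( $db->query( $sql ) as $row ) {
    echo "{$row['page_namespace']}:{$row['page_title']}\t{$row['sha1']}\n";
}
?>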
Disaster recovery: if server X dies, just copy the db from server Y.
I don't really care about load-balancing or anything like that. Just tell end users: go to mirror1.mywiki.com; if that fails, go to mirror2.mywiki.com, and so on. Or create a metapage that JUST lists all the mirror URLs for your MediaWiki and tell users to try them in order. The nice thing is that you can edit from any mirror, not just the original. Of course, the DNS/metapage for mywiki.com has to be more reliable than any of the mirrors.
Not sure I even care about saving money (though the pseudo-anonymity of free hosting is nice) -- creating a semi-robust mirror-able MediaWiki has a philosophical interest as well.
This has lots of problems (some listed below), but might be a good start?
Problems (which can be resolved long-term with some work + a more complex process):
- Page version numbers will be different on the mirrors.
- Not all previous edits will be available on all mirrors (if something gets edited several times in an hour).
- For large sites, there'll be a large number of irreconcilable differences.
- Reverting an edit that just adds a small piece of text will be impossible (Unix merge will always favor the version with the added text).
- The comments when pages are changed will be mostly lost.
- Many changes will appear to come from the central server, not the IP address of the person who actually edited.
- At least 317 other problems I haven't thought about <G>
Hi!
Some of this has been implemented via ideas like mysql-cluster
(mentioned earlier) but I gather that mediawiki itself would need to be modified to support that technology.
MySQL Cluster is designed to run in a low-latency environment. Local gigabit ethernet provides quite good round-trip times, but specialized server interconnects (like the Dolphin ones) help even more. Geo-balancing a cluster is a pain (unless you have a really good interconnect, sub-10ms). And you may want to run one of the late 5.1 snapshots.. ;-)
Load-balancing a webserver is already something that's fairly well known.
We do lots of database load balancing, with replication. It still has a single master for write requests, though.
intelligently balance the load over to those mirrors which have bandwidth left.
Just run a caching proxy.. ;) It would probably be possible to implement that as a PHP script too, though it would suck.
This is a complex issue.. and probably not worth the spare change
which basic hosting would cost. If the site is wildly popular.. then become self-supporting via memberships, ads, donations, etc..
Dedicated servers nowadays are in the 50 EUR/month range. With unlimited traffic \o/
-- Domas http://dammit.lt/