We are also putting a similar plan in place. For the database we have a
MySQL cluster running v5.1. I'm not the DB guy but my understanding is
that with two machines in a cluster, both machines are updated when a
database transaction is made and if one machine goes down the other will
continue to operate. So this redundant DB cluster (2 machines but you
can have more) handles single DB server disasters on the DB end.
Each wiki's database (we have many wikis on our server, 27 and
counting...) is backed up twice daily, once on our primary wiki server
and once on our secondary wiki server, under /var/www/DBbackups.
Backups are done using mysqldump and gzipped. This puts backups on both
our wiki servers.
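
For anyone who wants to script something similar, here is a minimal
Python sketch of the mysqldump-plus-gzip step. It is not our actual
script: the file naming scheme and the helper names (dump_command,
backup_path, backup_database) are made up for illustration, and it
assumes MySQL credentials come from an option file such as ~/.my.cnf.

```python
import gzip
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

BACKUP_DIR = Path("/var/www/DBbackups")  # backup location named in the post


def dump_command(db_name: str) -> list[str]:
    """Build the mysqldump invocation for one wiki database.

    Assumption: user and password are read from an option file,
    so they are not passed on the command line.
    """
    return ["mysqldump", db_name]


def backup_path(db_name: str, when: datetime,
                backup_dir: Path = BACKUP_DIR) -> Path:
    """Hypothetical naming scheme: <db>-YYYYMMDD-HHMM.sql.gz."""
    return backup_dir / f"{db_name}-{when:%Y%m%d-%H%M}.sql.gz"


def backup_database(db_name: str, when: datetime,
                    backup_dir: Path = BACKUP_DIR) -> Path:
    """Stream mysqldump output straight into a gzip file."""
    target = backup_path(db_name, when, backup_dir)
    proc = subprocess.Popen(dump_command(db_name), stdout=subprocess.PIPE)
    with gzip.open(target, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    if proc.wait() != 0:
        # Do not leave a truncated dump lying around.
        target.unlink(missing_ok=True)
        raise RuntimeError(f"mysqldump failed for {db_name}")
    return target
```

Calling backup_database("wiki_a", datetime.now()) from a twice-daily
cron job on each wiki server would give the schedule described above;
"wiki_a" is a placeholder database name.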
The entire /var/www directory on each of the two wiki servers is backed
up to offsite storage daily, so the DB backups stored in
/var/www/DBbackups and the wiki directories stored under /var/www/html
all go offsite. An image backup of each system is also stored offsite;
we would use that for a complete system restore only.
Our two wiki servers have a setup similar to yours: if the primary
server crashes, we would switch its DNS CNAME to point at the secondary
server. An rsync job keeps the /var/www/html directory of the second
server in sync with the first. This also acts as a live backup of the
MediaWiki software of each wiki. Rsync runs
twice a day but we are considering every hour since it seems to only
take a minute or two to run. The secondary wiki server also acts as a
test server with updates and patches applied there first and tested
before applying them to the primary.
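
As a rough illustration of the sync step, here is a small Python
wrapper around rsync. Treat it as a sketch under assumptions: the host
name is a placeholder, pushing from the primary (rather than pulling
from the secondary) is a guess, and the --delete flag is my addition so
that files removed on the primary also disappear on the secondary.

```python
import subprocess

WIKI_ROOT = "/var/www/html/"  # trailing slash: sync the directory contents


def rsync_command(secondary_host: str, dry_run: bool = False) -> list[str]:
    """Build an rsync push of the wiki tree to the secondary server.

    -a        archive mode (permissions, times, symlinks, recursion)
    --delete  assumption: mirror deletions so both trees stay identical
    """
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, copy nothing
    cmd += [WIKI_ROOT, f"{secondary_host}:{WIKI_ROOT}"]
    return cmd


def sync_to_secondary(secondary_host: str) -> None:
    """Run the sync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(rsync_command(secondary_host), check=True)
```

Running sync_to_secondary("wiki2.example.org") from cron on the primary
would cover the twice-daily (or hourly) schedule. After a failover the
same function works unchanged on the new primary; only the host name
passed in needs to change.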
In a disaster where we lose a DB server:
The second DB server continues to run and we repair or replace the
broken DB server. We then add it back to the cluster where it is
updated by the second DB server. If we lose the whole cluster, DBs can
be restored from the latest backups stored on the wiki servers, or if
they are also gone, the offsite backup.
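
The "restore from the latest backup" step could be sketched in Python
roughly as follows. The file naming (<db>-YYYYMMDD-HHMM.sql.gz) is a
hypothetical convention chosen so that sorting by name also sorts by
time, and the mysql client is again assumed to find its credentials in
an option file.

```python
import gzip
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def latest_backup(db_name: str, backup_dir: Path) -> Optional[Path]:
    """Return the newest dump for db_name, or None if there is none.

    Relies on the assumed timestamped file names: the last name in
    lexicographic order is the most recent dump.
    """
    candidates = sorted(backup_dir.glob(f"{db_name}-*.sql.gz"))
    return candidates[-1] if candidates else None


def restore_database(db_name: str, dump_file: Path) -> None:
    """Decompress the dump and feed it to the mysql client on stdin."""
    proc = subprocess.Popen(["mysql", db_name], stdin=subprocess.PIPE)
    with gzip.open(dump_file, "rb") as src:
        shutil.copyfileobj(src, proc.stdin)
    proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError(f"restore of {db_name} failed")
```

If latest_backup() returns None for the local /var/www/DBbackups
directory, that is the cue to fetch the dump from offsite storage
instead.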
In a disaster where we lose the primary wiki server:
We switch the DNS name to the second server, which is now the new
primary server. We repair the broken primary server, which then becomes
our new secondary server. The only software change we need to make on
each is the direction of the rsync, which always runs from
primary -> secondary.
In a disaster where we lose both wiki servers:
The wiki directories are restored from offsite backup onto repaired or
new servers, or we restore the entire system from an image backup. The
image restore is likely what we would do in this scenario if the damage
requires a reinstall of the OS.
In the case where offsite backups are required we expect some downtime
but in the case of a single DB server or single wiki server disaster,
downtime should be minimal.
Whew! I think I just wrote my requisite Disaster Recovery Plan due
Monday!
-Jim
-----Original Message-----
From: zham rock [mailto:zhamrock@gmail.com]
Sent: Friday, November 02, 2007 11:25 AM
To: MediaWiki announcements and site admin list
Subject: [Mediawiki-l] Disaster Recovery
Hi guys - Please advise on the best approach to planning for
disaster recovery. We have another wiki server in a separate datacenter.
Here are our requirements:
- DR database should have the same content as production so we could
just swing the DNS alias to the DR server.
- How to replicate the database, content and images?
- How frequent is the replication?
- How to back-up the whole wiki?
Any advice will be highly appreciated.
Regards,
zham