Jani Patokallio wrote:
Thanks for the fast and authoritative reply!
Brion Vibber <brion at pobox.com> wrote:
It sounds like you mainly want to use replication to maintain a hot standby database, not for load balancing. It may actually be best here to not tell MediaWiki about the slave at all: use the master only, and consider using some other HA proxy or whatever to hot-swap the old slave in for the master if the master stops responding.
So the slave becomes the master and starts accepting writes? Doesn't this imply that the direction of replication on the MySQL level has to be reversed after the former master comes back online, and now has to become the slave to get any changes that were made in the meantime? This sounds fairly painful, especially if the failure is intermittent. Alternatively, if you're saying that the slave should be read-only even when it takes over, then this isn't really much of an improvement on the current state of affairs.
Yes. If you promote a slave to master, the old master then has to be converted into a slave. Even worse, it is possible that some commits were written to the old master but never replicated to the slave (that's typical when the binlog disk gets full). So the old master's data is wrong and you now need to reimport it.
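To make the "commits on the old master that never reached the slave" case concrete, here is a rough sketch of how one might compare binlog coordinates after the fact. It assumes you have the old master's SHOW MASTER STATUS row (File, Position) and the slave's SHOW SLAVE STATUS row (Relay_Master_Log_File, Exec_Master_Log_Pos) available as plain dicts; the function names and the "mysql-bin" file prefix in the example are made up for illustration, and this is not something MediaWiki does for you.

```python
import re
from typing import Mapping, Tuple


def binlog_coords(log_file: str, position: int) -> Tuple[int, int]:
    """Turn e.g. ("mysql-bin.000042", 1337) into the sortable pair (42, 1337)."""
    match = re.search(r"\.(\d+)$", log_file)
    if match is None:
        raise ValueError(f"unexpected binlog file name: {log_file!r}")
    return (int(match.group(1)), int(position))


def has_unreplicated_writes(master_status: Mapping[str, object],
                            slave_status: Mapping[str, object]) -> bool:
    """True if the master wrote binlog events the slave never applied.

    master_status: one row of SHOW MASTER STATUS on the old master.
    slave_status:  one row of SHOW SLAVE STATUS on the promoted slave,
                   captured before it started taking writes.
    """
    written = binlog_coords(str(master_status["File"]),
                            int(master_status["Position"]))
    applied = binlog_coords(str(slave_status["Relay_Master_Log_File"]),
                            int(slave_status["Exec_Master_Log_Pos"]))
    # Tuples compare file number first, then offset within the file.
    return written > applied


if __name__ == "__main__":
    old_master = {"File": "mysql-bin.000042", "Position": 9000}
    promoted = {"Relay_Master_Log_File": "mysql-bin.000042",
                "Exec_Master_Log_Pos": 8500}
    print(has_unreplicated_writes(old_master, promoted))
```

If this returns True, the old master holds writes the new master never saw, which is the situation where a reimport is needed rather than simply pointing the old master at the new one.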
Very early on we did try to fall back to master if no non-lagged slaves were available; however, it can be highly problematic if you're using replication for the purpose of load balancing -- which is what MediaWiki's explicit support for multiple database servers is designed for.
Problem is, while our system does occasionally get spikes of load usually involving heavy reads of swathes of the database (which is why we've got the master/slave split), the replication typically fails randomly for some reason other than load: network glitches, out of disk space, etc. So it's not that lag is high, it's that replication has failed entirely.
Anyway, I gather that the best thing to do is still set max lag, since at least this way the non-replicating slave switches MediaWiki into read-only mode and the user gets a clear failure message instead of just wondering why their edits seem to disappear into the ether.
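For what it's worth, the distinction between "lag is high" and "replication has failed entirely" is visible in SHOW SLAVE STATUS: Seconds_Behind_Master is NULL when replication is not running, and Slave_IO_Running / Slave_SQL_Running stop reporting "Yes". Below is a minimal sketch of that check. It is not MediaWiki's actual load balancer code; the function names, the dict input and the 30-second threshold are assumptions for illustration only.

```python
from typing import Mapping, Optional, Sequence


def classify_replica(status: Mapping[str, Optional[object]],
                     max_lag: int = 30) -> str:
    """Classify one SHOW SLAVE STATUS row as 'ok', 'lagged' or 'broken'."""
    io_running = status.get("Slave_IO_Running")
    sql_running = status.get("Slave_SQL_Running")
    lag = status.get("Seconds_Behind_Master")  # None stands in for SQL NULL

    # A stopped thread or a NULL lag means replication is not running at all,
    # which is a different failure from merely being behind.
    if io_running != "Yes" or sql_running != "Yes" or lag is None:
        return "broken"
    if int(lag) > max_lag:
        return "lagged"
    return "ok"


def should_be_read_only(statuses: Sequence[Mapping[str, Optional[object]]],
                        max_lag: int = 30) -> bool:
    """Assumed policy: go read-only when replicas exist but none is usable."""
    if not statuses:
        return False  # no replicas configured, nothing to wait for
    return all(classify_replica(s, max_lag) != "ok" for s in statuses)


if __name__ == "__main__":
    stopped = {"Slave_IO_Running": "No", "Slave_SQL_Running": "Yes",
               "Seconds_Behind_Master": None}
    print(classify_replica(stopped), should_be_read_only([stopped]))
```

A monitoring script built along these lines could alert on "broken" separately from "lagged", so a full disk or a network glitch gets noticed before users hit the read-only message.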
Yes, it seems that the solution Brion suggested is not the appropriate one for your setup.