Posted to the Wikipedia misc. list.
TBSDY
I've just heard that ariel, our master DB server, recently ran out of disk space. Writes to the binlogs stopped, but ariel kept committing to its local disk, which put all the slaves irreversibly out of sync. This is a nasty failure mode that we weren't previously aware was possible.
To resync the slaves, we will have to put the site into read-only mode for maybe 6-12 hours. I'm hoping this will be done as soon as possible, because until then performance will be poor: the slaves used to share the read load. Also, if ariel dies in the meantime, we'll be down to periodic backups, meaning data loss and days of downtime.
Search will be disabled until the slaves are resynced, and other emergency optimisation measures such as watchlist caching might also need to be used.
-- Tim Starling
Yow! I don't know how so many posts happened... sorry guys!
TBSDY