-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Hi,
After the s3/s7 split at Wikimedia, which moved several databases from s3 to the new s7 cluster, we retained both databases on the same (s3) server, which is our usual policy in such cases.
Unfortunately, over the last couple of days Wikimedia executed several DROP DATABASE statements on their s7 server, for the old s3 databases. These statements were replicated to our s3/s7 server and dropped the live databases on our server. As a result:
* Several s3 databases (all of which start with 'a') are no longer available
* s3 replication is halted due to the missing databases
* s7 replication is halted to prevent further destruction of data.
The only way to resolve this issue is to re-import the data from Wikimedia's databases, which will take a few days at least.
- river.
On Tue, Jan 18, 2011 at 9:08 PM, River Tarnell <river.tarnell@wikimedia.de> wrote:
> Unfortunately, over the last couple of days Wikimedia executed several DROP DATABASE statements on their s7 server, for the old s3 databases. These statements were replicated to our s3/s7 server and dropped the live databases on our server. As a result:
Would it be possible to revoke the drop command from the replication user to prevent this from happening in the future?
Bryan
On 18-1-2011 21:26, Bryan Tong Minh wrote:
> On Tue, Jan 18, 2011 at 9:08 PM, River Tarnell <river.tarnell@wikimedia.de> wrote:
>> Unfortunately, over the last couple of days Wikimedia executed several DROP DATABASE statements on their s7 server, for the old s3 databases. These statements were replicated to our s3/s7 server and dropped the live databases on our server. As a result:
> Would it be possible to revoke the drop command from the replication user to prevent this from happening in the future?
And yet again WMF forgets about the Toolserver and breaks replication. This makes you wonder whether the WMF takes the Toolserver seriously.
Maarten
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Bryan Tong Minh:
> Would it be possible to revoke the drop command from the replication user to prevent this from happening in the future?
Yes. I also considered modifying trainwreck (our replication tool) to ignore DROP DATABASE commands; perhaps we could do both.
It would be easier if these commands wouldn't find their way into the binlog in the first place, but mistakes happen. Unfortunately there's no way to prevent every possible command that might break our database.
In the future I hope to have one MySQL instance per cluster, which would more closely mirror Wikimedia's configuration and hopefully make errors like this less common.
- river.
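The idea above of having the replication tool ignore DROP DATABASE commands could look roughly like the following. This is only a minimal sketch in Python: trainwreck's actual code and interfaces are not shown in this thread, so `is_drop_database` and `filter_statements` are hypothetical names, and the sketch assumes the tool sees each replicated statement as a plain SQL string.

```python
import re

# Hypothetical filter; this is NOT trainwreck's real code.
# DROP SCHEMA is included because MySQL treats it as a synonym
# for DROP DATABASE.
_DROP_DATABASE = re.compile(r"^\s*DROP\s+(DATABASE|SCHEMA)\b", re.IGNORECASE)


def is_drop_database(statement: str) -> bool:
    """Return True if the statement starts with DROP DATABASE / DROP SCHEMA."""
    return _DROP_DATABASE.match(statement) is not None


def filter_statements(statements):
    """Yield the statements that should be applied, skipping DROP DATABASE."""
    for stmt in statements:
        if is_drop_database(stmt):
            continue  # silently ignore the destructive statement
        yield stmt
```

A check like this only looks at the start of the statement, so a DROP DATABASE hidden behind a leading SQL comment would slip through; revoking the privilege from the replication user as well, as suggested, would cover that case.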
River Tarnell wrote:
> Bryan Tong Minh:
>> Would it be possible to revoke the drop command from the replication user to prevent this from happening in the future?
> Yes. I also considered modifying trainwreck (our replication tool) to ignore DROP DATABASE commands; perhaps we could do both.
> It would be easier if these commands wouldn't find their way into the binlog in the first place, but mistakes happen. Unfortunately there's no way to prevent every possible command that might break our database.
> In the future I hope to have one MySQL instance per cluster, which would more closely mirror Wikimedia's configuration and hopefully make errors like this less common.
> - river.
There are legitimate cases for dropping tables. I think that on getting a DROP command trainwreck should send an email to the admins and halt replication.
"Platonides" <platonides@gmail.com> wrote in message news:4D362363.8000604@gmail.com...
> There are legitimate cases for dropping tables. I think that on getting a DROP command trainwreck should send an email to the admins and halt replication.
Presumably receiving any sort of serious error must stop replication anyway, or it risks digging itself into an even deeper hole (what if the next command is a CREATE DATABASE with the same name?). Setting it up to call for help when it errors out, and to make sure that it *does* error out before doing anything too stupid, are two easily-separable steps, both of which can probably use existing admin infrastructure.
--HM
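The two separable steps described above (stop before doing anything destructive, and call for help when stopping) can be sketched as follows. This is a hedged illustration in Python, not trainwreck's real behaviour: `replicate`, `ReplicationHalted`, and the `apply` and `notify` callbacks are all hypothetical names, and the sketch assumes statements arrive as plain SQL strings.

```python
import re

# Hypothetical names throughout; trainwreck's real interfaces are not
# shown in this thread.
_DROP = re.compile(r"^\s*DROP\s+(DATABASE|SCHEMA|TABLE)\b", re.IGNORECASE)


class ReplicationHalted(Exception):
    """Raised when replication stops and needs an administrator."""


def replicate(statements, apply, notify):
    """Apply statements in order; halt and notify on a DROP or on any error.

    `apply` executes one statement, `notify` alerts the admins (for
    example by sending an email). Returns the number of statements applied.
    """
    applied = 0
    for stmt in statements:
        if _DROP.match(stmt):
            # Step 1: error out *before* the destructive statement runs.
            notify(f"replication halted before: {stmt}")
            raise ReplicationHalted(stmt)
        try:
            apply(stmt)
        except Exception as exc:
            # Step 2: any serious error also stops replication and calls for help.
            notify(f"replication halted on error: {exc!r}")
            raise ReplicationHalted(stmt) from exc
        applied += 1
    return applied
```

Because the loop raises on the first DROP, a later CREATE DATABASE with the same name is never reached until an admin has looked at the situation.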
River Tarnell:
> - Several s3 databases (all of which start with 'a') are no longer available
> - s3 replication is halted due to the missing databases
> - s7 replication is halted to prevent further destruction of data.
> The only way to resolve this issue is to re-import the data from Wikimedia's databases, which will take a few days at least.
This should be fixed now. s3 is fine, while s7 is replicating but ~2.5 days behind. It should catch up within the next day or so. Currently only hyacinth is serving s3, s4, s6 and s7; once s7 has caught up, we will copy its data to cassia and re-add the secondary server.
- river.