[Labs-l] Commons replag

Sun Apr 6 18:11:36 UTC 2014

I wrote:

>> Commons replag is at about 19.5 hours.

> s1 is also lagging about an hour now; filed
> https://bugzilla.wikimedia.org/63575.

Replication has caught up now; to quote Sean from the bug
report:

| This has recovered.  Some users were running very slow
| DELETE queries of the form DELETE ... WHERE ... (SELECT
| ... ) with no limits or batches.  This sort of bad practice
| has been brought up on Labs-l previously, but it seems it
| will need policing.

He is referring to
http://permalink.gmane.org/gmane.org.wikimedia.labs/2225:

| Sometimes on labsdb we see slow queries. Some are marked with /* SLOW_OK */
| comment, some are not.

| Slow read queries are fine. InnoDB purge lag will climb but no big deal.

| Slow write queries can make replication lag or block. Problem transactions
| holding locks for days may get killed, regardless of SLOW_OK comments, to
| maintain QoS.

| If writing data on labsdbs try to do it in batches. It will make replag
| rare, and in the event something does get killed, easier to resume.

So if you run write queries on the replica servers, please
make sure that they can be processed quickly, or rethink
your overall data flow.
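
As a rough illustration of the batching advice quoted above,
here is a minimal sketch in Python.  It uses the sqlite3
module only so that it is self-contained; the table
my_tool_cache, its columns id and stale, and the function
delete_in_batches are all made up for the example.  Against
a user database on labsdb you would use a MySQL client
library instead, and the batch size and pause would need
tuning.

```python
import sqlite3
import time


def delete_in_batches(conn, batch_size=500, pause=0.0):
    """Delete stale rows from the hypothetical table my_tool_cache
    in small transactions; return the number of rows deleted."""
    total = 0
    while True:
        # Pick a bounded set of primary keys first, so that each
        # DELETE touches at most batch_size rows.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM my_tool_cache WHERE stale = 1 LIMIT ?",
            (batch_size,))]
        if not ids:
            return total
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            "DELETE FROM my_tool_cache WHERE id IN (%s)" % placeholders,
            ids)
        # Commit per batch: locks are released quickly, and if the
        # job gets killed, the next run continues with what is left.
        conn.commit()
        total += len(ids)
        # Optional breathing room between batches.
        time.sleep(pause)
```

With MySQL/MariaDB, a single-table DELETE ... WHERE ... LIMIT n
that is repeated until it affects no more rows gives the same
effect with one statement per batch.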

Tim