Hello,
Some time ago it was announced that long-running queries would be killed if the replag exceeded some value.
I added a simple piece of code to my tools that prints replag info whenever a query is killed, and a few days ago I got the following result:
--------------------------------------------------------------------------------------------------
ERROR 1317 (70100) at line 6874: Query execution was interrupted
last replicated timestamp: 20101207214400
replag: 00:00:01
--------------------------------------------------------------------------------------------------
Could anyone explain whether it is possible for a query (even a long-running one) to be killed when replag is that good?
mashiah
Mashiah Davidson:
ERROR 1317 (70100) at line 6874: Query execution was interrupted
last replicated timestamp: 20101207214400
replag: 00:00:01
Could anyone explain whether it was possible that a query (even a long running one) has been killed when replag was so good?
Did you check the replication lag on all clusters on the server your query was running on, or only one? The query killer (correctly) ignores which database cluster the query is running on, since queries on one database can still affect replication of another cluster on the same server.
(If your query was running on s1, this doesn't apply.)
- river.
The script was working with just one DB cluster (s3/s6), so I checked only its replication lag.
To be honest, I am not sure how s2/s5 or s1 replag could be connected to s3/s6 replag; my understanding was that they run on completely different machines. Could you please explain why you think the query killer's behaviour is correct?
mashiah
Mashiah Davidson:
The script was working with just one DB cluster (s3/s6), so I've checked just it's replication lag.
There isn't an "s3/s6" replication lag. There's an s3 replication lag and an s6 replication lag. If your query was running on an s3 database, and you checked the replication lag there, you might have missed the fact that s6, on the same server, was lagged.
- river.
To be 100% precise, I check replag for the language the script works on, which in this particular case was Russian. To check the replag, my script compares the current time to the last edit time in that language's database.
Am I right that I need to check s3 replag as well in this particular case, but do not have to check replag on s2, s5 and s1?
mashiah
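The check described above (current time compared with the last replicated edit time) could be sketched roughly as follows. This is only an illustration in Python, not the actual tool code: the function names are hypothetical, the timestamp is assumed to be in the YYYYMMDDHHMMSS form shown in the error output, and the "current time" in the example is chosen by hand to reproduce the reported 00:00:01.

```python
from datetime import datetime, timedelta


def replag(last_replicated: str, now: datetime) -> timedelta:
    """Lag between `now` and a YYYYMMDDHHMMSS timestamp (hypothetical helper)."""
    last = datetime.strptime(last_replicated, "%Y%m%d%H%M%S")
    # Clamp to zero in case of small clock differences between hosts.
    return max(now - last, timedelta(0))


def format_replag(lag: timedelta) -> str:
    """Render a lag as HH:MM:SS, as in the output quoted in this thread."""
    total = int(lag.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Assumed "current time": one second after the last replicated edit.
now = datetime(2010, 12, 7, 21, 44, 1)
print(format_replag(replag("20101207214400", now)))
```

Note that this measures the lag of a single database only; it says nothing about other clusters hosted on the same server.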
Mashiah Davidson:
Am I right that I need to check s3 replag as well for this particular case but do not have to check replag on s2, s5 and s1?
Yes. The server sets are: s1; s3/s6; s2/s5. In addition, commons (s4) is present on all servers.
- river.
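For tool authors who want to report the relevant lag when a query is killed, the server sets listed above could be encoded along these lines. This is a hedged sketch in Python: the names `SERVER_SETS`, `clusters_to_check` and `worst_replag` are made up for illustration, and treating s4 lag as relevant is an assumption based only on the remark that commons is present on all servers, not a description of how the query killer itself is implemented.

```python
# Server sets as stated in this thread: s1; s3/s6; s2/s5.
SERVER_SETS = [
    {"s1"},
    {"s3", "s6"},
    {"s2", "s5"},
]
COMMONS = "s4"  # present on all servers, per the thread


def clusters_to_check(cluster: str) -> set:
    """Clusters sharing a server with `cluster` (assumption: s4 is included)."""
    for server_set in SERVER_SETS:
        if cluster in server_set:
            return server_set | {COMMONS}
    # s4 itself lives on every server, so it is not handled by this sketch.
    raise ValueError(f"unknown or unsupported cluster: {cluster}")


def worst_replag(cluster: str, lag_seconds: dict) -> int:
    """Largest lag (in seconds) among the clusters co-hosted with `cluster`."""
    return max(lag_seconds.get(c, 0) for c in clusters_to_check(cluster))


# Made-up lag values: s6 looks fine on its own, but s3 on the same server is lagged.
example_lags = {"s1": 5000, "s3": 900, "s4": 0, "s6": 1}
print(sorted(clusters_to_check("s6")))
print(worst_replag("s6", example_lags))
```

With a check like this, a tool working on an s6 database would also report s3 (and s4) lag, while s1, s2 and s5 would be left out.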
toolserver-l@lists.wikimedia.org