I've seen the primary SQL servers for s1 and s2/s5 lagging by about 12 hours over the last day, while the fast servers are current. Any idea what the source of the issue is?
John
John:
> I've seen the primary SQL servers for s1 and s2/s5 lagging by about 12 hours over the last day.
One of the s1 servers (thyme) was lagging, probably because a user allocated a very large MEMORY table and caused the server to swap. I restarted the server (and fixed the limit on MEMORY table size), and it seems to be doing better now.
I don't see any particularly serious lag for s2: http://munin.toolserver.org/Database/daphne/mysql_replication.html
Unfortunately we don't have replication lag graphs for s5 at the moment. If there was lag there, it's probably because the s2/s5 server (daphne) is somewhat overloaded, since there's only one of them, while other clusters have two servers.
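For anyone who wants to judge lag by hand while there is no graph: MySQL reports it in the output of SHOW SLAVE STATUS on the replica. Below is a minimal Python sketch of how that output might be interpreted. It assumes the status row has already been fetched as a dict (running the statement needs the REPLICATION CLIENT privilege, which ordinary tool accounts may not have). The function names and the 300-second threshold are made up for illustration; the three field names are the ones MySQL itself reports.

```python
from typing import Mapping, Optional


def replication_lag_seconds(status: Mapping[str, object]) -> Optional[int]:
    """Return the lag in seconds, or None if it cannot be determined.

    `status` is one row of SHOW SLAVE STATUS as a dict (hypothetical input;
    fetching it is left to whatever client library you use).
    """
    # If either replication thread is stopped, the lag figure is meaningless.
    if status.get("Slave_IO_Running") != "Yes":
        return None
    if status.get("Slave_SQL_Running") != "Yes":
        return None
    lag = status.get("Seconds_Behind_Master")
    # MySQL reports NULL here when the lag is unknown.
    if lag is None:
        return None
    return int(lag)


def describe_lag(status: Mapping[str, object], warn_after: int = 300) -> str:
    """Summarise the lag in words; warn_after is an arbitrary threshold."""
    lag = replication_lag_seconds(status)
    if lag is None:
        return "replication stopped or lag unknown"
    hours, remainder = divmod(lag, 3600)
    minutes = remainder // 60
    level = "lagging" if lag > warn_after else "current"
    return f"{level}: {hours}h {minutes}m behind"
```

With a row where both threads are running and Seconds_Behind_Master is 43200, describe_lag() reports the server as lagging 12 hours behind, which is roughly what was reported for the last day.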
> While the fast servers are current.
There are no fast servers anymore; sql-sX-rr and sql-sX-fast both point to both servers for each cluster (except s2/s5).
- river.
toolserver-l@lists.wikimedia.org