[Labs-l] Lag reporting on lab db replicas

Ricordisamoa ricordisamoa at openmailbox.org
Wed Nov 25 20:51:23 UTC 2015


On 25/11/2015 21:21, Jaime Crespo wrote:
> Always fearing doing queries on a lagged replica on labs? Not anymore!
>
> While Betacommand's tool [0] was very useful, it was also very
> inaccurate, as it estimated the lag from the most recently updated
> rows, which can be a long time ago on the least popular wikis.
>
> What I offer now is sub-second accurate lag measuring, by writing on 
> the production masters the current time, in microseconds, every 0.5 
> seconds and making that available on all hosts (using this tool [1]). 
> So, it is more accurate than SHOW SLAVE STATUS, because it compares 
> the difference with the original master, and it will work even if 
> replication is broken.

So even if the replicas don't get updated the heartbeat will report them 
as up to date?
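
For reference, a minimal sketch of how a heartbeat-based lag figure is usually derived, which bears on the question above. It assumes the lag is simply "current time minus the newest heartbeat timestamp that reached the replica", and that timestamps are in UTC; the function name is hypothetical and this is not taken from the actual view definition. Under that assumption, a broken replication stream makes the timestamp stop advancing, so the reported lag keeps growing rather than staying at 0.

```python
from datetime import datetime, timedelta, timezone

# Timestamp format as shown in the last_updated column of the announcement.
HEARTBEAT_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def heartbeat_lag_seconds(last_updated: str, now: datetime) -> float:
    """Return seconds between `now` and the last replicated heartbeat.

    Assumption: `last_updated` is the master's clock value (UTC) written
    by the heartbeat writer and replicated to this host.
    """
    heartbeat = datetime.strptime(last_updated, HEARTBEAT_FORMAT)
    heartbeat = heartbeat.replace(tzinfo=timezone.utc)
    # Clamp at zero so small clock skew does not produce a negative lag.
    return max(0.0, (now - heartbeat).total_seconds())


last_seen = "2015-11-25T20:20:32.000980"
base = datetime.strptime(last_seen, HEARTBEAT_FORMAT).replace(tzinfo=timezone.utc)

# Healthy replica: the newest heartbeat is only a fraction of a second old.
healthy_lag = heartbeat_lag_seconds(last_seen, base + timedelta(milliseconds=400))

# Broken replication: no new heartbeat arrives, so the lag grows with time.
broken_lag = heartbeat_lag_seconds(last_seen, base + timedelta(hours=1))
```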

>
> To read it, just do SELECT * FROM heartbeat_p.heartbeat;
> And you will get:
> +-------+----------------------------+------+
> | shard | last_updated               | lag  |
> +-------+----------------------------+------+
> | s6    | 2015-11-25T20:20:32.000980 |    0 |
> | s2    | 2015-11-25T20:20:32.001030 |    0 |
> | s7    | 2015-11-25T20:20:32.001070 |    0 |
> | s3    | 2015-11-25T20:20:32.001000 |    0 |
> | s4    | 2015-11-25T20:20:32.000920 |    0 |
> | s1    | 2015-11-25T20:20:32.000740 |    0 |
> | s5    | 2015-11-25T20:20:32.000830 |    0 |
> +-------+----------------------------+------+
>
> Read the detailed documentation on: [2]
>
> Use it, create a web page if you want to make it public! File a
> ticket if the lag gets too high! File a ticket if you need more info (a
> record per wiki?). But I wanted to give you the essentials, and you
> can build on top of that yourselves.
>
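
One way a tool might build on this, sketched under assumptions: the rows are taken to be (shard, last_updated, lag) tuples as in the table above, already fetched with whatever database client the tool uses. The helper name `lagged_shards` and the 60-second default threshold are hypothetical choices for illustration.

```python
from typing import Iterable, List, Tuple

# One row of SELECT * FROM heartbeat_p.heartbeat:
# (shard, last_updated, lag in seconds)
Row = Tuple[str, str, int]


def lagged_shards(rows: Iterable[Row], threshold_seconds: int = 60) -> List[str]:
    """Return the names of shards whose lag exceeds the threshold, sorted."""
    return sorted(
        shard for shard, _last_updated, lag in rows if lag > threshold_seconds
    )


# Rows copied from the sample output in the announcement.
sample_rows = [
    ("s6", "2015-11-25T20:20:32.000980", 0),
    ("s2", "2015-11-25T20:20:32.001030", 0),
    ("s1", "2015-11-25T20:20:32.000740", 0),
]

print(lagged_shards(sample_rows))  # prints [] because every lag is 0
```

A tool could run such a check before an expensive query and warn the user, or skip the run, when the shard it needs appears in the returned list.
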
> Only 2 known bugs:
> - There is microsecond accuracy, but it cannot be used until a bug in 
> MariaDB is fixed [3]
> - enwiki will only report s1 lag until that server is restarted due to 
> some existing filters. We will schedule that at some time in the future.
>
> [0]<http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag>
> [1]<https://www.percona.com/doc/percona-toolkit/2.2/pt-heartbeat.html>
> [2]<https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Identifying_lag>
> [3]<https://mariadb.atlassian.net/browse/MDEV-9175>
> -- 
> Jaime Crespo
> <http://wikimedia.org>
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l