[Labs-l] Lag reporting on lab db replicas

Bryan Davis bd808 at wikimedia.org
Sat Nov 28 19:58:06 UTC 2015


On Sat, Nov 28, 2015 at 12:28 PM, Ricordisamoa
<ricordisamoa at openmailbox.org> wrote:
> On 26/11/2015 09:19, Jaime Crespo wrote:
>>
>> > So even if the replicas don't get updated the heartbeat will report them
>> > as up to date?
>>
>> Not sure exactly what you mean by that. The masters will be updated
>> continuously every 0.5 seconds (all slaves are read-only; no writes are done
>> there). If replication works and slaves get updated, they will receive the
>> heartbeat through the same replication channel as the rest of the updates.
>> If replication doesn't work and replicas do not get updated, they will not
>> receive the heartbeat either, as it arrives through replication, in order.
>> If replication stops or fails, heartbeat updates will stop (from the slave's
>> perspective), and lag will start to increase from your perspective
>> (difference between the last timestamp written and the current time).
>>
>> This measures the replication lag (i.e. the difference from the master), not
>> the last time an edit was done by a user, which was what the first link I
>> sent measured. In other words, if jaimewiki receives user edits only once an
>> hour, heartbeat will still do a write to its master every half second,
>> thus proving that it is up to date with that resolution. You can still check
>> the last user edit by checking recentchanges.
>>
>> The only reason this could fail (heartbeat updated but wiki not) is if
>> there was a specific filter denying replication but allowing heartbeat, which
>> is only done for specific tables and private wikis. Also the production master
>> could have a problem, but that would affect the wikis themselves, not only labs.
>>
>> To give you an idea of the accuracy of this method, we (will) use it on
>> production to decide whether a slave is usable for returning up-to-date data.
>>
>> For more information on how this works, check
>> <https://www.percona.com/doc/percona-toolkit/2.1/pt-heartbeat.html#description>
>>
>
> I don't understand; please explain it as you would to a 5-year-old :-)

I'll try:

* Each master server has a "heartbeat" table where it updates a row
every 0.5 seconds with a timestamp value.
* Each replica server has a copy of this heartbeat table that only
receives updated timestamps via replication.
* Each replica server also has a view that shows the difference
between the current system time and the heartbeat table's timestamp.
* This difference (current system time - last timestamp seen from
master) is the true replication delta between the master and the
replica.
* The only way that a replica server could see an updated timestamp
from the master without also having all other changes locally would be
via explicit configuration that allowed heartbeat updates but excluded
others.
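
The points above can be sketched as a small simulation. This is only an
illustration, not how pt-heartbeat or the labs replicas are implemented:
the Master and Replica classes, the in-memory "binlog" list, and the
method names are all hypothetical. It shows that the replica's lag is
simply (current time - last heartbeat timestamp it received), and that
the lag grows as soon as replication stops delivering heartbeats.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HEARTBEAT_INTERVAL = 0.5  # seconds between heartbeat writes on the master


@dataclass
class Master:
    """Hypothetical master: appends heartbeat timestamps to its change stream."""
    binlog: List[float] = field(default_factory=list)

    def beat(self, now: float) -> None:
        # The heartbeat is an ordinary write, so it is replicated like any other.
        self.binlog.append(now)


@dataclass
class Replica:
    """Hypothetical replica: learns heartbeats only by applying the stream."""
    applied: int = 0                   # number of stream events applied so far
    heartbeat: Optional[float] = None  # last heartbeat timestamp received

    def replicate(self, master: Master) -> None:
        # Apply pending events in order; the newest heartbeat wins.
        for ts in master.binlog[self.applied:]:
            self.heartbeat = ts
        self.applied = len(master.binlog)

    def lag(self, now: float) -> float:
        # What the "view" computes: current time minus last heartbeat seen.
        if self.heartbeat is None:
            raise ValueError("no heartbeat has been replicated yet")
        return now - self.heartbeat


def simulate(ticks: int, replication_stops_at: int) -> List[float]:
    """Return the replica's lag at each tick; replication halts at the given tick."""
    master, replica = Master(), Replica()
    lags = []
    for tick in range(ticks):
        now = tick * HEARTBEAT_INTERVAL
        master.beat(now)
        if tick < replication_stops_at:
            replica.replicate(master)
        lags.append(replica.lag(now))
    return lags


if __name__ == "__main__":
    # Replication works for the first 3 ticks, then stops.
    for tick, lag in enumerate(simulate(ticks=6, replication_stops_at=3)):
        print(f"t={tick * HEARTBEAT_INTERVAL:.1f}s lag={lag:.1f}s")
```

While replication is working the reported lag stays at zero (within the
0.5 second resolution); once it stops, the lag increases by 0.5 seconds
on every tick, even though the master keeps writing heartbeats.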

Bryan
-- 
Bryan Davis              Wikimedia Foundation    <bd808 at wikimedia.org>
[[m:User:BDavis_(WMF)]]  Sr Software Engineer            Boise, ID USA
irc: bd808                                        v:415.839.6885 x6855


