Re: [Analytics] [Technical] Strange behavior of EL m4-master

16 Apr 2015


On Thu, Apr 16, 2015 at 7:58 AM, Marcel Ruiz Forns <mforns@wikimedia.org> wrote:
> ...
> I followed the master-slave replication lag for some hours and noticed a
> pattern in it: the lag grows progressively over time, by roughly 10 minutes
> per hour, until it reaches 1 to 2 hours. At that point the data gap happens
> and the replication lag drops back to a few minutes. I could only catch a
> data gap "live" twice, so that's definitely not a conclusive statement. But
> there's a hypothesis that the two problems are related.
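The sawtooth described above (lag climbing about 10 minutes per hour, then collapsing) can be checked numerically instead of by eye: split periodic lag samples at each sharp drop and fit a growth rate to every segment. This is a minimal sketch assuming the samples were already collected as `(minutes_since_start, lag_minutes)` pairs, for example by polling `Seconds_Behind_Master` from `SHOW SLAVE STATUS`; the function names and the 30-minute reset threshold are hypothetical choices, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One 'tooth' of the sawtooth: lag samples between two resets."""
    start_min: float
    end_min: float
    peak_lag_min: float
    growth_min_per_hour: float


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (0.0 if xs are constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x


def split_at_resets(samples, drop_threshold_min):
    """Start a new segment whenever lag falls by more than drop_threshold_min."""
    segments = [[samples[0]]]
    for prev, cur in zip(samples, samples[1:]):
        if prev[1] - cur[1] > drop_threshold_min:
            segments.append([])
        segments[-1].append(cur)
    return segments


def analyze_lag(samples, drop_threshold_min=30.0):
    """samples: time-ordered (minutes_since_start, lag_minutes) pairs."""
    if not samples:
        return []
    result = []
    for seg in split_at_resets(samples, drop_threshold_min):
        ts = [t for t, _ in seg]
        lags = [lag for _, lag in seg]
        result.append(Segment(
            start_min=ts[0],
            end_min=ts[-1],
            peak_lag_min=max(lags),
            # slope is lag-minutes per elapsed minute; scale to minutes per hour
            growth_min_per_hour=slope(ts, lags) * 60.0,
        ))
    return result


if __name__ == "__main__":
    # Synthetic data: lag grows 10 min/hour from 5 to 105 min, resets, repeats.
    samples = [(t, 5 + t / 6) for t in range(0, 601, 10)]
    samples += [(t, 5 + (t - 610) / 6) for t in range(610, 1211, 10)]
    for seg in analyze_lag(samples):
        print(f"{seg.start_min}-{seg.end_min} min: peak {seg.peak_lag_min:.0f} min, "
              f"growth {seg.growth_min_per_hour:.1f} min/hour")
```

On the synthetic data this reports two segments, each peaking at 105 minutes and growing by about 10 minutes per hour. Lining up the segment end times with the timestamps of observed data gaps would be one way to test the hypothesis that the two problems are related.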

Today I've run some sync tests between the EL master and analytics-slave.
So far I've not found any discrepancies: the master and slave
tables, when replication is caught up(!), have identical data. I infer
that the data gaps you found do exist but are not related to
replication or replication lag, and are occurring somewhere upstream
of analytics-store, either on the EL master (db1046) itself or between
the master and the consumer. I'll wait to see the example UUIDs to dig
further into the master binary logs.
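The message doesn't say how the sync tests were run, so the following is only a sketch of one common approach: checksum rows in key-based buckets on both sides, then compare individual rows only in buckets whose checksums disagree. Percona's pt-table-checksum applies a similar chunk-and-compare idea against live MySQL servers. Here plain dicts keyed by UUID stand in for query results, and all function names and the prefix-based bucketing are hypothetical. As noted above, such a comparison is only meaningful once replication has caught up.

```python
import hashlib


def row_digest(row):
    """Stable SHA-1 over a row's fields, visited in sorted column order."""
    payload = "\x1f".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def bucket_checksums(rows_by_uuid, prefix_len=1):
    """One combined checksum per UUID-prefix bucket.

    Bucketing by key prefix (not by row position) means a single missing
    row only changes its own bucket's checksum.
    """
    hashers = {}
    for uuid in sorted(rows_by_uuid):
        h = hashers.setdefault(uuid[:prefix_len], hashlib.sha1())
        h.update(uuid.encode("utf-8"))
        h.update(row_digest(rows_by_uuid[uuid]).encode("ascii"))
    return {bucket: h.hexdigest() for bucket, h in hashers.items()}


def compare_tables(master, slave, prefix_len=1):
    """Return (missing_on_slave, extra_on_slave, differing) lists of UUIDs."""
    m_sums = bucket_checksums(master, prefix_len)
    s_sums = bucket_checksums(slave, prefix_len)
    # Only buckets whose checksums disagree are examined row by row.
    suspect = {b for b in m_sums.keys() | s_sums.keys()
               if m_sums.get(b) != s_sums.get(b)}
    missing, extra, differing = [], [], []
    for uuid in sorted(master.keys() | slave.keys()):
        if uuid[:prefix_len] not in suspect:
            continue
        if uuid not in slave:
            missing.append(uuid)
        elif uuid not in master:
            extra.append(uuid)
        elif row_digest(master[uuid]) != row_digest(slave[uuid]):
            differing.append(uuid)
    return missing, extra, differing


if __name__ == "__main__":
    # Hypothetical rows; real ones would come from SELECTs on each server.
    master = {"a1f0": {"event": "click", "ts": 1},
              "b2e1": {"event": "view", "ts": 2},
              "c3d2": {"event": "save", "ts": 3}}
    slave = {"a1f0": {"event": "click", "ts": 1},
             "c3d2": {"event": "save", "ts": 4}}
    print(compare_tables(master, slave))  # (['b2e1'], [], ['c3d2'])
```

UUIDs reported as missing on the slave while present on the master would point at replication; UUIDs absent from both, which is what the tests above suggest, point upstream of the master.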

Regarding the replication lag, a few observations:
- Asynchronous replication will always be susceptible to lag as long
as the slave handles other traffic. The fixes that made the
consumer batch-insert records have greatly reduced the lag problem, so
that we haven't seen lag of 24+ hours in months, but asynchronous
replication does just what it says on the tin :-)
- An hour or two of lag observed infrequently is often due to some
*other* activity on the slave. The way to track it down is to first
look for patterns; e.g., a certain time of day may point to a poorly
optimized cron job or suchlike. If you do catch replication lag of
more than 5 minutes in the act, view the DB processlist to see what
other queries are executing. Check whether something is simply hammering
the box, or whether something is locking records or tables that
replication is trying to write to, or ... [insert strange cause here].
BR
Sean
