Thanks. If possible, can we have:
- The exact INSERT statements issued by the MySQL consumer
- The UUID values generated for those records
I'll try to get them, sure.
I followed the master-slave replication lag for some hours and noticed a pattern: it grows progressively over time, by roughly 10 minutes per hour, reaching lags of 1 to 2 hours. At that point, the data gap happens and the replication lag drops back to a few minutes. I could only catch a data gap "live" 2 times, so that's definitely not a conclusive statement. But there's this hypothesis that the two problems are related.
Just for clarity, may I ask how you are testing this?
1) To identify the data gaps I used:
select left(timestamp, 11), count(*) from Edit_11448630 where timestamp >= '20150415000000' and timestamp < '20150416000000' group by 1;
Note that the table name and the timestamps can be adapted as necessary. This query returns something like:
+---------------------+----------+
| left(timestamp, 11) | count(*) |
+---------------------+----------+
| 20150415000         |     9823 |
| 20150415001         |    10158 |
| 20150415002         |     9473 |
| 20150415003         |     9493 |
| 20150415004         |     9297 |
| 20150415005         |     9390 |
| 20150415010         |     9849 |
| 20150415011         |     9619 |
| 20150415012         |    10038 |
| 20150415013         |     9763 |
| 20150415014         |     9750 |
| 20150415015         |     9633 |
| ...                 |      ... |
+---------------------+----------+
This lists the number of events in each 10-minute slot. When there's a data gap, the result of the query looks like this:
+---------------------+----------+
| left(timestamp, 11) | count(*) |
+---------------------+----------+
| ...                 |      ... |
| 20150415150         |    21237 |
| 20150415151         |    20677 |
| 20150415152         |    20541 |
| 20150415153         |    19671 |
| 20150415154         |    19623 |
| 20150415155         |    19281 |
| 20150415160         |    19243 |
| 20150415161         |     5708 | <= Gap: 16:20h and 16:30h have no data!
| 20150415164         |    11590 |
| 20150415165         |    18745 |
| ...                 |      ... |
+---------------------+----------+
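If it helps to avoid eyeballing the output, the missing slots can also be found programmatically. Below is a minimal sketch in Python, assuming the first column of the query result has already been collected into a list of 11-character strings; the function names are hypothetical and not part of any existing tooling. It only reports slots that are entirely absent, so a partially filled slot such as 20150415161 above would still need a separate check on the count.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=10)

def slot_to_datetime(prefix):
    # The 11-character prefix is YYYYMMDDHHM (tens of minutes);
    # appending "0" gives the first minute of that 10-minute slot.
    return datetime.strptime(prefix + "0", "%Y%m%d%H%M")

def datetime_to_slot(dt):
    # Inverse of slot_to_datetime: drop the last minute digit.
    return dt.strftime("%Y%m%d%H%M")[:11]

def find_missing_slots(prefixes):
    """Return the 10-minute slots absent between consecutive observed slots."""
    missing = []
    ordered = sorted(prefixes)  # fixed-width digits sort chronologically
    for prev, curr in zip(ordered, ordered[1:]):
        t = slot_to_datetime(prev) + SLOT
        end = slot_to_datetime(curr)
        while t < end:
            missing.append(datetime_to_slot(t))
            t += SLOT
    return missing

observed = ["20150415160", "20150415161", "20150415164", "20150415165"]
print(find_missing_slots(observed))
```

With the slots from the gap example above, this should report the 16:20 and 16:30 slots as missing.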
2) To get the master-slave replication lag I used:
select timestamp from Edit_11448630 order by 1 desc limit 1;
Again, the table name can be substituted. This gives me, supposedly, the timestamp of the last inserted event. Comparing that with the current time, I get the lag.
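The comparison itself is simple arithmetic; a minimal sketch in Python, assuming the timestamp column uses the 14-digit YYYYMMDDHHMMSS format seen in the queries above and that it is in the same timezone as the clock used for "now" (the function name is hypothetical):

```python
from datetime import datetime, timedelta

def replication_lag(last_timestamp, now):
    """Return the lag between the newest event timestamp and `now`.

    Assumes `last_timestamp` is a 14-digit YYYYMMDDHHMMSS string and
    that `now` is a naive datetime in the same timezone.
    """
    last_event = datetime.strptime(last_timestamp, "%Y%m%d%H%M%S")
    return now - last_event

# Example with made-up values: newest event at 16:00:00, checked at 17:30:00.
lag = replication_lag("20150415160000", datetime(2015, 4, 15, 17, 30, 0))
print(lag)
```

Note that this measures how old the newest event is, so it also grows when no events are being produced at all, not only when replication falls behind.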
3) To correlate both, I just happened to be monitoring the progressively increasing replication lag, and after noticing an abrupt recovery of the latter, I checked and found a data gap.