On Mon, Apr 28, 2014 at 7:52 PM, Gilles Dubuc <gilles(a)wikimedia.org> wrote:
I assume that the db1047 migration is over now, since I'm seeing recent
data on db1047. I was looking at data this morning and noticed that there
was a gap in the EventLogging data on db1047:
for table MultimediaViewerVersusPageFilePerformance_7907636
the id jumps from 190 to 203. You can see it for yourself by running:
SELECT id FROM MultimediaViewerVersusPageFilePerformance_7907636 WHERE id > 180 AND id < 210
It is InnoDB so technically auto-increment is not guaranteed to be a
contiguous sequence across a restart or reload, both of which we did. But I
agree that it likely should be contiguous in this case. I understand from
IRC that Ori can do some scripted verification of the data to resolve any
gaps.
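
As a rough illustration of the kind of scripted check mentioned above (this
is not Ori's actual script; the function name find_id_gaps and the sample ids
are hypothetical), a minimal Python sketch could list the jumps in a set of
ids fetched from a table:

```python
def find_id_gaps(ids):
    """Return (last_seen, next_seen) pairs where consecutive ids are not adjacent."""
    ordered = sorted(set(ids))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]


# Hypothetical ids mirroring the reported jump from 190 to 203.
sample_ids = list(range(181, 191)) + list(range(203, 210))
print(find_id_gaps(sample_ids))
```

A reported pair only says that ids are missing between the two values. Given
the InnoDB behaviour described above, it does not by itself prove that rows
were lost.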
During the migration we had a period when multiple consumers were running,
one on db1047 and one on db1048. This was theorized to be harmless since
the dataset UUIDs would allow duplicate rows to be merged using INSERT
IGNORE. However, it turned out that many of the EventLogging UUID fields use
normal indexes, not formal UNIQUE keys, so the inserts matched only on the
auto-increment id, which is risky.
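
To make the difference concrete, here is a small self-contained sketch. It
is only an analogy: it uses Python's sqlite3 module, with SQLite's INSERT OR
IGNORE standing in for MySQL's INSERT IGNORE, and the table name, columns,
and sample rows are all hypothetical. The second case also assumes that ids
are left to the target table rather than copied from the other consumer.

```python
import sqlite3


def merge(unique_uuid, incoming):
    """Merge `incoming` (id, uuid) rows into a table that already holds two events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, uuid TEXT)")
    index_kind = "UNIQUE INDEX" if unique_uuid else "INDEX"
    conn.execute(f"CREATE {index_kind} ix_uuid ON log (uuid)")
    # Rows already written by the first consumer.
    conn.executemany("INSERT INTO log (id, uuid) VALUES (?, ?)",
                     [(1, "aaa"), (2, "bbb")])
    # SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE here.
    conn.executemany("INSERT OR IGNORE INTO log (id, uuid) VALUES (?, ?)",
                     incoming)
    rows = conn.execute("SELECT id, uuid FROM log ORDER BY id").fetchall()
    conn.close()
    return rows


# Plain index, second consumer's rows carry their own ids: only the primary
# key can collide, so the new event "ccc" is dropped because id 2 is taken.
matched_on_id = merge(False, [(1, "bbb"), (2, "ccc")])

# UNIQUE uuid, ids assigned by the table: the duplicate "bbb" is ignored
# and the new event "ccc" is kept.
matched_on_uuid = merge(True, [(None, "bbb"), (None, "ccc")])

print(matched_on_id)
print(matched_on_uuid)
```

In the first case the event with uuid "ccc" silently disappears. In the
second it survives with a fresh id while the true duplicate is skipped.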
Is the missing data on db1048 and not replicated to db1047? Or is it lost
due to writes that happened during the migration?
db1046, a slave of db1048, died during migration due to a bug; it was to be
db1047's master. db1046 is presently rebuilding, after which db1047 will
start replicating.
The log tables on db1047 are currently federated, not replicated. We hit an
issue with very large SELECT queries without filters [1]; however, they
should work fine for most traffic until replication starts. Certainly small
queries like your example will return correct data (i.e., if there are gaps,
they are real).
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=64445
BR
Sean
--
DBA @ WMF