On Mon, Apr 28, 2014 at 7:52 PM, Gilles Dubuc <gilles@wikimedia.org> wrote:

I assume the db1047 migration is finished, since I'm seeing recent data there. Looking at the data this morning, I noticed a gap in the EventLogging data on db1047:

for table MultimediaViewerVersusPageFilePerformance_7907636

The id jumps from 190 to 203, so ids 191 through 202 are missing. You can see it for yourself by running:

SELECT id FROM MultimediaViewerVersusPageFilePerformance_7907636 WHERE id > 180 AND id < 210 ORDER BY id;
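Listing ids leaves spotting the gap to the reader. The following is a minimal sketch of a query that reports missing id ranges directly. It assumes only that `id` is an integer key. It avoids window functions so the same SQL should also run on MySQL, but it has only been exercised against SQLite through Python's `sqlite3`. The `find_id_gaps` helper and the `demo_log` table are hypothetical.

```python
import sqlite3

# Gap-finding query in plain SQL without window functions. It is intended to
# also run on MySQL/MariaDB, but has only been exercised here on SQLite.
# {table} is interpolated directly, so pass only trusted table names.
GAP_QUERY = """
SELECT a.id + 1 AS gap_start,
       (SELECT MIN(b.id) FROM {table} b WHERE b.id > a.id) - 1 AS gap_end
FROM {table} a
WHERE NOT EXISTS (SELECT 1 FROM {table} c WHERE c.id = a.id + 1)
  AND a.id < (SELECT MAX(id) FROM {table})
ORDER BY gap_start
"""


def find_id_gaps(conn, table):
    """Return a list of (first_missing_id, last_missing_id) tuples."""
    return conn.execute(GAP_QUERY.format(table=table)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in table with the same id pattern as the report.
    conn.execute("CREATE TABLE demo_log (id INTEGER PRIMARY KEY)")
    ids = list(range(181, 191)) + list(range(203, 210))
    conn.executemany("INSERT INTO demo_log (id) VALUES (?)", [(i,) for i in ids])
    print(find_id_gaps(conn, "demo_log"))  # [(191, 202)]
```

Each row whose successor id is absent marks the start of a gap. The correlated `MIN(b.id)` finds where the gap ends. The last row is excluded so the open end of the table is not reported as a gap.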

The table is InnoDB, so technically auto-increment values are not guaranteed to be contiguous across a restart or reload, and we did both. Still, I agree that they should most likely be contiguous in this case. I understand from IRC that Ori can run a scripted verification of the data to resolve any gaps.
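The actual verification script is not described here. The following is one minimal sketch of the idea. It assumes each row carries an event `uuid` column, which is an assumption rather than a confirmed schema detail. It compares the set of uuids on a source server against a target instead of relying on id contiguity. The `missing_events` function and `demo_log` table are hypothetical.

```python
import sqlite3


def missing_events(source, target, table):
    """Return event uuids present in source.table but absent from target.table.

    Compares on the (assumed) uuid column rather than id, because InnoDB
    auto-increment ids need not be contiguous or identical across restarts
    and reloads. Loads every uuid into memory, so it suits modest tables only.
    """
    query = f"SELECT uuid FROM {table}"  # trusted table names only
    in_source = {row[0] for row in source.execute(query)}
    in_target = {row[0] for row in target.execute(query)}
    return sorted(in_source - in_target)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for conn in (source, target):
        conn.execute("CREATE TABLE demo_log (id INTEGER PRIMARY KEY, uuid TEXT)")
    source.executemany("INSERT INTO demo_log (uuid) VALUES (?)",
                       [("u1",), ("u2",), ("u3",)])
    target.executemany("INSERT INTO demo_log (uuid) VALUES (?)",
                       [("u1",), ("u3",)])
    print(missing_events(source, target, "demo_log"))  # ['u2']
```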

During the migration there was a period when multiple consumers were running, one on db1047 and one on db1048. This was assumed to be harmless, since the dataset UUIDs would allow duplicate rows to be merged using INSERT IGNORE. However, it turned out that many of the EventLogging UUID fields have ordinary indexes rather than UNIQUE keys, so the inserts matched only on id, which is risky.
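The following sketch illustrates why a plain index on the UUID column does not help. It uses SQLite's `INSERT OR IGNORE` as a stand-in for MySQL's `INSERT IGNORE`, since both skip a row only when it would violate a PRIMARY KEY or UNIQUE constraint. The table and helper names are hypothetical, and the two-consumer scenarios are simplified illustrations, not a reconstruction of what actually happened.

```python
import sqlite3


def insert_ignore(rows, unique_uuid):
    """Insert (id, uuid) rows with INSERT OR IGNORE and return what survives.

    INSERT OR IGNORE skips a row only on a PRIMARY KEY or UNIQUE conflict,
    so a plain (non-unique) index on uuid plays no part in deduplication.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_log (id INTEGER PRIMARY KEY, uuid TEXT)")
    index = "UNIQUE INDEX" if unique_uuid else "INDEX"
    conn.execute(f"CREATE {index} demo_log_uuid ON demo_log (uuid)")
    conn.executemany(
        "INSERT OR IGNORE INTO demo_log (id, uuid) VALUES (?, ?)", rows)
    return conn.execute("SELECT id, uuid FROM demo_log ORDER BY id").fetchall()


if __name__ == "__main__":
    # The same event written by both consumers under different ids.
    duplicate = [(1, "evt-a"), (2, "evt-a")]
    # Two distinct events that happen to be written with the same id.
    clash = [(1, "evt-a"), (1, "evt-b")]
    print(insert_ignore(duplicate, unique_uuid=False))  # both rows kept
    print(insert_ignore(duplicate, unique_uuid=True))   # second row skipped
    print(insert_ignore(clash, unique_uuid=False))      # evt-b silently lost
```

A UNIQUE key on the UUID column handles the first case but not the second. A distinct event that collides on id is still dropped, because the primary-key conflict alone triggers the ignore.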
 
Is the missing data present on db1048 but not replicated to db1047, or was it lost because of writes made during the migration?

db1046, a slave of db1048 that was to become db1047's master, died during the migration due to a bug. db1046 is currently being rebuilt. Once that finishes, db1047 will start replicating.

The log tables on db1047 are currently federated, not replicated. We hit an issue with very large SELECT queries that have no filters [1]. However, the tables should work fine for most traffic until replication starts. Small queries like your example will certainly return correct data (i.e., if there are gaps, they are real).
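Until replication is in place, one cautious pattern is to avoid unfiltered full-table SELECTs and read in bounded primary-key ranges instead. The following sketch assumes, without confirmation from [1], that range-filtered queries on `id` behave like the small queries described above. The helper name, the `uuid` column, and the `demo_log` table are hypothetical, and the code has only been exercised against SQLite.

```python
import sqlite3


def iter_in_id_ranges(conn, table, first_id, last_id, step=1000):
    """Yield rows in ascending id order, one bounded id range per query.

    Every statement carries a primary-key range filter, so no single query
    is an unfiltered scan of the whole table. The caller supplies the id
    bounds. table is interpolated directly, so pass only trusted names.
    """
    for start in range(first_id, last_id + 1, step):
        end = min(start + step - 1, last_id)
        cursor = conn.execute(
            f"SELECT id, uuid FROM {table} WHERE id BETWEEN ? AND ? ORDER BY id",
            (start, end))
        yield from cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_log (id INTEGER PRIMARY KEY, uuid TEXT)")
    conn.executemany("INSERT INTO demo_log (id, uuid) VALUES (?, ?)",
                     [(i, f"uuid-{i}") for i in range(1, 26)])
    rows = list(iter_in_id_ranges(conn, "demo_log", 1, 25, step=10))
    print(len(rows))  # 25
```

The id bounds are passed in rather than computed with MIN/MAX, since it is not certain how cheaply those aggregates run against a federated table.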

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=64445

BR
Sean

--
DBA @ WMF