Sean and list,

I think we found the problem:

The data loss is happening in the EL consumer code.
The error was skillfully dodging the logs, sorry about that.

The root cause is that the db insertion is too slow to keep up
with the rate of incoming events, so the event buffer keeps growing.
Once it grows large enough, the program crashes and all the events
still in the buffer are lost.
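Here is a minimal Python sketch of this failure mode. It assumes a consumer that holds incoming events in an in-memory queue and drains that queue with db inserts of fixed throughput. The function `simulate_backlog`, the rates, and the `max_buffered` cap (standing in for the point where the process runs out of memory) are all hypothetical. This is not the actual EL consumer code:

```python
from collections import deque


def simulate_backlog(arrival_rate, insert_rate, max_buffered, seconds):
    """Simulate a consumer that buffers incoming events in memory and
    drains them with a db insert of limited throughput (hypothetical model).

    Returns (second_of_crash, events_lost), or (None, 0) if the buffer
    never exceeded max_buffered within the simulated time.
    """
    buffer = deque()
    next_id = 0
    for t in range(1, seconds + 1):
        # Events arriving during this second are appended to the in-memory buffer.
        for _ in range(arrival_rate):
            buffer.append(next_id)
            next_id += 1
        # The db insertion can only drain insert_rate events per second.
        for _ in range(min(insert_rate, len(buffer))):
            buffer.popleft()
        # Stand-in for running out of memory: everything still buffered is lost.
        if len(buffer) > max_buffered:
            return t, len(buffer)
    return None, 0


print(simulate_backlog(1000, 800, 10000, 3600))  # (51, 10200)
print(simulate_backlog(800, 1000, 10000, 3600))  # (None, 0)
```

With made-up numbers of 1000 events/s arriving and 800 events/s inserted, the backlog grows by 200 events every second. It crosses a 10,000-event cap after 51 seconds, and every event still buffered at that point is lost. When the inserts keep up with arrivals, the buffer never grows.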

Thanks, Sean, for your comments; they helped a lot!
Will update the phab task with a detailed explanation and next steps.
https://phabricator.wikimedia.org/T96082

Cheers,

Marcel