Well, we consume our Kafka streams into HDFS and check the sequence numbers with Hive, orchestrated through Oozie. The jobs and scripts are here:
So our setup is a bit more complicated and not directly applicable to your data flow (Kafkatee -> MySQL, right?), but we'd love to help you get familiar with the code and approach. This script computes per-host sequence statistics and writes them to wmf.webrequest_sequence_stats:
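To give you a flavor without digging through the repo, the core of that script is conceptually something like this HiveQL sketch. The source table and all column names here are my assumptions for illustration, not the real schema:

```sql
-- Sketch only: per-host sequence stats for one hour.
-- Each producer stamps its messages with an increasing sequence number,
-- so (max - min + 1) is how many messages we *should* have seen.
INSERT OVERWRITE TABLE wmf.webrequest_sequence_stats
  PARTITION (year=2015, month=6, day=1, hour=0)  -- hypothetical partition
SELECT
  hostname,
  MIN(sequence)                     AS sequence_min,
  MAX(sequence)                     AS sequence_max,
  COUNT(*)                          AS count_actual,
  MAX(sequence) - MIN(sequence) + 1 AS count_expected
FROM wmf_raw.webrequest  -- assumed name for the raw imported data
WHERE year=2015 AND month=6 AND day=1 AND hour=0
GROUP BY hostname;
```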
Those stats are then aggregated hourly and checked by this workflow, which sends alert emails if it sees problems:
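The check itself amounts to flagging host/hours where the actual and expected counts diverge too much, roughly like this sketch (the 1% threshold is made up):

```sql
-- Sketch only: find hosts that lost (or duplicated) too much data
-- in the hour. A non-empty result would trigger the alert email.
SELECT
  hostname,
  count_actual,
  count_expected,
  (count_expected - count_actual) / count_expected AS loss_ratio
FROM wmf.webrequest_sequence_stats
WHERE year=2015 AND month=6 AND day=1 AND hour=0
  AND ABS(count_expected - count_actual) / count_expected > 0.01;
```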
We can then use that per-hour data quality information to re-run jobs, postpone jobs that would compute bad data, and so on (a rough sketch of that kind of gate is below). We do some of that, but it's changed a bit over the years, so if you'd like more detail you can grab someone like Joseph and have a quick meeting.
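For illustration, a downstream job's wrapper could gate on the stats table before launching; again, the threshold and names are assumptions, not our actual implementation:

```sql
-- Sketch only: count hosts with >1% loss in the hour a downstream
-- job is about to process. If bad_hosts > 0, the wrapper postpones
-- the job or marks the hour for re-run.
SELECT COUNT(*) AS bad_hosts
FROM wmf.webrequest_sequence_stats
WHERE year=2015 AND month=6 AND day=1 AND hour=0
  AND (count_expected - count_actual) / count_expected > 0.01;
```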