I think we should split EventLogging (EL) from the other m2 clients (OTRS and some minor players). Several reasons:
- Backfilling causes replication lag. Using faster out-of-band replication for EL is easy because its traffic is all simple bulk-INSERT statements, but the same does not apply to the other clients. They need different approaches.
- Master disk space. Even with the data purging discussed at the MW Summit, I would feel better if EL had more headroom than it does currently, and zero possibility of unexpected spikes in disk activity and usage affecting other services.
- EL is the service most sensitive to connection dropouts. Recently Ori and Nuria have been tweaking SQLAlchemy, but future connection problems like those seen last week would be easier to debug without the risk of affecting other services.
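
To illustrate why bulk-INSERT backfilling is easy to run out-of-band, here is a minimal sketch of a batched, re-runnable backfill. It is not the actual EL code: the `events` table, its columns, and the batch size are hypothetical, and the standard-library `sqlite3` module stands in for the MySQL master so the example runs anywhere (MySQL would spell the statement `INSERT IGNORE`).

```python
import sqlite3
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def backfill(conn, rows, batch_size=1000):
    """Insert rows in batches, one transaction per batch.

    Small transactions keep each replicated write short, and ignoring
    duplicate keys makes the backfill safe to re-run after a failure.
    Table and column names are hypothetical.
    """
    total = 0
    for chunk in batches(rows, batch_size):
        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT OR IGNORE INTO events (uuid, payload) VALUES (?, ?)",
                chunk,
            )
        total += len(chunk)
    return total


def demo():
    """Backfill 2500 hypothetical rows twice; the second run adds nothing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (uuid TEXT PRIMARY KEY, payload TEXT)")
    rows = [(f"id-{i}", "{}") for i in range(2500)]
    sent = backfill(conn, rows, batch_size=1000)
    backfill(conn, rows, batch_size=1000)  # re-run: duplicates are ignored
    stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    return sent, stored
```

Because every statement is a plain INSERT keyed on a unique ID, the batches can be replayed against any replica independently, which is what does not hold for the other m2 clients.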
I am therefore arranging to promote the current m2 slave db1046 to master of an m4 cluster tuned for EL, including backfilling. Analytics-store, s1-analytics-slave, and the new CODFW server will simply switch to replicate from the new master.
For the switchover of writes, we'll need to coordinate an EL consumer restart so that it uses a new CNAME, m4-master.eqiad.wmnet, allow vanadium the relevant network access, and then presumably do a little backfilling. When would be a reasonable time within the next fortnight or so?
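
On the consumer side the change should amount to swapping the host in the database URL before the restart. A rough sketch, assuming the consumer is configured with a URL-style connection string; the old hostname, credentials, and database name in the usage line are made up for illustration, and only m4-master.eqiad.wmnet comes from the plan above.

```python
from urllib.parse import urlsplit, urlunsplit

NEW_MASTER = "m4-master.eqiad.wmnet"


def repoint(db_url, new_host=NEW_MASTER):
    """Return db_url with only the hostname replaced.

    Credentials, port, database name, and query string are kept as-is.
    """
    parts = urlsplit(db_url)
    # netloc looks like "user:password@host:port"; every piece but host is optional
    userinfo, at, hostport = parts.netloc.rpartition("@")
    _old_host, colon, port = hostport.partition(":")
    netloc = f"{userinfo}{at}{new_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical current setting; the real consumer URL may differ.
old_url = "mysql://el:secret@m2-master.eqiad.wmnet:3306/log?charset=utf8"
new_url = repoint(old_url)
```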
Sean