Hi,
Over the past quarter, we've been working on adding instrumentation to the EventBus https://www.mediawiki.org/wiki/Extension:EventBus extension using Prometheus.
A dashboard is available in Grafana under EventBus https://grafana.wikimedia.org/goto/UwFG59gHR?orgId=1.
We’ve added metrics to better understand event intake across streams, as well as the response statuses returned by EventGate https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate instances. This will help us quantify event production failures, which in turn will help us define and support SLOs for Event Platform.
A rationale and more information about this effort can be found in task T363587 https://phabricator.wikimedia.org/T363587. For questions, suggestions, or bug reports, please create a Phabricator task and tag Event-Platform.
Gabriele, for the Data Engineering team.
Cheers,
--
Gabriele Modena (he / him)
Staff Software Engineer
Wikimedia Foundation