New subject: [Wikimedia-search] Scaleable Event Systems recap

4 Aug 2015

Heyo, Discovery team!

(Analytics CCd)

This is just a quick writeup of the Scaleable Event Systems meeting
that Erik, Dan, Stas and I went to (although just from my
perspective).

For people not in the initial thread, this is a proposal to replace
the internal architecture of EventLogging and similar services with
Apache Kafka brokers
(http://www.confluent.io/blog/stream-data-platform-1/). What that
means in practice is that the current 1-2k events/second limit on
EventLogging will disappear and we can stop worrying about sampling
and accidentally bringing down the system. We can be a lot less
cautious about both our schemas and our sampling rate!

It also opens up a lot of opportunities around streaming data and
making it available in a layered fashion. I don't think we want to
explore that right now, but it's nice to have as an option once we
better understand our search data and how we can safely distribute
it.

I'd like to thank the Analytics team, particularly Andrew, for putting
this together; it was a super-helpful discussion to be in and this
sort of product is precisely what I, at least, have been hoping for
out of the AnEng brain trust. Full speed ahead!

-- 
Oliver Keyes
Count Logula
Wikimedia Foundation