Heyo, Discovery team!
(Analytics CC'd)
This is just a quick writeup of the Scalable Event Systems meeting that Erik, Dan, Stas and I went to (although just from my perspective).
For people not in the initial thread, this is a proposal to replace the internal architecture of EventLogging and similar services with Apache Kafka brokers (http://www.confluent.io/blog/stream-data-platform-1/). What that means in practice is that the current 1-2k events/second limit on EventLogging will disappear, and we can stop worrying about sampling and about accidentally bringing down the system. We can be a lot less cautious about both our schemas and our sampling rate!
It also offers up a lot of opportunities around streaming data and making it available in a layered fashion. I don't think we want to explore that right now, but it's nice to have as an option once we better understand our search data and how we can safely distribute it.
I'd like to thank the Analytics team, particularly Andrew, for putting this together; it was a super-helpful discussion to be in and this sort of product is precisely what I, at least, have been hoping for out of the AnEng brain trust. Full speed ahead!
Very excited to see this moving forward
On Mon, Aug 3, 2015 at 3:12 PM, Oliver Keyes okeyes@wikimedia.org wrote:
-- Oliver Keyes Count Logula Wikimedia Foundation
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
What are the implications (if any) for event validation?
On Mon, Aug 3, 2015 at 3:19 PM, Tomasz Finc tfinc@wikimedia.org wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
nm, clarified with Kevin.
On Aug 3, 2015, at 18:38, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
--
Dario Taraborelli Head of Research, Wikimedia Foundation wikimediafoundation.org • nitens.org • @readermeter