Heyo, Discovery team!
(Analytics CC'd)
This is just a quick writeup of the Scalable Event Systems meeting that Erik, Dan, Stas, and I went to (although just from my perspective).
For people not on the initial thread: this is a proposal to replace the internal architecture of EventLogging and similar services with Apache Kafka brokers (http://www.confluent.io/blog/stream-data-platform-1/). What that means in practice is that the current 1-2k events/second limit on EventLogging will disappear, and we can stop worrying about sampling and accidentally bringing down the system. We can be a lot less cautious about both our schemas and our sampling rates!
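To make that concrete, here's a rough sketch of what producing an event straight to a Kafka topic could look like. Everything below (broker address, topic name, schema fields) is made up; it's just to show the shape of the thing, not the actual design:

    # Illustrative only: broker, topic, and schema names are hypothetical.
    import json
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="kafka1001.example.org:9092",  # hypothetical broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # An EventLogging-style event: schema name/revision plus the payload.
    event = {
        "schema": "TestSearchSatisfaction",  # hypothetical schema
        "revision": 1,
        "event": {"action": "searchResultPage", "hitsReturned": 20},
    }

    # Brokers scale horizontally, which is what lifts the current
    # 1-2k events/second ceiling.
    producer.send("eventlogging_TestSearchSatisfaction", event)
    producer.flush()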
It also opens up a lot of opportunities around streaming data and making it available in a layered fashion. While I don't think we want to explore that right now, it's nice to have as an option for when we better understand our search data and how we can safely distribute it.
I'd like to thank the Analytics team, particularly Andrew, for putting this together; it was a super-helpful discussion to be in and this sort of product is precisely what I, at least, have been hoping for out of the AnEng brain trust. Full speed ahead!
On Mon, Aug 3, 2015 at 3:19 PM, Tomasz Finc tfinc@wikimedia.org wrote:
Very excited to see this moving forward.
On Aug 3, 2015, at 18:38, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
What are the implications (if any) for event validation?
Dario Taraborelli followed up:
nm, clarified with Kevin.
On 4 August 2015 at 04:24, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Oliver Keyes, 04/08/2015 00:12:
> a lot less cautious about our sampling rate!
A bit, perhaps, not a lot. Sampling is not just a performance matter.

On Tue, Aug 4, 2015 at 4:27 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Could you expand on that?
On 13 August 2015 at 10:36, Dan Andreescu dandreescu@wikimedia.org wrote:
Not to speak for Nemo, but we don't want reckless abandon just because the system won't break. Thrift is one of our values at the Foundation, and if we don't need to scale out with more hardware, we shouldn't. So I think that if collected data has value beyond the cost of wrangling it through Kafka / HDFS / dashboards, it should be collected; if not, it should be sampled until it does. There may not be an easy way to measure this, so we'll have to rely on good old subjective consensus. I promise we won't be too strict about it; we'll just kindly ask people to think twice before collecting a lot of data.
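(A toy illustration of the kind of sampling in question: hash a session token into a bucket so that a fixed fraction of sessions is kept, and a given session is always either in or out. The field names and rate here are made up.)

    import hashlib

    def in_sample(session_id: str, rate_percent: int) -> bool:
        """Deterministically keep ~rate_percent% of sessions by hashing
        the session token into a bucket from 0-99."""
        bucket = int(hashlib.md5(session_id.encode("utf-8")).hexdigest(), 16) % 100
        return bucket < rate_percent

    # Keep roughly 10% of sessions; the same session always gets the
    # same answer, so its events stay together.
    if in_sample("some-session-token", 10):
        pass  # produce the event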
Oliver Keyes replied:
Indeed; I'm familiar with the WMF's values ;). I was trying to work out whether it was a hardware-cost thing, a privacy thing, or something else.