Re: [Analytics] [Wikimedia-search] Scaleable Event Systems recap

13 Aug 2015

On Tue, Aug 4, 2015 at 4:27 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:

...
> On 4 August 2015 at 04:24, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
>> Oliver Keyes, 04/08/2015 00:12:
>>> a lot less cautious about our sampling rate!
>>
>> A bit, perhaps, not a lot. Sampling is not just a performance matter.
>
> Could you expand on that?

Not to speak for Nemo, but we don't want reckless abandon just because the
system won't break.  Thrift is one of our values at the Foundation, and if
we don't need to scale out with more hardware, we shouldn't.  So I think if
the data collected has value beyond the cost to wrangle it through Kafka /
HDFS / dashboards, then it should be collected.  If not, it should be sampled
down until it does.  There may not be an easy way to measure this, so we'll
have to rely on good old subjective consensus.  I promise we won't be too
strict about it; we'll just kindly ask people to think twice before
collecting a lot of data.
