* Time series correlation and anomaly detection: AKA: I want an alert for
that massive memcached bytes_out spike that doesn't also wake me up with
false positives at 2AM.
Related: Abe Stanway gave a talk at BACON 2013 about Etsy's realtime
anomaly detection and correlation tools, Skyline and Oculus, which form the
Kale stack [0][1].
[0]: http://devslovebacon.com/conferences/bacon-2013/talks/bring-the-noise-conti…
[1]: https://codeascraft.com/2013/06/11/introducing-kale/
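To show what "alert on the spike, not on the noise" means, here is the simplest technique in this space: a rolling mean/standard-deviation ("3-sigma") rule. It flags a point only when it sits more than k standard deviations from the trailing window's mean. This is a minimal sketch of that generic technique. It is not Skyline's or Oculus's code. The function name `rolling_zscore_anomalies`, the window size and the threshold are all hypothetical. Waiting for a full window and using a wide k are crude ways to cut down on 2AM false positives.

```python
from collections import deque
from statistics import mean, pstdev

def rolling_zscore_anomalies(series, window=30, k=3.0):
    """Hypothetical helper (not Skyline's API): return the indices of points
    more than k population standard deviations from the mean of the
    preceding `window` points."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(series):
        # Only judge a point once a full window of history exists.
        if len(history) == window:
            values = list(history)
            mu = mean(values)
            sigma = pstdev(values)
            if sigma == 0:
                # Perfectly flat history: any change at all is unusual.
                if value != mu:
                    anomalies.append(i)
            elif abs(value - mu) > k * sigma:
                anomalies.append(i)
        # Design choice: anomalous points also enter the history, which
        # widens sigma for a while and damps repeat alerts after a spike.
        history.append(value)
    return anomalies

# Hypothetical bytes_out series: steady traffic with a small wobble,
# then one large spike at index 45.
bytes_out = [100 + (i % 5) for i in range(60)]
bytes_out[45] = 1000
print(rolling_zscore_anomalies(bytes_out, window=30, k=3.0))
```

Real systems such as the Kale stack described above use more elaborate methods. This sketch only shows the basic shape of the trade-off: a larger k or window means fewer false alarms but slower or missed detection.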
On Wed, Nov 5, 2014 at 4:22 PM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:
Awesome -- thanks Ori.
On Wed, Nov 5, 2014 at 12:56 AM, Ori Livneh <ori(a)wikimedia.org> wrote:
Facebook just published this summary of a summit for database researchers
held at Menlo Park last September. I recommend it. It contains a clear and
concise description of Facebook's data infrastructure, and a description of
the open problems they are thinking about, which is even more interesting.
https://research.facebook.com/blog/1522692927972019/facebook-s-top-open-dat…
To whet your appetite, here are the problems (the summaries mostly my own
paraphrase):
* Mobile: How should the shift toward mobile devices affect Facebook’s
data infrastructure?
* Reducing replication: How can we reduce the number of round trips
between the application and data layers?
* Impact of Caching on Availability (aka "oh no, we just restarted
memcached"): How do we harness the efficiency gains provided by caching
without being brought to our knees by a sudden drop in cache hit rate?
* Sampling at logging time in a distributed environment: How should we
sample log streams if we want to maintain accuracy and flexibility to
answer post-hoc queries?
* Trading storage space and CPU: TL;DR: gzip --best or gzip --fast?
* Reliability of pipelines: Pipelines are less reliable than the sum of
their parts. A pipeline composed of two systems, each 0.999 reliable,
is only about 0.998 reliable (0.999 x 0.999). Much sadness. What to do?
* Globally distributed warehouse: consistency models and synchronization
problems.
* Time series correlation and anomaly detection: AKA: I want an alert for
that massive memcached bytes_out spike that doesn't also wake me up with
false positives at 2AM.
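The caching item is mostly arithmetic. The database only sees cache misses, so its load is the request rate times (1 - hit rate). Here is a minimal sketch. The request rate and hit rates are hypothetical round numbers, not Facebook's figures, and `backend_qps` is an illustrative helper name.

```python
def backend_qps(request_qps: float, hit_rate: float) -> float:
    """Hypothetical helper: queries per second that miss the cache
    and fall through to the database."""
    return request_qps * (1.0 - hit_rate)

# Hypothetical front-end load of 1,000,000 requests/s.
warm = backend_qps(1_000_000, 0.99)  # warm cache: 1% of requests miss
cold = backend_qps(1_000_000, 0.90)  # just restarted: 10% miss
print(f"warm: {warm:,.0f} qps, cold: {cold:,.0f} qps, ratio: {cold / warm:.1f}x")
```

A hit-rate drop from 99% to 90% sounds small. In practice it sends roughly ten times as many queries to the database, and that is how a memcached restart can bring the data layer to its knees.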
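The pipeline-reliability item follows from one rule. If the stages fail independently, end-to-end reliability is the product of the per-stage reliabilities, so every stage you add lowers it. Here is a minimal sketch, assuming independent failures; that is a simplifying assumption, and correlated failures behave differently. The name `pipeline_reliability` is illustrative.

```python
import math

def pipeline_reliability(stage_reliabilities):
    """Illustrative helper: end-to-end reliability of stages in series,
    assuming each stage fails independently of the others."""
    return math.prod(stage_reliabilities)

print(round(pipeline_reliability([0.999, 0.999]), 6))  # two stages
print(round(pipeline_reliability([0.999] * 10), 4))    # ten stages
```

Two 0.999 stages give 0.998001. Ten of them come out at about 0.990, so a long pipeline of individually solid systems fails roughly ten times as often as any one of its parts.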
_______________________________________________
Engineering mailing list
Engineering(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/engineering