Facebook just published this summary of a summit for database researchers held at Menlo Park last September. I recommend it. It contains a clear and concise description of Facebook's data infrastructure, and a description of the open problems they are thinking about, which is even more interesting.

To whet your appetite, here are the problems (the summaries are mostly my own paraphrase):

* Mobile: How should the shift toward mobile devices affect Facebook's data infrastructure?
* Reducing replication: How can we reduce the number of round trips between the application and data layers?
* Impact of Caching on Availability (aka "oh no, we just restarted memcached"): How do we harness the efficiency gains provided by caching without being brought to our knees by a sudden drop in cache hit rate?
* Sampling at logging time in a distributed environment: How should we sample log streams if we want to maintain accuracy and flexibility to answer post-hoc queries?
* Trading storage space and CPU: TL;DR: gzip --best or gzip --fast?
* Reliability of pipelines: Pipelines are less reliable than the sum of their parts. A pipeline composed of two systems, each 0.999 reliable, is only about 0.998 reliable (0.999 × 0.999 = 0.998001), and every additional stage drags the figure down further. Much sadness. What to do?
* Globally distributed warehouse: consistency models and synchronization problems.
* Time series correlation and anomaly detection: AKA: I want an alert for that massive memcached bytes_out spike that doesn't also wake me up with false positives at 2AM.
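The caching item is easier to feel with a bit of arithmetic: the database only sees cache misses, so its load scales with (1 - hit rate). A dip from a 99% to a 90% hit rate is a tenfold jump in backend traffic. A minimal sketch; the request rate and hit rates are made-up numbers for illustration, not figures from the Facebook summary:

```python
def backend_load(request_rate: float, hit_rate: float) -> float:
    """Requests per second that miss the cache and fall through to the database."""
    return request_rate * (1.0 - hit_rate)

# Hypothetical numbers, chosen only to illustrate the effect.
rate = 1_000_000                   # requests/sec arriving at the cache tier
steady = backend_load(rate, 0.99)  # warm cache
cold = backend_load(rate, 0.90)    # e.g. shortly after a memcached restart
print(f"steady: {steady:,.0f} req/s, cold: {cold:,.0f} req/s, "
      f"increase: {cold / steady:.1f}x")
```

If the database tier is provisioned for the steady-state miss rate, it has no headroom for that 10x, which is why a cold cache can take the whole site down.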
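One common way to keep a sampled log stream useful for post-hoc queries is to sample by a stable key (say, a user ID) rather than per event: every host hashes the key the same way, so all events for a sampled user are kept together and per-user questions remain answerable, while totals are estimated by dividing by the sampling rate. This is a sketch of that general technique under those assumptions, not what Facebook proposes; `keep`, `estimate_total`, and the 1% rate are hypothetical.

```python
import hashlib

def keep(key: str, rate: float) -> bool:
    """Deterministically decide whether events for `key` are logged.

    Every host that sees the same key reaches the same decision, so all
    events for a sampled key survive together.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the top 53 bits of the hash so the result is an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate

def estimate_total(sampled_count: int, rate: float) -> float:
    """Scale a count observed in the sample back up to the full stream."""
    return sampled_count / rate

RATE = 0.01  # hypothetical 1% sampling rate
events = [f"user{i % 5000}" for i in range(200_000)]  # synthetic event stream
sampled = [e for e in events if keep(e, RATE)]
print(f"kept {len(sampled)} events, estimated total {estimate_total(len(sampled), RATE):,.0f}")
```

A hash from `hashlib` is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would break agreement between hosts.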
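The gzip question boils down to compression level: `gzip --fast` is level 1 and `gzip --best` is level 9 of DEFLATE, the same algorithm Python's `zlib` module exposes. A rough way to see the space-versus-CPU trade-off on your own data is to compress a sample at both levels and compare size and time. The log-like input below is synthetic, so the actual ratios and timings will differ on real data.

```python
import time
import zlib

def measure(data: bytes, level: int) -> tuple[int, float]:
    """Return (compressed size in bytes, seconds spent compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    return len(compressed), time.perf_counter() - start

# Synthetic, repetitive, log-like input; real logs will compress differently.
data = b"".join(
    f"host{i % 17} GET /wiki/Page_{i % 977} 200 {i % 60}ms\n".encode()
    for i in range(20_000)
)
for level in (1, 9):  # roughly gzip --fast vs gzip --best
    size, secs = measure(data, level)
    print(f"level {level}: {size:,} bytes ({len(data) / size:.1f}x) "
          f"in {secs * 1000:.1f} ms")
```

Whether the extra CPU at level 9 pays off depends on how often the data is written versus how long it is stored, which is exactly the trade-off the item asks about.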
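The pipeline-reliability arithmetic is just a product: if each stage fails independently, the pipeline succeeds only when every stage does, so its reliability is the product of the stage reliabilities. A minimal sketch; the independence assumption is mine, and correlated failures would change the numbers.

```python
import math

def pipeline_reliability(stage_reliabilities: list[float]) -> float:
    """Probability that every stage succeeds, assuming independent failures."""
    return math.prod(stage_reliabilities)

print(f"{pipeline_reliability([0.999, 0.999]):.4f}")  # two stages → 0.9980
print(f"{pipeline_reliability([0.999] * 10):.4f}")    # ten stages → 0.9900
```

Each stage adds a multiplicative penalty, so long pipelines of individually solid systems still end up noticeably less reliable than any one of their parts.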
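For the 2AM-alert problem, a simple baseline (not anything the summary proposes) is a trailing-window z-score: flag a point that sits several standard deviations above the recent mean, and only alert after a few consecutive breaches so that a single blip does not page anyone. The window size, threshold, and streak length below are hypothetical defaults, and breaching points are kept out of the baseline so a sustained spike cannot hide itself by inflating the deviation.

```python
from collections import deque
from statistics import mean, stdev
from typing import Iterable, Iterator

def detect_spikes(series: Iterable[float], window: int = 30,
                  threshold: float = 4.0, min_consecutive: int = 3) -> Iterator[int]:
    """Yield the index at which each sustained upward spike is confirmed."""
    history: deque = deque(maxlen=window)
    streak = 0
    for i, value in enumerate(series):
        breach = False
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            breach = sigma > 0 and (value - mu) / sigma > threshold
        if breach:
            streak += 1
            if streak == min_consecutive:
                yield i  # alert once per sustained spike
        else:
            streak = 0
            history.append(value)  # only non-anomalous points update the baseline

def baseline(n: int) -> list:
    """Synthetic 'normal' traffic oscillating between 100 and 104."""
    return [100 + (i % 5) for i in range(n)]

# One isolated blip at index 60, then a five-point spike starting at index 71.
series = baseline(60) + [200] + baseline(10) + [1000] * 5 + baseline(10)
print(list(detect_spikes(series)))
```

The isolated blip breaks its streak after one point and never alerts, while the sustained spike alerts on its third point. Excluding anomalies from the baseline has a cost: a genuine, permanent level shift will keep alerting until the baseline is reset.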
_______________________________________________
Engineering mailing list
Engineering@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/engineering