For those interested: Jun Rao gave a talk at ApacheCon about Kafka[1] replication[2] (scheduled to land in 0.8 in March). I've pulled out some bits that may be of interest.

Updated stats about LinkedIn's experience with Kafka:

- Writes: >10B messages/day (>2TB compressed data)

- Reads: >50B messages/day (>1PB compressed data)

- Typical failover time after a broker failure: <10ms

Slides 14 and 18-20 cover its replication model for eventual consistency. It's interesting because it intentionally makes tradeoffs to take advantage of intra-datacenter latency being an order of magnitude(ish) better than latency between DCs connected by the open internet. In exchange for some extra chatter, they tolerate 2f failures among 2f+1 replicas. Clever, and clearly it works for them. (See slides 21-22 for unhelpful diagrams and 27-31 for interesting performance numbers, excepting slide 28's totally inexplicable durability column, which uses highly scientific measures like "some data loss" vs "a few data loss". What.)
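
To put the "2f failures among 2f+1 replicas" figure in context, here is a rough sketch comparing it against a plain majority-quorum scheme, which needs a majority of replicas alive and so only survives f failures out of 2f+1. The function names are mine, not anything from Kafka or the slides, and the in-sync case assumes every replica was fully caught up when the failures hit, so any single survivor can take over.

```python
def majority_quorum_tolerance(replicas: int) -> int:
    """Failures survivable when a majority of replicas must stay alive."""
    return (replicas - 1) // 2


def in_sync_tolerance(replicas: int) -> int:
    """Failures survivable when any one caught-up replica can take over.

    Assumption: all replicas were in sync at the time of failure.
    """
    return replicas - 1


if __name__ == "__main__":
    for f in (1, 2, 3):
        n = 2 * f + 1
        print(
            f"replicas={n}: "
            f"majority quorum survives {majority_quorum_tolerance(n)}, "
            f"in-sync scheme survives {in_sync_tolerance(n)}"
        )
```

So with 3 replicas a majority quorum survives 1 failure where the in-sync approach survives 2. The price is the extra chatter mentioned above, plus relying on low intra-DC latency to keep the replicas caught up.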

Pretty neat stuff, and it's great to see a built-in solution for cross-DC replication.

[1] http://kafka.apache.org/ -- now out of incubator!

[2] http://www.slideshare.net/junrao/kafka-replication-apachecon2013

David Schoonover

dsc@wikimedia.org