Jun Rao gave a talk at ApacheCon about Kafka[1] replication[2]
(scheduled to land in 0.8 in March). I've pulled out some bits that may
be of interest.
Updated stats about LinkedIn's experience with Kafka:
- Writes: >10B messages/day (>2TB compressed data)
- Reads: >50B messages/day (>1PB compressed data)
- Typical failover time after a broker failure: <10ms
Slides 14, 18-20 talk about its replication model for eventual consistency,
interesting as it intentionally makes tradeoffs to take advantage of
intra-datacenter latency being an order of magnitude(ish) better than that
between DCs connected by the open internet. In exchange for some extra
chatter, they tolerate 2f failures among 2f+1 replicas. Clever, and clearly
it works for them. (See slides 21-22 for unhelpful diagrams, 27-31 for
interesting performance numbers, excepting slide 28's totally inexplicable
durability column using highly scientific measures like "some data loss" vs
"a few data loss". What.)
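The tradeoff boils down to this: to tolerate f failures, a majority-quorum
scheme needs 2f+1 replicas, while Kafka's in-sync-replica approach gets by
with f+1 (equivalently, 2f+1 replicas survive 2f failures). A quick sketch
of that arithmetic (function names are mine, not from the talk):

```python
def isr_tolerance(replicas: int) -> int:
    """Failures tolerated by ISR-style replication: the partition stays
    available as long as at least one in-sync replica survives."""
    return replicas - 1

def quorum_tolerance(replicas: int) -> int:
    """Failures tolerated by a majority-quorum scheme: a majority of
    replicas must survive to make progress."""
    return (replicas - 1) // 2

for r in (3, 5):
    print(f"{r} replicas: ISR tolerates {isr_tolerance(r)} failures, "
          f"quorum tolerates {quorum_tolerance(r)}")
```

The catch, per the slides, is that the ISR leader waits on every in-sync
follower before acking, which is only cheap because intra-DC latency is
so low.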
Pretty neat stuff, and it's great to see a built-in solution for cross-DC
replication.
[1] http://kafka.apache.org/ -- now out of incubator!
[2] http://www.slideshare.net/junrao/kafka-replication-apachecon2013
--
David Schoonover
dsc(a)wikimedia.org