an11, an12: Kafka Brokers
an13 - an22: Hadoop Workers
Dunno why I suggested using an11 and an12 as Kafkas, and not an21 and an22. So um, I'm using an11 - an20 as Hadoop Workers now.
Amendment!
an21, an22: Kafka Brokers
an11 - an20: Hadoop Workers (DONE!)
On Nov 12, 2012, at 12:08 PM, Andrew Otto otto@wikimedia.org wrote:
Woooweeeee!
Now that we've got all of our servers up and running, let's take a minute to assign them all their official roles.
Summary of what we've got:
analytics1001 - analytics1010: Cisco UCS C250 M1, 192G RAM, 8 x 300G = 2.4T, 24 cores, X5650 @ 2.67 GHz
analytics1011 - analytics1022: Dell PowerEdge R720, 48G RAM, 12 x 2T = 24T, 12 cores, E5-2620 @ 2.00 GHz
analytics1023 - analytics1027: Dell PowerEdge R310, 8G RAM, 2 x 1G = 2G, 4 cores, X3430 @ 2.40 GHz
an11 - an22 are easy. They should be Hadoop Worker (HDFS) nodes, since they have so much storage space!
an23 - an27 are relative weaklings and shouldn't be used for compute or data needs. I've currently got ZooKeepers running on an23, an24, and an25 (we need 3 for a quorum), and I think we should keep it that way.
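For reference, here's roughly what the 3-node quorum config looks like in zoo.cfg (ports are the ZooKeeper defaults; the dataDir and exact hostnames here are just illustrative, not copied from puppet):

```
# zoo.cfg sketch for a 3-node quorum (default ports; path and hostnames illustrative)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181

# one entry per quorum member: host:peerPort:leaderElectionPort
server.1=analytics1023:2888:3888
server.2=analytics1024:2888:3888
server.3=analytics1025:2888:3888
```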
The remaining assignments require more discussion. Our NameNode is currently on an01, but as Diederik pointed out last week, this is a bit of a waste of a node, since it is so beefy. I'd like to suggest that we use an26 and an27 for NameNode and backup NameNode.
My rudimentary Snappy compression test reduces web access log files to about 33% of their original size. According to the unsampled file we saved back in August, uncompressed web request logs generate about 100 GB / hour. Rounded (way) up, that's 20 TB / week.
If we Snappy compress and do more rounding up, that's 7 TB / week*.
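To show my work, here's that math as a quick Python sketch (the 100 GB / hour and 33% figures are the rough numbers above, nothing measured precisely):

```python
# Rough web request log volume math (figures from the estimates above).
gb_per_hour = 100                   # uncompressed web request logs, approximate
hours_per_week = 24 * 7

uncompressed_tb_per_week = gb_per_hour * hours_per_week / 1000    # ~16.8 TB
snappy_ratio = 0.33                 # Snappy test: output is ~33% of original size
compressed_tb_per_week = uncompressed_tb_per_week * snappy_ratio  # ~5.5 TB

print(f"uncompressed: ~{uncompressed_tb_per_week:.1f} TB/week (rounded way up to 20)")
print(f"snappy'd:     ~{compressed_tb_per_week:.1f} TB/week (rounded up to 7)")
```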
an11, an12: Kafka Brokers
I had wanted to use all of the R720s as Hadoop workers, but we'd like to be able to store a week's worth of Kafka log buffer, and there isn't enough storage space on the other machines to do that. So I think we should use two of these as Kafka brokers. If we RAID 1 the buffer drives (which we probably should), each broker gets 10 TB of usable buffer (10 x 2 TB data drives / 2 for RAID 1), which should be enough to cover us for a while.
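The buffer math as another quick sketch, assuming the brokers use the same 2-drives-for-OS, 10-data-drive layout as the workers described below:

```python
# Kafka buffer sizing on the two R720 brokers (rough sketch of the numbers above).
data_drives_per_broker = 10   # assuming 12 drives minus 2 for the OS RAID 1
drive_tb = 2
raid1_factor = 2              # RAID 1 mirrors, so usable space is halved

usable_tb_per_broker = data_drives_per_broker * drive_tb / raid1_factor  # 10 TB
compressed_tb_per_week = 7    # from the Snappy estimate above
weeks_of_buffer = usable_tb_per_broker / compressed_tb_per_week          # ~1.4 weeks

print(f"{usable_tb_per_broker:.0f} TB usable per broker, ~{weeks_of_buffer:.1f} weeks of snappy'd logs")
```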
an01 - an10: Storm/ETL
These are beefy (tons of RAM, 24 cores), so they'll be good for hefty realtime stuff. We could also take a few of them and use them as Hadoop workers, but since they don't really have that much space to add to the HDFS pool, I'm not sure it's worth it.
an23, an24, an25: ZooKeepers
As I said above, let's keep the ZKs here.
an26, an27: Hadoop Masters
Move the NameNodes (primary and secondary/failover) here.
an13 - an22: Hadoop Workers
We need to use the first 2 drives in RAID 1 for the OS, so really we only have 10 drives per node for HDFS. Still, that gives us 200 TB raw. With an HDFS replication factor of 3, that's about 67 TB of usable HDFS.
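Same kind of sketch for the HDFS capacity:

```python
# Raw and effective HDFS capacity across the 10 R720 workers.
worker_nodes = 10
data_drives_per_node = 10     # 12 drives minus 2 for the OS RAID 1
drive_tb = 2
replication_factor = 3

raw_tb = worker_nodes * data_drives_per_node * drive_tb  # 200 TB
effective_tb = raw_tb / replication_factor               # ~67 TB

print(f"{raw_tb} TB raw, ~{effective_tb:.0f} TB usable HDFS at replication {replication_factor}")
```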
Thoughts? Since we'll def want to use an13-an22 as workers, I'll start spawning those up and adding them to the cluster today. Yeehaw!
-Ao
*For the sake of simplicity, I'm not counting other input sources (event log, sqoop, etc.), and instead hoping that rounding up as much as I did will cover these needs.