> If we snappy compress and do more rounding up,
> that's 7 TB / week*.
> - Kafka. I think we should probably move our primary Kafka brokers to Cisco machines

Cisco UCS C250 M1: 8 x 300G = 2.4T
Except for the fact that there isn't enough space on the Ciscos to hold a week's
worth of logs. We'd want these drives in some kind of redundant setup, so we're
basically getting
> Both systems are hard-realtime--they need to run
> ahead of the incoming data stream or we'll be forced to drop data. The more RAM, the
> more cores, the better; headroom here is important as load will be variable.

That's true, but in Kafka's case I don't think it's as important as it is for
ETL/Storm. The Dells have 48 GB RAM, and it is way more important that Kafka can
flush its logs out to disk fast than that it keeps them in memory for computational
purposes. We could do some ETL with Kafka if we wanted, but I think we had talked about
leaving all ETL up to Storm.
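A quick sketch of the space argument against putting Kafka on the Ciscos. It assumes the redundant setup is RAID 1 mirroring (an assumption; other RAID levels change the numbers), ignores any drives reserved for the OS, and uses the ~7 TB/week compressed estimate with 1 TB = 1000 GB. The `usable_gb` helper is hypothetical.

```python
# Sketch: do two Cisco UCS C250 M1 brokers have room for a week of logs?
# ASSUMPTION: the redundant setup is RAID 1 mirroring (half of raw is usable).

DRIVES_PER_NODE = 8
DRIVE_GB = 300
BROKERS = 2
WEEKLY_GB = 7_000  # ~7 TB/week snappy-compressed estimate (1 TB = 1000 GB here)


def usable_gb(drives: int, drive_gb: int, mirrored: bool = True) -> int:
    """Usable space on one node; RAID 1 keeps half of the raw capacity."""
    raw = drives * drive_gb
    return raw // 2 if mirrored else raw


per_node = usable_gb(DRIVES_PER_NODE, DRIVE_GB)   # 1,200 GB per broker
total = BROKERS * per_node                         # 2,400 GB across both
print(f"per broker: {per_node} GB, both brokers: {total} GB")
print(f"holds a week ({WEEKLY_GB} GB)? {total >= WEEKLY_GB}")
```

Even with no mirroring at all (`mirrored=False`), two Ciscos offer 4,800 GB, which is still short of a week.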
For the space reasons alone, I think we should use an21 and an22 (Dell R720s) for Kafka.
If we did, that would change the layout to:
an01-an09 (9): ETL Workers
an10 (1): Primary NameNode
an11-an20 (10): Hadoop Workers & DataNodes
an21,an22 (2): Kafka Brokers
an23-an25 (3): Cluster-wide ZooKeepers
an26 (1): Monitoring (JMX & Ganglia)
an27 (1): Secondary NameNode, Hive Metastore (etc), Storm Nimbus
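A minimal sketch that encodes this layout as data and checks that each of an01-an27 gets exactly one role. The role names and host ranges are copied from the list above; the `hosts` and `check_layout` helpers are hypothetical, written only for illustration.

```python
# Sketch: encode the proposed role layout and verify that every host
# an01-an27 is assigned exactly once. Helper names are hypothetical.


def hosts(first: int, last: int) -> list[str]:
    """Expand an inclusive range such as an01-an09 into host names."""
    return [f"an{n:02d}" for n in range(first, last + 1)]


LAYOUT = {
    "ETL Workers": hosts(1, 9),
    "Primary NameNode": hosts(10, 10),
    "Hadoop Workers & DataNodes": hosts(11, 20),
    "Kafka Brokers": hosts(21, 22),
    "Cluster-wide ZooKeepers": hosts(23, 25),
    "Monitoring (JMX & Ganglia)": hosts(26, 26),
    "Secondary NameNode, Hive Metastore (etc), Storm Nimbus": hosts(27, 27),
}


def check_layout(layout: dict[str, list[str]], expected: list[str]) -> list[str]:
    """Return problems: hosts assigned to two roles, or hosts left without one."""
    seen: dict[str, str] = {}
    problems = []
    for role, members in layout.items():
        for host in members:
            if host in seen:
                problems.append(f"{host} in both '{seen[host]}' and '{role}'")
            seen[host] = role
    for host in expected:
        if host not in seen:
            problems.append(f"{host} has no role")
    return problems


problems = check_layout(LAYOUT, hosts(1, 27))
print(problems or "every host an01-an27 has exactly one role")
for role, members in LAYOUT.items():
    print(f"{len(members):2d}  {role}")
```

The same check flags mistakes in a draft layout, e.g. `check_layout({"a": hosts(1, 2), "b": hosts(2, 3)}, hosts(1, 4))` reports that an02 is double-booked and an04 has no role.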
On Nov 13, 2012, at 2:07 PM, David Schoonover <dsc(a)wikimedia.org
(mailto:dsc@wikimedia.org)> wrote:
Awesome job! A couple of comments on the
allocations:
- NameNode
1. I think we definitely want to run the NameNode on a Cisco machine -- NN latency
affects responsiveness across all of HDFS as it's used to address everything.
We're also running the menagerie of Hadoop-related external dependencies on this
machine (e.g., a MySQL for Hive and Oozie, etc.). I'd prefer not to tempt fate by
co-locating anything with the primary NameNode.
2. an01 currently has our cluster's public IP, and thus hosts a bunch of
utilities, making it a DDoS waiting to happen. We should move the NN to an10 (or whatever
other Cisco machine pleases you), as that's way easier than moving the IP. Which
brings up...
3. NameNode Frailty. Generally speaking, I think "NameNode is Hadoop's
SPOF" tends to be a bit overblown. Our cluster isn't on the critical path of any
client-facing operation. It's also not terribly big, so any hardware dedicated to a
hot spare (via Facebook's AvatarNode High Availability NameNode (iirc), NFS, or
whatever) would cut into the jobs we can realistically service on a daily basis. Once
we're sure we have the metal to stay ahead of demand we should revisit this, but I
don't think we're there yet.
4. The Secondary NN is essentially the spare tire in the trunk -- it [sadly] gets no load
so long as the primary is up -- so we can probably run it on an R720 alongside the Hadoop
Miscellany and the ETL Nimbus (Storm's JobTracker).
- Kafka. I think we should probably move our primary Kafka brokers to Cisco machines for
the same reason I agree we should run ETL on those machines. Both systems are
hard-realtime--they need to run ahead of the incoming data stream or we'll be forced
to drop data. The more RAM, the more cores, the better; headroom here is important as load
will be variable.
- Monitoring. We plan to run both Ganglia and some kind of application-level JMX
monitoring. Though these services tend to use a decent chunk of network bandwidth, they're not
otherwise terribly resource hungry. We can probably get away with sticking both on an R720
and otherwise reserve that box for staging and other ops utility work.
- Storage
> If we snappy compress and do more rounding up,
> that's 7 TB / week*.
Awesome. Even given aggregate datasets from jobs, I think we're probably okay for
space for at least 6 months. That's plenty of time to demonstrate the value of analytics
and make a strong case for more disks. July is ~8 months away, so the timing lines up
perfectly.
Everything else looks great. I think this leaves the cluster looking like this:
an01-an07 (7): ETL Workers
an08, an09 (2): Kafka Brokers
an10 (1): Primary NameNode
an11-an22 (12): Hadoop Workers & DataNodes
an23-an25 (3): Cluster-wide ZooKeepers
an26 (1): Monitoring (JMX & Ganglia)
an27 (1): Secondary NameNode, Hive Metastore (etc), Storm Nimbus
--
David Schoonover
dsc(a)wikimedia.org (mailto:dsc@wikimedia.org)
On Monday, 12 November 2012 at 9:08 a, Andrew Otto wrote:
> Woooweeeee!
>
> Now that we've got all of our servers up and running, let's take a minute to
> assign them all their official roles.
>
> Summary of what we've got:
>
> analytics1001 - analytics1010:
> Cisco UCS C250 M1
> 192G RAM
> 8 x 300G = 2.4T
> 24 core X5650 @ 2.67 GHz
>
> analytics1011 - analytics1022:
> Dell PowerEdge R720
> 48G RAM
> 12 * 2T = 24T
> 12 core E5-2620 @ 2.00GHz
>
> analytics1023 - analytics1027:
> Dell PowerEdge R310
> 8G RAM
> 2 * 1G = 2G
> 4 core X3430 @ 2.40GHz
>
> an11 - an22 are easy. They should be Hadoop Worker (HDFS) nodes, since they have so
> much storage space!
>
> an23-an27 are relative weaklings and should not be used for compute or data needs.
> I've currently got ZooKeepers running on an23, an24 and an25 (we need 3 for a quorum),
> and I think we should keep it that way.
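Why three: a ZooKeeper ensemble keeps serving only while a strict majority of its servers can reach each other, so an ensemble of n servers survives floor((n - 1) / 2) failures. A small sketch of that rule (the helper names are hypothetical):

```python
# ZooKeeper needs a strict majority (quorum) of the ensemble to be up.
# These helpers just spell out that majority rule for small ensembles.


def quorum_size(n: int) -> int:
    """Smallest strict majority of n servers."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return n - quorum_size(n)


for n in range(1, 6):
    print(f"{n} servers: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

Three is the smallest ensemble that survives one failure; a fourth server adds no extra tolerance, which is why odd ensemble sizes are the usual choice.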
>
> The remaining assignments require more discussion. Our NameNode is currently on
> an01, but as Diederik pointed out last week, this is a bit of a waste of a node, since it
> is so beefy. I'd like to suggest that we use an26 and an27 for NameNode and backup
> NameNode.
>
> My rudimentary Snappy compression test reduces web access log files to about 33% of
> their original size. According to the unsampled file we saved back in August,
> uncompressed web request logs generate about 100 GB / hour. Rounded (way) up, that's
> 20 TB / week.
>
> If we snappy compress and do more rounding up,
> that's 7 TB / week*.
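The arithmetic behind these two estimates, as a sketch. The 100 GB/hour and 33% figures come from the message; the 1 TB = 1000 GB convention and the "round up to the next 10 TB" rule are assumptions chosen to reproduce the quoted numbers.

```python
import math

GB_PER_HOUR = 100        # uncompressed web request logs (August unsampled file)
HOURS_PER_WEEK = 24 * 7
SNAPPY_RATIO = 0.33      # compressed size / original size from the Snappy test

raw_tb = GB_PER_HOUR * HOURS_PER_WEEK / 1000       # 16.8 TB/week before rounding
raw_rounded_tb = math.ceil(raw_tb / 10) * 10       # ASSUMED rule: up to next 10 TB -> 20
compressed_tb = raw_rounded_tb * SNAPPY_RATIO      # about 6.6 TB/week
compressed_rounded_tb = math.ceil(compressed_tb)   # rounded up again -> 7 TB/week

print(f"raw: {raw_tb:.1f} TB/week -> {raw_rounded_tb} TB/week")
print(f"snappy: {compressed_tb:.1f} TB/week -> {compressed_rounded_tb} TB/week")
```

Without the first round-up, 16.8 TB at 33% is about 5.5 TB/week, so the 7 TB figure leaves some slack for the other input sources mentioned in the footnote.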
>
>
>
> an11, an12: Kafka Brokers
> I had wanted to use all of the R720s as Hadoop workers, but we'd like to be able
> to store a week's worth of Kafka log buffer. There isn't enough storage space on
> the other machines to do this, so I think we should use two of these as Kafka brokers. If
> we RAID 1 the buffer drives (which we probably should), that gives each broker a 10 TB
> Kafka buffer, or 20 TB across both (2 nodes * 10 2 TB drives / 2 (for RAID)), which
> should be enough to cover us for a while.
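The buffer formula in parentheses, worked through as a sketch. It assumes, as for the Hadoop workers, that 2 of each R720's 12 drives are mirrored for the OS, and (an assumption) that each Kafka message lives on only one broker, so the two brokers' capacity adds up.

```python
# Kafka buffer on two R720s: 12 x 2 TB drives per node, 2 reserved for the OS,
# the remaining 10 mirrored in pairs (RAID 1), which halves usable space.

DRIVES_PER_NODE = 12
OS_DRIVES = 2
DRIVE_TB = 2
BROKERS = 2
WEEKLY_COMPRESSED_TB = 7   # snappy estimate from earlier in the thread

buffer_drives = DRIVES_PER_NODE - OS_DRIVES        # 10 drives for the buffer
per_broker_tb = buffer_drives * DRIVE_TB // 2      # RAID 1: 10 TB per broker
total_tb = BROKERS * per_broker_tb                 # 20 TB across both brokers
weeks = total_tb / WEEKLY_COMPRESSED_TB            # ASSUMES no cross-broker copies

print(f"{per_broker_tb} TB per broker, {total_tb} TB total, ~{weeks:.1f} weeks of logs")
```

Under these assumptions the pair holds roughly three weeks of compressed logs, comfortably above the one-week target.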
>
>
> an01 - an10: Storm/ETL
> These are beefy (tons of RAM, 24 cores), so these will be good for hefty realtime
> stuff. We could also take a few of these and use them as Hadoop workers, but since they
> don't really have that much space to add to the HDFS pool, I'm not sure if it is
> worth it.
>
>
> an23 - an25: ZooKeepers
> As I said above, let's keep the ZKs here.
>
>
> an26, an27: Hadoop Masters
> Move the NameNodes (primary and secondary/failover) here.
>
>
> an13 - an22: Hadoop Workers
> We need to use the first 2 drives in RAID 1 for the OS, so really we only have 10
> drives per node for HDFS space. Still, that gives us 200 TB of raw disk. With an HDFS
> replication factor of 3, that's about 67 TB of usable HDFS space.
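The HDFS capacity estimate as a sketch, using the figures from the message (10 workers, 10 data drives each, 2 TB drives, replication factor 3). It ignores any space HDFS is configured to reserve for non-DFS use, so real usable space would be somewhat lower.

```python
# HDFS capacity for an13-an22: raw disk divided by the replication factor,
# since every block is stored REPLICATION times.

WORKERS = 10        # an13 - an22
DATA_DRIVES = 10    # 12 drives minus the 2 mirrored for the OS
DRIVE_TB = 2
REPLICATION = 3

raw_tb = WORKERS * DATA_DRIVES * DRIVE_TB   # 200 TB of raw disk
usable_tb = raw_tb / REPLICATION            # about 66.7 TB of unique data

print(f"raw {raw_tb} TB, usable {usable_tb:.0f} TB")
```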
>
>
>
> Thoughts? Since we'll def want to use an13-an22 as workers, I'll start
> spinning those up and adding them to the cluster today. Yeehaw!
>
> -Ao
>
>
>
>
> *For the sake of simplicity, I'm not counting other input sources (event log,
> sqoop, etc.), and instead hoping that rounding up as much as I did will cover these
> needs.
>
>
>
>
>
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org (mailto:Analytics@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/analytics
>