Alright! We've got a 10-node CDH3 Hadoop cluster set up. I am experimenting with
(and learning about!) Hadoop as we go. We plan on doing some benchmarking of CDH3 vs.
DataStax Enterprise (and vs. CDH4?) on this cluster before we make decisions. Right now
is playtime!
I just added some notes to this Etherpad on some variable tweaking I will be doing. My
new notes start at about line 187. (Can I link to a specific line in Etherpad?) I've
also created a Google spreadsheet where I am keeping track of my benchmarking runs. Let
me know if you need access to it.
https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0AvpRkIqSY9hNdE…
If anyone on this list (who's on this list, anyway?!) has some insight or experience
with Hadoop benchmarking, feel free to chime in. We'd love the help!
Thanks all,
-Andrew Otto