Alright!  We've got a 10-node CDH3 Hadoop cluster set up.  I am experimenting with (and learning about!) Hadoop as we go.  We plan on doing some benchmarking of CDH3 vs. DataStax Enterprise (and vs. CDH4?) on this cluster before we make decisions.  Right now is playtime!

I just added some notes to this Etherpad on some variable tweaking I will be doing.  My new notes start at about line 187.  (Can I link to a specific line in Etherpad?)  I've also created a Google spreadsheet where I am keeping track of my benchmarking runs.  Let me know if you need access to it.  https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0AvpRkIqSY9hNdEtRLVNoQWNvQzNleHBtTXR5emI3Z2c&pli=1#gid=0

If anyone on this list (who's on this list, anyway?!) has some insight or experience with Hadoop benchmarking, feel free to chime in.  We'd love the help!

Thanks all,
-Andrew Otto