In case anyone missed this...
This paper outlines a new in-memory approach to running distributed map/reduce jobs:
http://www.usenix.org/event/osdi10/tech/full_papers/Power.pdf
There are definitely some interesting optimizations in there (like the use of partitioned tables) that might be relevant when setting up "big data" infrastructure for mining WMF data.
Worth a read if you are into distributed computing.
-P-
--
Peter Adams
peter@openwebanalytics.com
Open Web Analytics
http://www.openwebanalytics.com/
wiki-research-l@lists.wikimedia.org