In case anyone missed this...
This paper outlines a new in-memory approach to running distributed map/reduce jobs:
http://www.usenix.org/event/osdi10/tech/full_papers/Power.pdf
Definitely some interesting optimizations going on in there (like the use of partitioned
tables) that might be relevant when setting up "big data" infrastructure for
mining WMF data.
Worth a read if you are into distributed computing.
-P-
--
Peter Adams <peter(a)openwebanalytics.com>
Open Web Analytics
http://www.openwebanalytics.com/