Over the last week I've created Hive tables for many of our larger datasets
in Hadoop. These tables were used to generate many of the results you've seen
over the last few days.
The schemas for those tables and the job scripts can both be found at:
- https://github.com/wikimedia/kraken/tree/master/hive
Questions welcome.
--
David Schoonover
dsc(a)wikimedia.org