I created category trees from the Dec 30, 2004 dumps for the largest Wikipedias (en/de/ja/fr/pl/nl).
You'll find them at http://eza.gemm.nl/index.html#Categories (beware: some of the files are huge!)
Each category is listed along its shortest path to the top of the hierarchy. Other branches that contain this category link back to that entry.
Circular references are detected and listed at the bottom of the page. E.g. 'Travel > Tourism > Travel'
Separate trees exist because some categories are not categorized themselves (or were not as of Dec 30).
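For illustration only, the listing rules above could be sketched roughly as follows. This is not the script that produced the trees. It assumes a hypothetical `parents` dict mapping each category to the categories it is filed under, treats every category without a parent as the top of its own tree, and uses invented names (`shortest_paths`, `find_cycles`). The recursive search is fine for a small example but would need an explicit stack on a full dump.

```python
from collections import deque

def shortest_paths(parents):
    """Map each reachable category to its shortest path from a root.

    A root is a category with no parents, so each root starts its own tree.
    """
    children = {}
    for cat, ps in parents.items():
        children.setdefault(cat, [])
        for p in ps:
            children.setdefault(p, []).append(cat)
    roots = sorted(c for c in children if not parents.get(c))
    paths = {r: [r] for r in roots}
    queue = deque(roots)
    while queue:  # breadth-first, so the first path found is a shortest one
        cat = queue.popleft()
        for child in children[cat]:
            if child not in paths:
                paths[child] = paths[cat] + [child]
                queue.append(child)
    return paths

def find_cycles(parents):
    """Return circular references, e.g. ['Travel', 'Tourism', 'Travel']."""
    cycles = []
    state = {}  # 1 = on the current search stack, 2 = finished

    def visit(cat, stack):
        state[cat] = 1
        stack.append(cat)
        for p in parents.get(cat, []):
            if state.get(p) == 1:  # back on a category we are still exploring
                cycles.append(stack[stack.index(p):] + [p])
            elif p not in state:
                visit(p, stack)
        stack.pop()
        state[cat] = 2

    for cat in parents:
        if cat not in state:
            visit(cat, [])
    return cycles

# Made-up sample data, not taken from any dump.
parents = {
    "Science": [],
    "Physics": ["Science"],
    "Optics": ["Physics", "Science"],
    "Travel": ["Tourism"],
    "Tourism": ["Travel"],
}
for path in shortest_paths(parents).values():
    print(" > ".join(path))
for cycle in find_cycles(parents):
    print("circular:", " > ".join(cycle))
```

In this sketch, categories that only occur inside a loop (like Travel and Tourism here) never reach a root, so they appear only in the circular list.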
I might generate these trees for all Wikipedias as part of the weekly stats job once the database changes for MediaWiki 1.5 are over, since those will involve major maintenance on the stats scripts anyway.
Erik Zachte
Erik Zachte wrote:
I created category trees from the Dec 30, 2004 dumps for the largest Wikipedias (en/de/ja/fr/pl/nl).
Very interesting! I take it the article counts are simple sums of cat and subcats, and don't account for an article appearing in several subcategories?
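To make the distinction concrete, here is a small sketch of the two ways of counting. It does not claim to show how the published counts were produced. The `subcats` and `articles` dicts and the function name `count_articles` are hypothetical, and each category is visited only once even if it is reachable through several branches.

```python
def count_articles(cat, subcats, articles):
    """Return (simple_sum, distinct) article counts for cat plus its subcategories."""
    seen_cats = set()
    seen_articles = set()
    simple_sum = 0
    stack = [cat]
    while stack:
        c = stack.pop()
        if c in seen_cats:  # also guards against circular references
            continue
        seen_cats.add(c)
        members = articles.get(c, set())
        simple_sum += len(members)   # counts an article once per category
        seen_articles |= members     # counts an article once overall
        stack.extend(subcats.get(c, []))
    return simple_sum, len(seen_articles)

# Made-up sample data: "Light" is filed under all three categories.
subcats = {"Physics": ["Optics", "Lasers"]}
articles = {
    "Physics": {"Light"},
    "Optics": {"Light", "Lens"},
    "Lasers": {"Laser", "Light"},
}
print(count_articles("Physics", subcats, articles))  # (5, 3)
```

The gap between the two numbers grows with how often articles are filed under several subcategories of the same branch.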
Stan
wikipedia-l@lists.wikimedia.org