Steve Bennett wrote:
So what *do* our subcategories mean?
Take one more step back and ask: What do our categories mean?
In the paper "Analyzing and Visualizing the Semantic Coverage of Wikipedia and Its Authors", http://arxiv.org/abs/cs.IR/0512085 from December 2005, the authors analyze the English Wikipedia's categories. But the largest categories (table 2, page 10) are hardly "semantic" subject headings. Rather, they are a technical tool for identifying various quality problems:
 1. Disambiguation 40,062 articles
 2. 1911_Britannica 10,450
 3. Film_stubs 4,867
 4. Musician_stubs 4,575
 5. American_actors 4,551
 6. American_people_stubs 4,401
 7. Film_actors 4,023
 8. Musical_group_stubs 3,873
 9. Album_stubs 3,859
10. Television_actors 3,207
11. Politician_stubs 3,021
12. Articles_to_be_merged 2,899
13. British_people_stubs 2,706
14. American_politician_stubs 2,694
15. Television_stubs 2,540
16. American_actor_stubs 2,492
17. Cleanup_from_October_2005 2,466
18. Incomplete_lists 2,362
19. Football_(soccer)_biography_stubs 2,298
20. Medicine_stubs 2,297
Today's http://en.wikipedia.org/wiki/Special:Mostlinkedcategories isn't much different.
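To put a rough number on this, here is a small Python sketch that splits the twenty categories listed above into maintenance categories and subject categories. The classification rule is my own assumption, not something taken from the paper: anything ending in "_stubs" counts as maintenance, plus a hand-picked set of names (including 1911_Britannica, which is debatable). The names `is_maintenance` and `split_categories` are made up for this illustration.

```python
# Top 20 categories from table 2 of the paper: (name, article count).
TOP_CATEGORIES = [
    ("Disambiguation", 40062),
    ("1911_Britannica", 10450),
    ("Film_stubs", 4867),
    ("Musician_stubs", 4575),
    ("American_actors", 4551),
    ("American_people_stubs", 4401),
    ("Film_actors", 4023),
    ("Musical_group_stubs", 3873),
    ("Album_stubs", 3859),
    ("Television_actors", 3207),
    ("Politician_stubs", 3021),
    ("Articles_to_be_merged", 2899),
    ("British_people_stubs", 2706),
    ("American_politician_stubs", 2694),
    ("Television_stubs", 2540),
    ("American_actor_stubs", 2492),
    ("Cleanup_from_October_2005", 2466),
    ("Incomplete_lists", 2362),
    ("Football_(soccer)_biography_stubs", 2298),
    ("Medicine_stubs", 2297),
]

# Assumption: these names are treated as maintenance categories by hand.
MAINTENANCE_NAMES = {
    "Disambiguation",
    "1911_Britannica",
    "Articles_to_be_merged",
    "Cleanup_from_October_2005",
    "Incomplete_lists",
}


def is_maintenance(name):
    """Heuristic (an assumption): stub categories and hand-picked names."""
    return name.endswith("_stubs") or name in MAINTENANCE_NAMES


def split_categories(categories):
    """Return (maintenance, subject) lists of (name, count) pairs."""
    maintenance = [(n, c) for n, c in categories if is_maintenance(n)]
    subject = [(n, c) for n, c in categories if not is_maintenance(n)]
    return maintenance, subject


maintenance, subject = split_categories(TOP_CATEGORIES)

# Note: an article can sit in several categories, so the sums below count
# category memberships, not distinct articles.
print(len(maintenance), "maintenance categories,",
      sum(c for _, c in maintenance), "memberships")
print(len(subject), "subject categories,",
      sum(c for _, c in subject), "memberships")
```

Under this rule only the three actor categories come out as subject headings. A different rule would shift a few entries, but probably not the overall picture.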
Perhaps it does make some sense to study which users contributed most to the category "cleanup from October 2005", but that would be a way of finding dilettantes rather than experts on a subject.
When the authors write that "category assignments were introduced in May 2004 and 78,977 unique categories have come into existence since then", I cannot help but laugh and remember Larry Sanger's initial reaction when this was discussed in January 2002: http://mail.wikipedia.org/pipermail/wikipedia-l/2002-January/018685.html
(Here, "May 2004" refers to new functionality in MediaWiki 1.3.)
Today "categories" is a technical tool that can be used for many purposes. Organizing and structuring the contents of Wikipedia according to subject headings is only one of them. Compare this to another technical tool, "templates", which can be used for cleanup messages, for navigation, for infoboxes, for userboxes, for external link syntax, etc.
The pages describing these features in the Help: and Wikipedia: namespaces confuse these issues all the time. I think we should aim to separate two questions: 1. what do you want to accomplish, and 2. what technical tools are available at this time. For example, if I want to create an infobox, I could use raw HTML tables, wikitext table syntax, templates with numeric parameters, or templates with named parameters. I could also use a robot to put these infoboxes into a number of articles. Next year, another version of MediaWiki will provide more tools, but my purpose will still be to create an infobox.