In the early years 2005-2007, lots of small articles were created in the Swedish Wikipedia, assuming that they would all grow with time (eventualism [1]) and the important thing was to have many articles. Some articles were marked as stubs or substubs, in a random and entirely subjective way. This changed in 2008, when we started to count the length of articles, to identify the shortest ones, and started to systematically extend or merge them into better articles. It's now much easier to identify a new article as being far too short, and to treat this as an urgent problem.
Now I feel we have the same problem with categories. Some people create lots of subcategories for ever more specialized subdivisions of a topic. In particular, "companies established in 1974" gets divided into subcategories for different lines of business [2]. Many of the new categories have very few members.
But we have no principles for understanding whether this really is a problem and no tools to get an overview. We have no way to assess the harm of subdividing a category. If it was only me and my feeling, I would tell them to stop doing this. But this is not enough for a consensus decision.
What tools are there to count the number of categories, sort them by the number of members (subcategories and articles), and to determine which branches of the category tree should be pruned?
As for "law firms established in 1974", one can't remove that single category without reconsidering the entire "law firms by year of establishment" structure. So a tool should consider at least 2 levels.
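To make the two-level idea concrete, here is a minimal sketch of how such a tool might flag a whole "by year" branch rather than a single category. Everything in it is an assumption for illustration: the function name `sparse_branches`, the thresholds, the English category names and the member counts are all invented, and a real tool would read the tree from the wiki's database instead.

```python
from collections import defaultdict

def sparse_branches(parent_of, member_count, small=3, min_share=0.5):
    """Return parent categories whose subcategories are mostly small.

    parent_of:    dict mapping subcategory -> parent category (hypothetical input)
    member_count: dict mapping category -> number of members
    small:        a subcategory with fewer members than this counts as small
    min_share:    flag the parent when at least this share of its children is small
    """
    # Group the subcategories under their parent (level 1 -> level 2).
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    flagged = []
    for parent, kids in children.items():
        small_kids = [k for k in kids if member_count.get(k, 0) < small]
        if len(small_kids) / len(kids) >= min_share:
            flagged.append((parent, len(small_kids), len(kids)))
    # Branches with the most small subcategories come first.
    flagged.sort(key=lambda row: row[1], reverse=True)
    return flagged

# Invented example data, not real counts from any wiki.
parent_of = {
    "Law firms established in 1974": "Law firms by year of establishment",
    "Law firms established in 1975": "Law firms by year of establishment",
    "Law firms established in 1976": "Law firms by year of establishment",
    "Companies established in 1974": "Companies by year of establishment",
    "Companies established in 1975": "Companies by year of establishment",
}
member_count = {
    "Law firms established in 1974": 1,
    "Law firms established in 1975": 2,
    "Law firms established in 1976": 5,
    "Companies established in 1974": 120,
    "Companies established in 1975": 95,
}

for parent, n_small, n_total in sparse_branches(parent_of, member_count):
    print(f"{parent}: {n_small} of {n_total} subcategories are small")
```

With these made-up numbers only the law-firm branch is reported, which matches the point above: the decision is about the whole "by year" structure, not about one year.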
Which language editions of Wikipedia have the best-organized category trees? I know the German Wikipedia is different. It is often hard to make interwiki links to its categories, but most of the rest seem to build on quite similar ideas.
[1] http://meta.wikimedia.org/wiki/Eventualism
[2] http://sv.wikipedia.org/wiki/Kategori:F%C3%B6retag_bildade_1974
2011/1/5 Lars Aronsson <lars@aronsson.se>:
> Which languages of Wikipedia have best organized category trees? I know the German Wikipedia is different. It is often hard to make interwiki links to its categories, but most of the rest seem to build on quite similar ideas.
I'm fascinated by categorization, but IMHO the trouble can't be solved without an efficient, easy tool for category intersection. If such a tool (something like an optimized DynamicPageList) existed, the problem could be solved with an "axial" categorization, i.e. some "trees" of categories, each dealing with only one "topic", where well-defined sets of items, selected along many "axes", could be retrieved by intersection. I call those "axial categories" "catwords", i.e. a hybrid between categories and keywords.
See (if you can understand it) [[File:Catwords graph.png]] on Commons. Just a layman's DIY thought... don't expect high-level work.
But to work, this approach needs a rule: an item MUST be assigned to any one of the "parent" categories in the tree. This is just the opposite of the current rule: assign each item to the most specific category.
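A rough sketch of what retrieval by intersection could look like, under one possible reading of the rule above: an item assigned to a category on one axis is also treated as belonging to that category's parents on the same axis. This is not DynamicPageList or any existing tool; the helper names (`with_ancestors`, `build_index`, `intersect`), the two axes and the items are all invented for illustration.

```python
def with_ancestors(category, parent_of):
    """Return the category plus all of its ancestors on the same axis."""
    chain = {category}
    # Walk upwards; stop at the root or if a loop in the tree is detected.
    while category in parent_of and parent_of[category] not in chain:
        category = parent_of[category]
        chain.add(category)
    return chain

def build_index(assignments, parent_of):
    """Map every category (including ancestors) to the set of items in it."""
    index = {}
    for item, categories in assignments.items():
        for category in categories:
            for c in with_ancestors(category, parent_of):
                index.setdefault(c, set()).add(item)
    return index

def intersect(index, *categories):
    """Items that belong to all of the given categories."""
    sets = [index.get(c, set()) for c in categories]
    return set.intersection(*sets) if sets else set()

# Two invented axes: kind of business, and year of establishment.
parent_of = {"Law firms": "Companies"}
assignments = {
    "Firm A": {"Law firms", "Established in 1974"},
    "Firm B": {"Law firms", "Established in 1980"},
    "Company C": {"Companies", "Established in 1974"},
}

index = build_index(assignments, parent_of)
print(sorted(intersect(index, "Law firms", "Established in 1974")))
print(sorted(intersect(index, "Companies", "Established in 1974")))
```

In this sketch there is no "Law firms established in 1974" category at all; that set is just the intersection of one category from each axis.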
Alex
On Wed, Jan 5, 2011 at 4:21 PM, Lars Aronsson <lars@aronsson.se> wrote:
> What tools are there to count the number of categories, sort them by the number of members (subcategories and articles), and to determine which branches of the category tree should be pruned?
With the new category table in 1.16 it should be relatively easy to create a special page SmallCategories, just like ShortPages.
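As an illustration only, this is roughly the query such a listing would boil down to. The sketch uses Python's sqlite3 with a stand-in table rather than a real wiki database; the column names `cat_title`, `cat_pages` and `cat_subcats` follow MediaWiki's category table as far as I know (with `cat_pages` counting all members, subcategories included), but treat that, the `small_categories` helper, the threshold and the sample rows as assumptions. No such special page is implied to exist.

```python
import sqlite3

# In-memory stand-in for the wiki's category table (assumed column names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    " cat_title TEXT PRIMARY KEY,"
    " cat_pages INTEGER,"
    " cat_subcats INTEGER)"
)
# Invented sample rows, not real counts.
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        ("Companies established in 1974", 120, 4),
        ("Law firms established in 1974", 1, 0),
        ("Law firms established in 1975", 2, 0),
        ("Law firms by year of establishment", 30, 30),
    ],
)

def small_categories(conn, max_members=3):
    """List categories with at most max_members members, smallest first."""
    return conn.execute(
        "SELECT cat_title, cat_pages, cat_subcats FROM category "
        "WHERE cat_pages <= ? "
        "ORDER BY cat_pages ASC, cat_title ASC",
        (max_members,),
    ).fetchall()

for title, pages, subcats in small_categories(conn):
    print(f"{pages:4d} members ({subcats} subcategories)  {title}")
```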
Bryan
wikitech-l@lists.wikimedia.org