Daniel Schwen wrote:
I find it a little frustrating that this wheel gets reinvented so often. My tool was used a couple of times after I posted it, and now has maybe one user per day (from a quick glance at the
Users of the Swedish Wikipedia are increasingly starting to use Duesentrieb's CatScan tool. It is really useful, but could use some further improvement, especially in the handling of large categories.
So we have shown multiple times now that cat intersection is technically feasible. What we need now is massive lobbying for atomic categorisation. THAT is the hurdle right now IMO. Not some SQL queries.
After a lengthy discussion (over many years) about category:tennis players and category:female tennis players in the Swedish Wikipedia, in late August 2008 I created category:men and category:women, so that all profession categories could be freed from the burden of also documenting gender. The Swedish Wikipedia still has a category:Danish tennis players (combining profession and nationality), just like the English Wikipedia, but gender is now documented separately, as in the German Wikipedia.
All three languages have a category:1942 births. I think no language edition of Wikipedia has a combined category for tennis players born in 1942. So the question of atomic categories is not all-or-nothing. It is more or less implemented everywhere. To find tennis players born in 1942, even the English Wikipedia needs to intersect categories.
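As a minimal sketch of what such an intersection amounts to, assume each category is available as a set of article titles. The titles and the `intersect_categories` helper below are made up for illustration and are not taken from CatScan:

```python
def intersect_categories(*categories):
    """Return the titles that appear in every one of the given categories."""
    if not categories:
        return set()
    # Start from the smallest set so each intersection step stays cheap.
    ordered = sorted(categories, key=len)
    result = set(ordered[0])
    for category in ordered[1:]:
        result &= category
    return result


# Hypothetical sample data; real members would come from the wiki database.
tennis_players = {"Player A", "Player B", "Player C"}
births_1942 = {"Player B", "Player C", "Author X"}
women = {"Player C", "Author X"}

print(sorted(intersect_categories(tennis_players, births_1942)))
print(sorted(intersect_categories(tennis_players, births_1942, women)))
```

With atomic categories, adding gender to the query is just one more set in the call; no combined category has to exist.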
Radically changing the categorization system is not realistic. It was a huge effort already to introduce men/women in the Swedish Wikipedia, even though this was just adding categories (not removing any), and even though Swedish is not among the 10 largest Wikipedias. Within 3 months (September-November), some 75,000 articles were categorized, of which 15,000 were women and 60,000 men. The ratio 1:4 (1 woman for every 4 men) is far more equal than the 1:6 ratio of the German Wikipedia.
What I discovered then was that of these 75,000 biographies, only 60,000 were categorized according to year of birth. So we now have to add birth-year categories to 15,000 articles before we can compile reliable statistics on how the gender imbalance shifts over time. Early estimates show that there is a 1:10 gender ratio in the 18th century and a 1:3 ratio for those born in the 1970s.
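Once the birth-year categories are complete, statistics of this kind could be compiled roughly as follows. This is only a sketch: it assumes the men, women and year-of-birth categories are available as sets of titles, and the `gender_counts_by_decade` helper and the sample titles are hypothetical. Only the 15,000 and 60,000 totals come from the figures above.

```python
from collections import Counter


def gender_counts_by_decade(births_by_year, men, women):
    """Count women and men per birth decade from atomic categories."""
    counts = {}
    for year, titles in births_by_year.items():
        decade = year // 10 * 10
        tally = counts.setdefault(decade, Counter())
        tally["women"] += len(titles & women)
        tally["men"] += len(titles & men)
    return counts


# Overall ratio from the totals quoted above: 60,000 men to 15,000 women.
men_per_woman = 60_000 / 15_000  # 4.0, i.e. a 1:4 ratio

# Hypothetical sample data, just to show the shape of the input.
births_by_year = {1742: {"E"}, 1971: {"A", "B"}, 1975: {"C", "D"}}
men = {"A", "C", "D", "E"}
women = {"B"}

for decade, tally in sorted(gender_counts_by_decade(births_by_year, men, women).items()):
    print(decade, "women:", tally["women"], "men:", tally["men"])
```

Articles missing a birth-year category simply do not show up in any decade, which is why the 15,000 uncategorized biographies have to be fixed first.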
So the larger imbalance (1:6) of the German Wikipedia might be explained by its having a larger number of 18th century biographies.