Daniel Schwen wrote:
I find it a little frustrating that this wheel gets reinvented
so often. My tool was used a couple of times after I posted it,
and now has maybe one user per day (from a quick glance at the
Users of the Swedish Wikipedia are increasingly starting to use
Duesentrieb's CatScan tool. It is really useful, but could use
some further improvement, especially in the handling of large
categories.
So we have shown multiple times now that cat intersection is
technically feasible. What we need now is massive lobbying for
atomic categorisation. THAT is the hurdle right now IMO. Not
some SQL queries.
After a lengthy discussion (over many years) about category:tennis
players and category:female tennis players in the Swedish
Wikipedia, in late August 2008 I created category:men and
category:women, so that all profession categories could be freed
from the burden of also documenting the gender. The Swedish
Wikipedia still has a category:Danish tennis players (combining
profession and nationality), just like the English Wikipedia, but
gender is now documented separately, as in the German Wikipedia.
All three languages have a category:1942 births. I think no
language of Wikipedia has a combined category for tennis players
born in 1942. So the question of atomic categories is not an
absolute. It is more or less implemented everywhere. For finding
tennis players born in 1942, even the English Wikipedia needs to
intersect categories.
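
To make the point concrete, here is a minimal sketch of category
intersection on a toy data set. The article titles and the `articles`
mapping are hypothetical, invented for illustration; this is not how
CatScan or any other existing tool is implemented, only the plain set
logic behind the idea.

```python
# Hypothetical sample data: article title -> set of atomic categories.
articles = {
    "Player A": {"tennis players", "women", "1942 births"},
    "Player B": {"tennis players", "men", "1942 births"},
    "Player C": {"tennis players", "men", "1956 births"},
    "Writer D": {"writers", "women", "1942 births"},
}

def members(category):
    """Return the set of article titles that carry the given category."""
    return {title for title, cats in articles.items() if category in cats}

def intersect(*categories):
    """Return the articles that belong to every one of the given categories."""
    result = members(categories[0])
    for category in categories[1:]:
        result &= members(category)
    return result

# No combined category is needed: two atomic categories are intersected.
tennis_1942 = intersect("tennis players", "1942 births")
```

With gender kept in its own category, a third term narrows the result
further, e.g. `intersect("tennis players", "women", "1942 births")`.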
Radically changing the categorization system is not realistic.
It was a huge effort already to introduce men/women in the Swedish
Wikipedia, even though this was just adding categories (not
removing any), and even though Swedish is not among the ten largest
Wikipedias. Within 3 months (September-November), some 75,000
articles were categorized, of which 15,000 were women and 60,000 men.
The ratio 1:4 (1 woman for every 4 men) is far more equal than the
1:6 ratio of the German Wikipedia.
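
The arithmetic behind these ratios can be checked with a short sketch.
The counts are the ones given above; the helper name `women_share` is
my own, and the German figure is derived only from the 1:6 ratio, not
from actual article counts.

```python
from fractions import Fraction

def women_share(women, men):
    """Fraction of all biographies that are about women."""
    return Fraction(women, women + men)

# Swedish Wikipedia: 15,000 women and 60,000 men.
sv_ratio = Fraction(15_000, 60_000)      # women per man, i.e. 1:4
sv_share = women_share(15_000, 60_000)   # one biography in five

# German Wikipedia: only the 1:6 ratio is known here.
de_share = women_share(1, 6)             # one biography in seven
```

So a 1:4 ratio means that 20% of the biographies are about women,
while a 1:6 ratio means roughly 14%.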
What I discovered then was that of these 75,000 biographies, only
60,000 were categorized according to year of birth. So we now
have to add birth-year categories to 15,000 articles before we can
compile reliable statistics on how the gender imbalance shifts over
time.
Early estimates show that there is a 1:10 gender ratio in the 18th
century and a 1:3 ratio for those born in the 1970s.
So the larger imbalance (1:6) of the German Wikipedia might be
explained by having a larger share of 18th-century biographies.
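
A rough sketch of why the mix of eras matters. It assumes, purely for
illustration, that a collection consists of only two groups with the
estimated ratios above (1:10 for the 18th century, 1:3 for the 1970s).
The function name and the shares passed to it are hypothetical, not
measured values from any Wikipedia.

```python
def overall_women_per_man(share_18th):
    """Aggregate women-per-man ratio for a hypothetical two-era collection.

    share_18th is the fraction of biographies from the 18th century;
    the remainder is assumed to be people born in the 1970s.
    """
    women_18th = share_18th * (1 / 11)        # 1:10 -> 1 in 11 is a woman
    women_1970s = (1 - share_18th) * (1 / 4)  # 1:3  -> 1 in 4 is a woman
    women = women_18th + women_1970s
    return women / (1 - women)

# The larger the 18th-century share, the lower the aggregate ratio,
# even though the ratio within each era is unchanged.
mostly_modern = overall_women_per_man(0.2)
mostly_historic = overall_women_per_man(0.8)
```

Two Wikipedias with identical per-era ratios can therefore still show
different overall ratios, which is why the birth-year data is needed
before comparing them.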
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se