Peter Jacobi wrote:
Do you see any chance of getting a similar graph for
the percentage
of articles regarding fictional (persons, places, spaceships, ...
everything)?
I've done some work on this recently, resulting in
http://commons.wikimedia.org/wiki/Image:Size_of_English_Wikipedia_August_20…
.
That image is in percentages of text volume (in bytes), but I also have
the percentages by number of articles. Unfortunately no time series. For
people, the numbers are close to those of Gregory: 10.8% living, 8.9%
dead.
I've identified 7.2% of articles as a location; this is probably an
underestimate. 4.2% is disambiguation; 3.4% albums and singles; 3.0%
tree-of-life articles; 1.6% movies. Over 60% unclassified stuff.
Suggestions for more categories *and how to recognize them* are welcome.
Technical details: these numbers are the percentages of non-redirect
articles in the main namespace that match one of the following
[[regex]]en:
- /\[\[[Cc]ategory:[Ll]iving people(\||\]\])/
- /\[\[[Cc]ategory:[^]]+ (births|deaths)(\||\]\])/
- /{{\s*[Cc]oor/ <-- This one is of very dubious quality
- /{{[dD]isamb/
- /\[\[[Cc]ategory:\d+ (albums|singles)(\||\]\])/
- /{{\s*[Tt]axobox\b/
- /\[\[[Cc]ategory:[^]]+ films(\||\]\])/
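For anyone who wants to try the patterns above, here is a minimal sketch in Python. It is not the script that produced the numbers; the label names, the function names and the "unclassified" bucket are made up for illustration, and the input is assumed to be wikitext strings that have already been limited to non-redirect main-namespace articles. The only change to the patterns is that braces and the ] inside character classes are escaped for Python's re module.

```python
import re

# Patterns from the list above. Labels are hypothetical names for illustration.
PATTERNS = [
    ("living_people", r"\[\[[Cc]ategory:[Ll]iving people(\||\]\])"),
    ("births_or_deaths", r"\[\[[Cc]ategory:[^\]]+ (births|deaths)(\||\]\])"),
    ("coor_template", r"\{\{\s*[Cc]oor"),  # of very dubious quality
    ("disambig", r"\{\{[dD]isamb"),
    ("albums_singles", r"\[\[[Cc]ategory:\d+ (albums|singles)(\||\]\])"),
    ("taxobox", r"\{\{\s*[Tt]axobox\b"),
    ("films", r"\[\[[Cc]ategory:[^\]]+ films(\||\]\])"),
]
COMPILED = [(label, re.compile(pattern)) for label, pattern in PATTERNS]


def classify(wikitext):
    """Return the labels of all patterns that match one article's wikitext."""
    return [label for label, rx in COMPILED if rx.search(wikitext)]


def percentages(articles):
    """Percentage of articles matching each pattern, plus 'unclassified'.

    An article may match several patterns, so the values need not sum to 100.
    Assumes `articles` holds only non-redirect main-namespace wikitext.
    """
    counts = {label: 0 for label, _ in COMPILED}
    counts["unclassified"] = 0
    total = 0
    for text in articles:
        total += 1
        labels = classify(text)
        if not labels:
            counts["unclassified"] += 1
        for label in labels:
            counts[label] += 1
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Calling percentages() on a list of article texts gives one figure per pattern. Because a single article can match more than one pattern (for example both the living-people and the births pattern), the figures overlap and are counted independently.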
Eugene