On Mon, Jun 9, 2014 at 3:17 PM, Andrew Gray <andrew.gray@dunelm.org.uk> wrote:
Hi all,
I ran a few quick updates on Max's numbers today. As of 9/6/14:
- Wikidata has ~2080k items marked as people
- Of these, ~1893k have a "gender" property (91%)
(Magnus's games are doing an amazing job at filling out these numbers, by the way - http://magnusmanske.de/wordpress/?p=213 )
Very quick and dirty statistics follow - note that since we have 9% undefined, the stats may change a bit as time goes on :-)
- The gender breakdown across all these people is approximately 1603k male, 290k female - 84.7% male and 15.3% female.
- By wiki, the female share is: enwiki 15.5%; arwiki 14.2%; dewiki 14.9%; frwiki 15.2%; eswiki 15.9%; jawiki 18.2%; hiwiki 18.7%; zhwiki 20.1%
- It's interesting to note that these numbers mostly seem a point or two better than the numbers Max got a month ago, which probably represents better data-logging rather than a change in the underlying content
- There are still very few items with a gender property other than "male" or "female" - perhaps 100-200 overall - but I suspect this number will significantly increase as we deal with the remaining items.
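For anyone who wants to re-derive the percentages above, here is a minimal sketch in Python. It uses only the rounded counts quoted in this message (in thousands); the variable names and the pct helper are illustrative, not part of any existing script, and the results are only as precise as those rounded inputs.

```python
# Rounded counts, in thousands, as quoted in the message above.
people_total = 2080   # items marked as people
with_gender = 1893    # of those, items with a "gender" property
male = 1603
female = 290


def pct(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of people items that have any gender property at all.
coverage = pct(with_gender, people_total)

# Gender split among the items that do have the property.
male_share = pct(male, with_gender)
female_share = pct(female, with_gender)

print(f"coverage: {coverage}%")
print(f"male: {male_share}%, female: {female_share}%")
```

Note that the split is taken over the items that have a gender property, not over all people items, which is why the undefined 9% can still shift it.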
Andrew.
Can you define "item" in this context?
Do we have any comparable data points against which to evaluate our progress? Perhaps a similar breakdown for other reference works, or some sort of summary data about published biographies (using LOC data?), etc.