Hi Everyone,
I just conducted some new research I thought you might be intrigued by.
- It compares the "sex or gender" labels in use by Wikidata today - 13 in total.
- The percentage of articles about "female"s by language.
  - The best are Serbian Wikipedia or Urdu Wikipedia, depending on the size you count.
- The wikis that have become most sexist in 2014.
  - English Wikipedia.
- And the data richness per sex value.
  - 6.2 Wikidata statements per male, 6.0 per female.
See the full blog post here, and please send me questions and suggestions - http://notconfusing.com/sex-ratios-in-wikidata-part-iii/
Max Klein ‽ http://notconfusing.com/
Hi all,
I ran a few quick updates on Max's numbers today. As of 9 June 2014:
* Wikidata has ~2080k items marked as people
* Of these, ~1893k have a "gender" property (91%)
(Magnus's games are doing an amazing job at filling out these numbers, by the way - http://magnusmanske.de/wordpress/?p=213 )
Very quick and dirty statistics follow - note that since we have 9% undefined, the stats may change a bit as time goes on :-)
* The gender breakdown across all these people is approximately 1603k male, 290k female - 84.7% male and 15.3% female.
* enwiki is 15.5% female; arwiki 14.2%; dewiki 14.9% female; frwiki 15.2%; eswiki 15.9%; jawiki 18.2%; hiwiki 18.7%; zhwiki 20.1%
* It's interesting to note that these numbers mostly seem a point or two better than the numbers Max got a month ago, which probably represents better data-logging rather than change in the underlying content
* There are still very few items with a gender property other than "male" or "female" - perhaps 100-200 overall - but I suspect this number will significantly increase as we deal with the remaining items.
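As a quick check on the figures above, here is a minimal Python sketch that recomputes the percentages from the rounded counts. The variable and function names are illustrative assumptions, not part of the original analysis, and the counts are the approximate values quoted in this message.

```python
# Approximate counts quoted above, in thousands of Wikidata items.
people = 2080        # items marked as people
with_gender = 1893   # of those, items that have a "gender" property
male = 1603
female = 290


def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


coverage = share(with_gender, people)
male_share = share(male, male + female)
female_share = share(female, male + female)

print(f"items with a gender property: {coverage}%")
print(f"male: {male_share}%, female: {female_share}%")
```

Because the inputs are rounded to the nearest thousand, the last digit of each percentage should be read as approximate.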
Andrew.
On 22 May 2014 18:16, Maximilian Klein isalix@gmail.com wrote:
Hi Everyone,
I just conducted some new research I thought you might be intrigued by.
It compares the "sex or gender" labels in use by Wikidata today - 13 in total. The percentage of articles about "female"s by language.
The best are Serbian Wikipedia or Urdu Wikipedia, depending on the size you count.
The wikis that have become most sexist in 2014 - English Wikipedia. And the data richness per sex value - 6.2 Wikidata statements per male, 6.0 per female.
See the full blog post here, and please send me questions and suggestions -
http://notconfusing.com/sex-ratios-in-wikidata-part-iii/
Max Klein ‽ http://notconfusing.com/
Gendergap mailing list Gendergap@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/gendergap
On Mon, Jun 9, 2014 at 3:17 PM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
Hi all,
I ran a few quick updates on Max's numbers today. As of 9 June 2014:
- Wikidata has ~2080k items marked as people
- Of these, ~1893k have a "gender" property (91%)
(Magnus's games are doing an amazing job at filling out these numbers, by the way - http://magnusmanske.de/wordpress/?p=213 )
Very quick and dirty statistics follow - note that since we have 9% undefined, the stats may change a bit as time goes on :-)
- The gender breakdown across all these people is approximately 1603k male, 290k female - 84.7% male and 15.3% female.
- enwiki is 15.5% female; arwiki 14.2%; dewiki 14.9% female; frwiki 15.2%; eswiki 15.9%; jawiki 18.2%; hiwiki 18.7%; zhwiki 20.1%
- It's interesting to note that these numbers mostly seem a point or two better than the numbers Max got a month ago, which probably represents better data-logging rather than change in the underlying content
- There are still very few items with a gender property other than "male" or "female" - perhaps 100-200 overall - but I suspect this number will significantly increase as we deal with the remaining items.
Andrew.
Can you define "item" in this context?
Do we have any comparable data points by which to evaluate our progress? Perhaps a similar breakdown of other reference works, or if there is some sort of summary data available about biographies written (using LOC data?), etc.
On 9 June 2014 20:21, Nathan nawrich@gmail.com wrote:
- Wikidata has ~2080k items marked as people
- Of these, ~1893k have a "gender" property (91%)
Can you define "item" in this context?
"Item" here is a single Wikidata entry:
http://www.wikidata.org/wiki/Q320
which may correspond to one Wikipedia article, one hundred Wikipedia articles, etc - but all on the same topic. (It may even correspond to *no* Wikipedia articles - a linked article is not strictly required, and in any case the source article may have been deleted - but there is unlikely to be a statistically significant number of these just now.)
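To make the item/article distinction concrete, here is a minimal Python sketch. The `Item` class, its fields, and the placeholder IDs are hypothetical stand-ins for illustration only; this is not the real Wikidata data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for one Wikidata item (one topic)."""
    item_id: str
    # Maps a wiki code (e.g. "enwiki") to the article title on that wiki.
    sitelinks: dict[str, str] = field(default_factory=dict)


def count_items_and_articles(items):
    """Return (number of items, total number of linked Wikipedia articles)."""
    return len(items), sum(len(item.sitelinks) for item in items)


# Placeholder data: the IDs and titles are invented for this example.
items = [
    Item("Q_A", {"enwiki": "Person A", "svwiki": "Person A", "dewiki": "Person A"}),
    Item("Q_B", {"svwiki": "Person B"}),
    Item("Q_C"),  # an item with no Wikipedia article at all
]

n_items, n_articles = count_items_and_articles(items)
print(f"{n_items} items, {n_articles} linked articles")
```

Counting items rather than articles means each person is counted once, however many language editions cover them.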
Do we have any comparable data points by which to evaluate our progress? Perhaps a similar breakdown of other reference works, or if there is some sort of summary data available about biographies written (using LOC data?), etc.
The new Oxford Dictionary of National Biography was about 10% female when published in 2004, though this was skewed by the requirement to include all entries from the original edition, including a lot of - to modern eyes - very non-notable men. http://oed.hertford.ox.ac.uk/main/images/stories/articles/baigent2005.pdf (It's since crept up to ~11%)
Max has done some numbers based on gender assigned in VIAF entries, I think, but I can't immediately find them. Ben Schmidt did something similar based on first names of authors: http://sappingattention.blogspot.co.uk/2012/05/women-in-libraries.html
Some language versions of Wikipedia do have gender categorization, such as Swedish and German Wikipedia. (The English categories exist but are not used very much.) Here's a link to the Swedish ones:
https://sv.wikipedia.org/wiki/Kategori:M%C3%A4n (men) presently 132 211 articles
https://sv.wikipedia.org/wiki/Kategori:Kvinnor (women) presently 32 693 articles
This gives a rough proportion of 1 female for every 4 male article subjects. If my memory serves me, the German Wikipedia numbers are a bit higher (perhaps 1 in 6).
On Swedish Wikipedia, the categorization was a conscious decision to try to find out where we stood.
Best wishes,
Lennart Guldbrandsson
070 - 207 80 05 http://www.elementx.se - work http://www.mrchapel.wordpress.com - personal blog
Presentation @aliasHannibal - on Twitter
"Imagine a world where every person on this planet has free access to the world's collected knowledge. That is our goal."
Jimmy Wales
From: andrew.gray@dunelm.org.uk
Date: Mon, 9 Jun 2014 20:44:17 +0100
To: gendergap@lists.wikimedia.org
Subject: Re: [Gendergap] Sex Ratios in Wikidata Part III
On 9 June 2014 20:21, Nathan nawrich@gmail.com wrote:
- Wikidata has ~2080k items marked as people
- Of these, ~1893k have a "gender" property (91%)
Can you define "item" in this context?
"Item" here is a single Wikidata entry:
http://www.wikidata.org/wiki/Q320
which may correspond to one Wikipedia article, one hundred Wikipedia articles, etc - but all on the same topic. (It may even correspond to *no* Wikipedia articles - a linked article is not strictly required, and in any case the source article may have been deleted - but there is unlikely to be a statistically significant number of these just now.)
Do we have any comparable data points by which to evaluate our progress? Perhaps a similar breakdown of other reference works, or if there is some sort of summary data available about biographies written (using LOC data?), etc.
The new Oxford Dictionary of National Biography was about 10% female when published in 2004, though this was skewed by the requirement to include all entries from the original edition, including a lot of - to modern eyes - very non-notable men. http://oed.hertford.ox.ac.uk/main/images/stories/articles/baigent2005.pdf (It's since crept up to ~11%)
Max has done some numbers based on gender assigned in VIAF entries, I think, but I can't immediately find them. Ben Schmidt did something similar based on first names of authors: http://sappingattention.blogspot.co.uk/2012/05/women-in-libraries.html
--
- Andrew Gray andrew.gray@dunelm.org.uk
On 9 June 2014 23:34, Lennart Guldbrandsson l_guldbrandsson@hotmail.com wrote:
Some language versions of Wikipedia do have gender categorization, such as Swedish and German Wikipedia. (The English categories exist but are not used very much.) Here's a link to the Swedish ones:
https://sv.wikipedia.org/wiki/Kategori:M%C3%A4n (men) presently 132 211 articles
https://sv.wikipedia.org/wiki/Kategori:Kvinnor (women) presently 32 693 articles
This gives a rough proportion of 1 female for every 4 male article subjects. If my memory serves me, the German Wikipedia numbers are a bit higher (perhaps 1 in 6).
On Swedish Wikipedia, the categorization was a conscious decision to try to find out where we stood.
Thanks - I knew about the German categories but not the Swedish ones.
Interestingly, Wikidata reports:
32661 female on svwiki: http://tools.wmflabs.org/wikidata-todo/autolist.html?q=claim%5B31%3A5%5D%20a...
130801 male on svwiki: http://tools.wmflabs.org/wikidata-todo/autolist.html?q=claim%5B31%3A5%5D%20a...
Wikidata gives 20% female, the Wikipedia categories give 21%, but they're in reasonably good alignment - almost perfectly matching for women, and about 1500 men not in Wikidata. I'll have a look at getting these mapped across tonight :-)
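The comparison between the two sources can be sketched in a few lines of Python. This is only an illustration using the counts quoted in this thread; the dictionary and function names are assumptions, and the counts will have changed since.

```python
# Counts quoted in the thread for Swedish Wikipedia (svwiki).
category = {"female": 32693, "male": 132211}  # Kategori:Kvinnor / Kategori:Män
wikidata = {"female": 32661, "male": 130801}  # autolist query results


def female_share(counts):
    """Return the female percentage among male + female entries."""
    return 100 * counts["female"] / (counts["female"] + counts["male"])


# Articles present in the Wikipedia categories but not yet matched in Wikidata.
gap = {sex: category[sex] - wikidata[sex] for sex in category}

print(f"category share female: {female_share(category):.1f}%")
print(f"Wikidata share female: {female_share(wikidata):.1f}%")
print(f"unmatched in Wikidata: {gap}")
```

The `gap` values give a rough size for the mapping job: the number of categorized articles per sex that do not yet carry the matching Wikidata claim.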
Lennart Guldbrandsson, 10/06/2014 00:34:
On Swedish Wikipedia, the categorization was a conscious decision to try to find out where we stood.
On it.wiki such a hammer is not needed, because since 2006 all biographical entries have been added with a template:
https://tools.wmflabs.org/personabot/?sesso=F&q=1 (female) 36975
https://tools.wmflabs.org/personabot/?sesso=M&q=1 (male) 222635
Nemo