Hi all,
I'd like to share a little Wikidata application: I just used Wikidata to guess the sex of people based on their (first) name [1]. My goal was to determine gender bias among the authors in several research areas. This is how some people spend their free time on weekends ;-)
In the process, I also created a long list of first names with associated sex information from Wikidata [2]. It is not super clean but it served its purpose. If you are a researcher, then maybe the gender bias of journals/conferences is interesting to you as well. Details and some discussion of the results are online [1].
Cheers,
Markus
[1] http://korrekt.org/page/Note:Sex_Distributions_in_Research
[2] https://docs.google.com/spreadsheet/ccc?key=0AstQ5xfO-xXGdE9UVkxNc0JMVWJzNmJ...
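A minimal sketch of the name-based guess described above, assuming the first-name list has been exported as (name, sex) pairs, one pair per person. The sample rows, the function names and the 0.9 agreement threshold are illustrative assumptions, not necessarily the method used for [1].

```python
from collections import Counter, defaultdict

def build_name_table(records):
    """Count how often each first name occurs with each sex.

    `records` is an iterable of (first_name, sex) pairs, e.g. one pair
    per person item that has a sex statement.
    """
    table = defaultdict(Counter)
    for first_name, sex in records:
        table[first_name.strip().lower()][sex] += 1
    return table

def guess_sex(full_name, table, min_share=0.9):
    """Guess the sex from the first token of `full_name`.

    Returns "unknown" when the name is unseen or when the most common
    sex accounts for less than `min_share` of the observations.
    """
    tokens = full_name.split()
    if not tokens:
        return "unknown"
    counts = table.get(tokens[0].lower())
    if not counts:
        return "unknown"
    sex, hits = counts.most_common(1)[0]
    return sex if hits / sum(counts.values()) >= min_share else "unknown"

# Made-up sample data, for illustration only.
sample = [("Maria", "female"), ("Maria", "female"), ("Markus", "male"),
          ("Andrea", "female"), ("Andrea", "male")]
table = build_name_table(sample)
print(guess_sex("Markus Example", table))
print(guess_sex("Andrea Example", table))
```

Returning "unknown" for ambiguous or unseen names keeps such authors out of the statistics instead of forcing a guess, at the cost of lower coverage.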
If you need to push through automated sexing for items without a sex property, you could point to my similar attempt in June: https://www.wikidata.org/wiki/Wikidata:Bot_requests#Set_sex:male_for_item_li...
On Sun, Oct 13, 2013 at 11:16 PM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
[...]
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
On 13/10/13 23:21, Magnus Manske wrote:
If you need to push through automated sexing for items without sex property, point to my similar attempt in June: https://www.wikidata.org/wiki/Wikidata:Bot_requests#Set_sex:male_for_item_li...
Thanks, the list I got from the items with sex is already longer than I need. My main problem is sexing Asian authors. Not sure if name-based approaches are promising there at all.
Markus
* Markus Krötzsch wrote:
I'd like to share a little Wikidata application: I just used Wikidata to guess the sex of people based on their (first) name [1]. My goal was to determine gender bias among the authors in several research areas. This is how some people spend their free time on weekends ;-)
My http://lists.w3.org/Archives/Public/www-archive/2011Sep/0007.html has related information gleaned from the German Wikipedia two years ago, including a description of the methodology (which is easily repeatable).
Naming patterns change over time and geography. If you're interested in the gender of current day authors, you should probably constrain your name sampling to the same timeframe.
There's an app that works off the Freebase data here: http://namegender.freebaseapps.com/
It also has an API that returns JSON: http://namegender.freebaseapps.com/gender_api?name=andrea
Based on the top name stats, it looks like its sample is a little more than twice the size of Wikidata's.
Tom
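A rough sketch of calling the JSON API above from Python. Only the URL pattern comes from the example link; the helper names are made up for illustration, and since the response fields are not described in this thread, the decoded JSON is returned as-is rather than interpreted.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URL above.
API_BASE = "http://namegender.freebaseapps.com/gender_api"

def gender_api_url(name):
    """Build the request URL for one first name (query string is URL-encoded)."""
    return API_BASE + "?" + urlencode({"name": name})

def lookup(name, timeout=10):
    """Fetch and decode the JSON answer for `name`.

    The structure of the returned object is not assumed here.
    """
    with urlopen(gender_api_url(name), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Only builds the URL; call lookup("andrea") to actually query the service.
print(gender_api_url("andrea"))
```

Using `urlencode` matters for names with non-ASCII characters, which are common in this kind of data.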
On 14/10/13 18:18, Tom Morris wrote:
Naming patterns change over time and geography. If you're interested in the gender of current day authors, you should probably constrain your name sampling to the same timeframe.
I think geography has a much bigger impact than time here. Unfortunately, the names I am trying to find the sex for do not come with an obvious hint about their geographic origin, so I cannot really use this. I think filtering by time will not have a big impact, since most people on Wikipedia are from the 20th century anyway. So recent usage should naturally outweigh older uses of a name.
There's an app that works off the Freebase data here: http://namegender.freebaseapps.com/
It also has an API that returns JSON: http://namegender.freebaseapps.com/gender_api?name=andrea
Based on the top name stats, it looks like its sample is a little more than twice the size of Wikidata's.
Nice. Christian Thiele also pointed me to a beautiful web service based on Wikipedia Personendaten (German language, but many things are easy to figure out, I guess):
http://toolserver.org/~apper/pd/vorname/top
http://toolserver.org/~apper/pd/vorname/Maria
This illustrates nicely how to take the effect of time into account.
Markus
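One way to take time into account, sketched under the assumption that each record also carries a birth year (for instance from a date-of-birth statement). The record layout, the sample rows and the year window below are hypothetical.

```python
from collections import Counter

def sex_counts(records, name, year_from=None, year_to=None):
    """Count sexes observed for `name`, optionally limited to a birth-year range.

    `records` is an iterable of (first_name, sex, birth_year) triples;
    records without a known birth year (None) are skipped whenever a
    bound is given.
    """
    counts = Counter()
    for first_name, sex, birth_year in records:
        if first_name.lower() != name.lower():
            continue
        if year_from is not None or year_to is not None:
            if birth_year is None:
                continue
            if year_from is not None and birth_year < year_from:
                continue
            if year_to is not None and birth_year > year_to:
                continue
        counts[sex] += 1
    return counts

# Made-up sample data, for illustration only.
sample = [("Ashley", "male", 1850), ("Ashley", "male", 1902),
          ("Ashley", "female", 1975), ("Ashley", "female", 1984),
          ("Ashley", "female", None)]

print(sex_counts(sample, "Ashley"))
print(sex_counts(sample, "Ashley", year_from=1950))
```

Comparing the unrestricted counts with the windowed ones shows directly whether the time filter changes the majority for a given name.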