Hi Baptiste,
Interesting research endeavor! :)
At a substantive level, this paper does say something about the question you care about, using a hypothetical survey experiment: http://marit.hinnosaar.net/wikipediagender.pdf
On the technical side, the following paper tries to develop automated ways to infer gender from Wikipedia usernames (I don't know of any such effort focused on a language other than English, however):
https://wikiworkshop.org/2018/papers/wikiworkshop2018_paper_20.pdf
Jérôme
From: Baptiste Fontaine b@ptistefontaine.fr
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] Measuring gender bias in contributors to the French-language Wikipedia
Date: 22/05/2020 14:13:58 Europe/Paris
Hello,
Having a lot of free time these days, I started a personal research project on gender bias among contributors to the French-language Wikipedia.
My goal is to explore the relation between contributors' genders and the genders of the people they create articles about. The hypotheses are:
1- contributors predominantly write biographies of people of their own gender. Simplistically: men write about men; women write about women.
2- there are far fewer female contributors than male ones. This has been studied in the past, but AFAIK there are no recent numbers, and the existing ones all concern the English-language WP.
If both hypotheses hold, they could explain part of the gender bias in biographies.
What I'm struggling with (and I guess some people before me did as well on the English-language WP) is how little information we have on contributors' genders: on WP:FR, 60-70% of contributors have not set their gender in their user preferences.
Does anyone have any pointers on this?
More insights below:
Looking at the contributors with ≥500 edits, 2.4% are self-declared as female, 27.4% as male, and 70.2% as 'unknown' (undeclared).
By definition, there's no apparent way to know the approximate gender distribution of the undeclared-gender accounts.
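For reference, shares like these can be computed from the MediaWiki API, whose `list=users` module with `usprop=gender` reports each account's preference as "male", "female" or "unknown". The Python below is only a minimal sketch under that assumption: the helper names are hypothetical, the sample response is made up (it is not real WP:FR data), and the ≥500-edit filter is left out.

```python
from collections import Counter
from urllib.parse import urlencode

API = "https://fr.wikipedia.org/w/api.php"

def gender_query_url(usernames):
    """Build an API URL asking for the gender preference of a batch of users
    (the API accepts up to 50 names per request for ordinary accounts)."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": "|".join(usernames),
        "usprop": "gender",
        "format": "json",
    }
    return API + "?" + urlencode(params)

def gender_shares(response):
    """Percentage of 'male', 'female' and 'unknown' accounts in a parsed JSON response."""
    # Non-existent accounts come back with a "missing" key and no gender.
    users = [u for u in response["query"]["users"] if "missing" not in u]
    if not users:
        return {}
    counts = Counter(u["gender"] for u in users)
    return {g: 100 * counts[g] / len(users) for g in ("male", "female", "unknown")}

# Made-up response, shaped like the API's JSON output.
sample = {"query": {"users": [
    {"userid": 1, "name": "A", "gender": "male"},
    {"userid": 2, "name": "B", "gender": "unknown"},
    {"userid": 3, "name": "C", "gender": "unknown"},
    {"userid": 4, "name": "D", "gender": "female"},
]}}
print(gender_shares(sample))  # {'male': 25.0, 'female': 25.0, 'unknown': 50.0}
```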
The French-language Wikipedia shows male- and unknown-gender user pages with the 'Utilisateur:' prefix, while female-gender user pages use the 'Utilisatrice:' prefix. Based on this, one might expect women to be more inclined to declare their gender so that the interface stops misgendering them. However, we know that female users tend to under-declare their gender to protect themselves.
I assumed that older accounts would be more likely to have a declared gender, but that's not the case: >60% of accounts of all ages (except the very oldest ones, where the sample is very small) have not declared their gender; see:
https://commons.wikimedia.org/wiki/File:Gender_repartition_of_Le_Bistro_WP-f...
Some users put userboxes with various information on their user page, and some of these boxes declare their gender. Surprisingly, however, most of the users with these boxes have not declared their gender in their preferences.
Out of the 434 users with an "I'm a woman" userbox on their page, only 32% are self-declared as female. The ratio is similar for the 2773 "I'm a man" users: only 34% are self-declared as male. It goes up to 36% for the "I'm a lesbian" box (N=14) and 40% for the "I'm a gay" one (N=86).
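The userbox check above is essentially a cross-tabulation of "box shown on the user page" against "gender set in the preferences". A minimal Python sketch, assuming one already has, for each user, the set of boxes found on their page and their declared gender; the function name, box labels and data below are hypothetical and made up, and extracting boxes from wikitext is not shown:

```python
from collections import Counter

def declared_gender_by_box(users, box):
    """For users displaying `box`, return (N, percentage of each declared gender)."""
    genders = [gender for boxes, gender in users if box in boxes]
    if not genders:
        return 0, {}
    counts = Counter(genders)
    return len(genders), {g: round(100 * c / len(genders), 1) for g, c in counts.items()}

# Made-up data: (boxes on the user page, gender from the preferences).
users = [
    ({"woman"}, "female"),
    ({"woman", "engineer"}, "unknown"),
    ({"woman"}, "unknown"),
    ({"man", "engineer"}, "male"),
    ({"man"}, "unknown"),
]
print(declared_gender_by_box(users, "woman"))  # (3, {'female': 33.3, 'unknown': 66.7})
```

Run over all users, the 'female' share returned for the "woman" box would correspond to a figure like the 32% quoted above.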
As expected, boxes for predominantly male professions have a larger share of male users, but an even larger 'unknown' share: of the 640 "I'm an engineer" box users, 24% self-declared as male and 1% as female. For the 714 "I'm a computers person" box users, the figures are 27.7% and 0.6%.
However, some boxes where I wouldn't expect a large bias show one as well. The Babel Italian users are 18% male and 2% female (N=2885). The Esperanto ones are 24.5% male and 0.8% female (N=493).
There is certainly a bias in box usage: newer users use them much less than older users, and I would assume that users who describe themselves with boxes don't have the same profile as the average contributor.
Thanks,
-- Baptiste Fontaine
_______________________________________________ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l