I feel I should clarify here. Most editors do not publicly identify their gender on the projects. Few have "This user is female/male" userboxes (in fact, most editors don't have userboxes at all), and most don't use the male/female contributor categories. We also cannot be certain how many people choose gender-specific userpages on the projects that are able to differentiate between male and female users.

That is completely separate from the editor surveys, whose individual results are non-public. I'd be hard pressed to suggest that people misidentify their gender there any more than they might in any other survey process (which typically comes with disclaimers such as "accurate within 1%, 19 times out of 20").

Laura is proposing to build a dataset from publicly accessible information, and my comment relates to what information she will be able to derive from the publicly stated genders of the users working in the research topic area.

Risker/Anne