Sage Ross, I think you've missed my point. My point was that the number of editors identifying as female is an entirely different piece of data from the number of women editing Wikipedia, and one should not be used as a surrogate for the other. That is as true for this most recent data as for the data I was cautioning about earlier: it is based on self-identification and should not be taken as an estimate of the number of women editing Wikipedia.
I disagree strongly with the statement "At first glance, it would seem that the gender gap is larger among very active editors." Maybe at a layman's first glance that's the case, but a statistician glancing at these numbers doesn't see that at all. What I see is the conflation of two different kinds of data. You cannot conclude, even tentatively, from these data whether the numbers for editors who self-identify by gender have anything to do with female participation among Wikipedia editors as a whole. As I said before, it's entirely possible, even probable, that editors who take the trouble to self-identify by gender are different in other important ways from those who don't, so it could be very misleading to generalize from one population to the other.
Also, the suggestion, even with a caveat, that at first glance these data seem to show that "the gender gap is larger among very active editors" is not valid and does not accurately reflect the data. As far as the data can tell us, the explanation that women who know Wikipedia well are less likely to self-identify by gender is as likely as the explanation that women are less likely to be active editors. Which of these explanations better reflects reality simply can't be determined from these data.
By the way, some of the percentages are wrong. The male "percentage of total" column is right for part of the column and then veers off; it appears that from some point on, the percentage was determined by dividing the number of self-identified males within an edit-count category by the number of non-self-identifying editors, rather than by the total number of editors in that category. For example, the number in the 65535 row identified as 66% (of editors in that edit category identifying as male) should actually be 39%; the number in the 32767 row identified as 52% should actually be 33%, and so forth. Some of the percentages for women are also wrong; the number identified as 4% in the 65535 row should be 2%, for example. I didn't have time to go through and calculate every one, but those are some representative inaccurate numbers.
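To make the denominator problem concrete, here is a small sketch in Python. The counts are made up for illustration (I don't have the underlying table in front of me); they are chosen only so that the two ways of dividing give roughly the 39% and 66% figures mentioned above.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)

# HYPOTHETICAL counts for a single edit-count row (not taken from the real table).
male, female, undeclared = 390, 20, 590
total = male + female + undeclared

# Correct "percentage of total": divide by everyone in the row.
correct_male = pct(male, total)
correct_female = pct(female, total)

# Suspected error: dividing by the editors who did not declare a gender.
wrong_male = pct(male, undeclared)
wrong_female = pct(female, undeclared)

print(f"male:   correct {correct_male}%, wrong denominator {wrong_male}%")
print(f"female: correct {correct_female}%, wrong denominator {wrong_female}%")
```

With these invented counts, the wrong denominator inflates both the male and the female figures, which is the pattern I think I'm seeing in the table.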
What I find interesting in these numbers is different from what others are seeing: while very few editors self-identify as female, the percentage of more active editors identifying as female is actually twice the percentage of less active editors identifying as female (1% up to 4,000 edits, 2-3% above that). But the 1-3% of editors who identify as female aren't, or shouldn't be, the subjects of interest to this discussion. The more useful question is: what part of the great bulk of Wikipedians who don't self-identify by gender is female? You don't know the answer to that question, and you can't estimate it using data that answer a different question. You need more data about female participation before you charge off generating strategies. You need to know what the problem is before you can develop strategies that have any meaningful chance of solving it.
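A rough sketch of why the declared numbers say so little about the whole population: without any assumption about the editors who don't declare, all the data give you is a lower and an upper bound on the female share. The counts below are hypothetical, and the sketch also assumes that every declared gender is accurate.

```python
def female_share_bounds(male, female, undeclared):
    """Bounds on the share of women among ALL editors in a group.

    Assumes declared genders are accurate and nothing is known about
    the undeclared editors (an assumption made for this illustration).
    """
    total = male + female + undeclared
    lower = female / total                  # no undeclared editor is a woman
    upper = (female + undeclared) / total   # every undeclared editor is a woman
    return lower, upper

# HYPOTHETICAL counts for one group of editors.
low, high = female_share_bounds(male=390, female=20, undeclared=590)
print(f"female share is somewhere between {low:.0%} and {high:.0%}")
```

An interval that wide is no estimate at all; narrowing it takes data about the people who don't self-identify, not more arithmetic on the ones who do.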
Woonpton
On 2/11/11, Sage Ross sross@wikimedia.org wrote:
We're crossing streams a bit between this list and wikitech-l.
On Fri, Feb 11, 2011 at 10:58 AM, Lars Aronsson lars@aronsson.se wrote on wikitech-l:
One thing that could be interesting is to trace the career of users: when they register, how frequently they edit, whether the frequency varies over time, and whether these patterns differ between men, women, and the gender-anonymous.
User:Dispenser is working on something similar, I think for the next Signpost.
Take a look at this (a work in progress and not mine, so please don't distribute): http://toolserver.org/~dispenser/temp/gender/total_edit_zero_2011-02-10.png
The table at the left traces gender identification rates for editors with less than or equal to the listed number of edits (but more than the previous row). So the first row is editors with 0 edits, the second is editors with 1 edit, the third is editors with 2-3 edits, then 4-7 edits, etc. The last row is everyone with over ~65k edits (and less than 5,000,000). It's based on essentially the 250,000 most recent users who have edited or created an account.
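For anyone who wants to reproduce the binning, the rows described above look like power-of-two buckets whose labels are of the form 2^k - 1. A minimal sketch in Python, assuming that pattern holds for every row (the function name is made up here, and the open-ended last row is not modeled):

```python
def bucket_upper_bound(edit_count: int) -> int:
    """Row label (inclusive upper bound) for an edit count.

    Assumes rows are 0, 1, 2-3, 4-7, ... with labels 0, 1, 3, 7, ...,
    i.e. 2**k - 1. This is inferred from the description, not from the
    code that produced the table.
    """
    if edit_count < 0:
        raise ValueError("edit_count must be non-negative")
    # bit_length() is the number of bits needed to write the count in binary,
    # so (1 << bit_length) - 1 is the largest count with that many bits.
    return (1 << edit_count.bit_length()) - 1

for n in (0, 1, 2, 3, 4, 7, 8, 20000, 40000):
    print(n, "->", bucket_upper_bound(n))
```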
So the takeaways are:
a) the more edits you make, the more likely you are to declare your gender.
b) the ratio of declared females to males falls from about 20% for people who make just zero or one edit, to a stable 5-6% for people who make 1000 or more edits.
Of course, as Woonpton notes, there could be factors that distort that. Maybe women who become active editors are more likely than other women to *not* declare gender. But at first glance, it would seem that the gender gap is larger among very active editors.
-Sage
Gendergap mailing list Gendergap@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/gendergap