Gender bias in GitHub (but not entirely what you expect)

List overview All Threads
Download

newer

older

Commons Collaborative Economies:...

Re: [Wiki-research-l] Gender bias...

Jonathan Morgan

11 Feb 2016 11 Feb '16

9:30 p.m.

Thought I'd pass this along. Haven't read the whole article yet, but it sounds fascinating. TL;DR: Looks like contributions by women are accepted *more often* than those by men, but *only *if the project leader doesn't know the pull request is coming from a woman. Excellent summary: http://arstechnica.com/information-technology/2016/02/data-analysis-of-gith… Preprint: https://peerj.com/preprints/1733v1/ Note: this work has not yet been peer-reviewed. J -- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

Attachments:

attachment.htm (text/html — 1.2 KB)

Show replies by date

Peter Ansell

11 Feb 11 Feb

10:06 p.m.

New subject: Gender bias in GitHub (but not entirely what you expect)

One theory may be that outsiders contribute trivial fixes, which are virtually assured to have a 100% acceptance rate by communities that wish to expand. Even if the trivial fix is slightly broken the maintainer can patch it up after the merge and give the contributor a sense of achievement by accepting their changes verbatim. However, I have intentionally declined pull requests on non-trivial topics from regular contributors for various reasons, including when they offer two solutions to an issue and put them up as two separate pull requests for review. I have never heard anyone claim that a 100% pull request acceptance rate is a special thing btw. Once you get deeper into projects you realise that some pull requests were good in theory, but for the greater good they need to be rejected in favour of alternatives. The numbers may be there in this paper, but they don't currently imply causality IMO. Cheers, Peter On 12 February 2016 at 08:30, Jonathan Morgan <jmorgan(a)wikimedia.org> wrote:

...

Thought I'd pass this along. Haven't read the whole article yet, but it sounds fascinating. TL;DR: Looks like contributions by women are accepted more often than those by men, but only if the project leader doesn't know the pull request is coming from a woman. Excellent summary: http://arstechnica.com/information-technology/2016/02/data-analysis-of-gith… Preprint: https://peerj.com/preprints/1733v1/ Note: this work has not yet been peer-reviewed. J -- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) _______________________________________________ Wiki-research-l mailing list Wiki-research-l(a)lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Laura Hale

10:20 p.m.

New subject: Gender bias in GitHub (but not entirely what you expect)

https://www.quora.com/Has-the-female-participation-on-Quora-changed-in-the-… is not peer reviewed (though if you want my data) but I'm the only person inside the community looking at gender issue on Quora. In the past six months, there has been a noticable shift in female participation type on Quora, to the point where it surpassed that of men. It isn't necessarily translating towards higher female user rates but it is on the participation side. Sincerely, Laura Hale On Thu, Feb 11, 2016 at 10:30 PM, Jonathan Morgan <jmorgan(a)wikimedia.org> wrote:

...

-- twitter: purplepopple

Flöck, Fabian

19 Feb 19 Feb

11:42 a.m.

New subject: Gender bias in GitHub (but not entirely what you expect)

There are several issues with this study, some of which are pointed out here in a useful summary: http://slatestarcodex.com/2016/02/12/before-you-get-too-excited-about-that-… . Especially making the gender responsible for the difference in contrast to other attributes of the users that might be just linked to the gender (maybe the women that join GitHub are just the very best/professional women, contribute only to specific types of code, etc., etc.) , apart from some other open questions re:methods, seems questionable for me. And I also share the author’s criticism of “science journalism” and it’s propensity for reporting catchy results. Fabian On 11.02.2016, at 23:20, Laura Hale <laura@fanhistory.com<mailto:laura@fanhistory.com>> wrote: https://www.quora.com/Has-the-female-participation-on-Quora-changed-in-the-… is not peer reviewed (though if you want my data) but I'm the only person inside the community looking at gender issue on Quora. In the past six months, there has been a noticable shift in female participation type on Quora, to the point where it surpassed that of men. It isn't necessarily translating towards higher female user rates but it is on the participation side. Sincerely, Laura Hale On Thu, Feb 11, 2016 at 10:30 PM, Jonathan Morgan <jmorgan@wikimedia.org<mailto:jmorgan@wikimedia.org>> wrote: Thought I'd pass this along. Haven't read the whole article yet, but it sounds fascinating. TL;DR: Looks like contributions by women are accepted more often than those by men, but only if the project leader doesn't know the pull request is coming from a woman. Excellent summary: http://arstechnica.com/information-technology/2016/02/data-analysis-of-gith… Preprint: https://peerj.com/preprints/1733v1/ Note: this work has not yet been peer-reviewed. J -- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF)<https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)> _______________________________________________ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org<mailto:Wiki-research-l@lists.wikimedia.org> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l -- twitter: purplepopple _______________________________________________ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org<mailto:Wiki-research-l@lists.wikimedia.org> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l Gruß, Fabian -- Fabian Flöck Research Associate Computational Social Science department @GESIS Unter Sachsenhausen 6-8, 50667 Cologne, Germany Tel: + 49 (0) 221-47694-208 fabian.floeck@gesis.org<mailto:fabian.floeck@gesis.org> www.gesis.org www.facebook.com/gesis.org

Kerry Raymond

11:45 p.m.

New subject: Gender bias in GitHub (but not entirely what you expect)

In IT development, it’s not unusual to find the women to be of a higher standard of ability. They have to be to survive the filters in their profession. It’s not uncommon to see new graduates in ending up in roles based on gender: men into development, women into help desk, tech writing, testing etc. Why? “Girls are good with people” (help desk), “Girls have more attention to detail” (testing) etc. Then, lacking a development role on their CV, it makes it harder for them to get their next job in a development role. You have to be good to survive that filtering. So I can easily believe the average women on GitHub is of a higher standard of ability than the average male. I suspect the same holds true about Wikipedians. Does anyone actually have the 2011 editor survey data to compare male vs female on other questions like age, level of education, etc. It would be interesting to know how the male and female Wikipedians of 2011 are statistically different in other ways. Kerry From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Flöck, Fabian Sent: Friday, 19 February 2016 9:42 PM To: Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org> Subject: Re: [Wiki-research-l] Gender bias in GitHub (but not entirely what you expect) There are several issues with this study, some of which are pointed out here in a useful summary: http://slatestarcodex.com/2016/02/12/before-you-get-too-excited-about-that-… . Especially making the gender responsible for the difference in contrast to other attributes of the users that might be just linked to the gender (maybe the women that join GitHub are just the very best/professional women, contribute only to specific types of code, etc., etc.) , apart from some other open questions re:methods, seems questionable for me. And I also share the author’s criticism of “science journalism” and it’s propensity for reporting catchy results. Fabian On 11.02.2016, at 23:20, Laura Hale <laura(a)fanhistory.com <mailto:laura@fanhistory.com> > wrote: https://www.quora.com/Has-the-female-participation-on-Quora-changed-in-the-… is not peer reviewed (though if you want my data) but I'm the only person inside the community looking at gender issue on Quora. In the past six months, there has been a noticable shift in female participation type on Quora, to the point where it surpassed that of men. It isn't necessarily translating towards higher female user rates but it is on the participation side. Sincerely, Laura Hale On Thu, Feb 11, 2016 at 10:30 PM, Jonathan Morgan <jmorgan(a)wikimedia.org <mailto:jmorgan@wikimedia.org> > wrote: Thought I'd pass this along. Haven't read the whole article yet, but it sounds fascinating. TL;DR: Looks like contributions by women are accepted more often than those by men, but only if the project leader doesn't know the pull request is coming from a woman. Excellent summary: http://arstechnica.com/information-technology/2016/02/data-analysis-of-gith… Preprint: https://peerj.com/preprints/1733v1/ Note: this work has not yet been peer-reviewed. J -- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)> _______________________________________________ Wiki-research-l mailing list Wiki-research-l(a)lists.wikimedia.org <mailto:Wiki-research-l@lists.wikimedia.org> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l -- twitter: purplepopple _______________________________________________ Wiki-research-l mailing list Wiki-research-l(a)lists.wikimedia.org <mailto:Wiki-research-l@lists.wikimedia.org> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l Gruß, Fabian -- Fabian Flöck Research Associate Computational Social Science department @GESIS Unter Sachsenhausen 6-8, 50667 Cologne, Germany Tel: + 49 (0) 221-47694-208 fabian.floeck(a)gesis.org <mailto:fabian.floeck@gesis.org> www.gesis.org <http://www.gesis.org> www.facebook.com/gesis.org <http://www.facebook.com/gesis.org>

WereSpielChequers

20 Feb 20 Feb

9:23 a.m.

New subject: Gender bias in GitHub (but not entirely what you expect)

Hi Kerry, good point. I've often heard the FA crowd say that the featured article process has a higher proportion of women than is normal on Wikipedia. From my own experience there they are probably right. Regards Jonathan

...

On 19 Feb 2016, at 23:45, Kerry Raymond <kerry.raymond(a)gmail.com> wrote: In IT development, it’s not unusual to find the women to be of a higher standard of ability. They have to be to survive the filters in their profession. It’s not uncommon to see new graduates in ending up in roles based on gender: men into development, women into help desk, tech writing, testing etc. Why? “Girls are good with people” (help desk), “Girls have more attention to detail” (testing) etc. Then, lacking a development role on their CV, it makes it harder for them to get their next job in a development role. You have to be good to survive that filtering. So I can easily believe the average women on GitHub is of a higher standard of ability than the average male. I suspect the same holds true about Wikipedians. Does anyone actually have the 2011 editor survey data to compare male vs female on other questions like age, level of education, etc. It would be interesting to know how the male and female Wikipedians of 2011 are statistically different in other ways. Kerry From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Flöck, Fabian Sent: Friday, 19 February 2016 9:42 PM To: Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org> Subject: Re: [Wiki-research-l] Gender bias in GitHub (but not entirely what you expect) There are several issues with this study, some of which are pointed out here in a useful summary: http://slatestarcodex.com/2016/02/12/before-you-get-too-excited-about-that-… . Especially making the gender responsible for the difference in contrast to other attributes of the users that might be just linked to the gender (maybe the women that join GitHub are just the very best/professional women, contribute only to specific types of code, etc., etc.) , apart from some other open questions re:methods, seems questionable for me. And I also share the author’s criticism of “science journalism” and it’s propensity for reporting catchy results. Fabian On 11.02.2016, at 23:20, Laura Hale <laura(a)fanhistory.com> wrote: https://www.quora.com/Has-the-female-participation-on-Quora-changed-in-the-… is not peer reviewed (though if you want my data) but I'm the only person inside the community looking at gender issue on Quora. In the past six months, there has been a noticable shift in female participation type on Quora, to the point where it surpassed that of men. It isn't necessarily translating towards higher female user rates but it is on the participation side. Sincerely, Laura Hale On Thu, Feb 11, 2016 at 10:30 PM, Jonathan Morgan <jmorgan(a)wikimedia.org> wrote: Thought I'd pass this along. Haven't read the whole article yet, but it sounds fascinating. TL;DR: Looks like contributions by women are accepted more often than those by men, but only if the project leader doesn't know the pull request is coming from a woman. Excellent summary: http://arstechnica.com/information-technology/2016/02/data-analysis-of-gith… Preprint: https://peerj.com/preprints/1733v1/ Note: this work has not yet been peer-reviewed. J -- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) _______________________________________________ Wiki-research-l mailing list Wiki-research-l(a)lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l -- twitter: purplepopple _______________________________________________ Wiki-research-l mailing list Wiki-research-l(a)lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l Gruß, Fabian -- Fabian Flöck Research Associate Computational Social Science department @GESIS Unter Sachsenhausen 6-8, 50667 Cologne, Germany Tel: + 49 (0) 221-47694-208 fabian.floeck(a)gesis.org www.gesis.org www.facebook.com/gesis.org _______________________________________________ Wiki-research-l mailing list Wiki-research-l(a)lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Samuel Klein

1:44 a.m.

New subject: Gender bias in GitHub (but not entirely what you expect)

The full paper is very much worth reading. Peter writes:

...

One theory may be that outsiders contribute trivial fixes, which are virtually assured to have a 100% acceptance rate by communities that wish to expand.

Did you read the paper? "the changes proposed by women typically included more lines of code than men's, so they weren't just submitting smaller contributions either." On Fri, Feb 19, 2016 at 6:42 AM, Flöck, Fabian <Fabian.Floeck(a)gesis.org> wrote:

...

There are several issues with this study, some of which are pointed out here in a useful summary: http://slatestarcodex.com/2016/02/12/before-you-get-too-excited-about-that-… .

Fabian, slatestarcodex is a perennially unreliable source on discussions of gender issues. You cannot take his analysis at face value and have to actually read the paper.

...

Especially making the gender responsible for the difference in contrast to other attributes of the users that might be just linked to the gender (maybe the women that join GitHub are just the very best/professional women, contribute only to specific types of code, etc., etc.)

Peter Ansell

22 Feb 22 Feb

12:34 a.m.

New subject: Gender bias in GitHub (but not entirely what you expect)

On 20 February 2016 at 12:44, Samuel Klein <meta.sj(a)gmail.com> wrote:

...

The full paper is very much worth reading. Peter writes:

One theory may be that outsiders contribute trivial fixes, which are virtually assured to have a 100% acceptance rate by communities that wish to expand.

Did you read the paper? "the changes proposed by women typically included more lines of code than men's, so they weren't just submitting smaller contributions either."

I discounted their conclusion based on their current numbers and statistical analysis methods, making the proposition still valid in the long term and one that could be examined in more detail in the future. In particular, the raw median (50th percentile) numbers are very close and applying normal distribution-based statistical tools to say that 5 is more significant than 4 or 29 is more significant than 24 doesn't provide causal validity for their results. Ie, the raw median numbers were 29/5/1/2 versus 24/4/1/2 for commit size, and indicate that a median contribution in both cases is very small indeed, as expected from experience in Open Source projects. The differences at the mean however may come down to any number of factors which are not distinguishing on software quality or software engineer competence, as are any line-based software metrics that managers over the years have tried to apply to software engineers. All of their statistical measurements to come to that conclusion are based on the mean, which doesn't give an approximation to the degree of the long tail as it assumes a normal distribution by using both the mean and chi-square. FWIW, I never imagined that the mean number of commits would get as high as 29, as that is already a large pull request. With the median at 2, the high mean indicates that somewhere on GitHub there are pull requests that get very large to push the mean up from 2 at the 50th percentile to some large number (>>29) at the 95th percentile and above. Graphing this may not have helped their hypothesis, but it would have been very informative to see which section of the pull request size distribution most of their verifying evidence would have come from. They also didn't allow for the relatively common community preference to rebasing/squashing during pull requests, which heavily affects both the number of lines changed and the number of commits, and will result in an median of around 1 commit with the minimal number of lines changed, if many communities follow it. It isn't as prevalent since most projects stopped using mailing lists to mail patches around and started using Pull Requests for the same feature with multiple commits, but it still exists in some places. However, notably, Linux still requires rebasing/squashing for commits submitted there, so there are a very small number of projects that still require it to be compatible with mailing list ettiquette. Don't get me wrong, I am not arguing that women don't contribute high quality code to open source projects and should not do more of it in future. I am just arguing that the causes that the authors have presumed for many of their metrics either don't match the data collection/analysis methodologies, or the metrics themselves are flawed by their assumption that the numbers they are able to get, for the small part of the massive population they are studying at a high level, are meaningful. It is great that there is publicity that in the 12% of the GitHub community that could be distinguished by gender (1.4 million out of a total of 12 million) there are women who are making valuable contributions. It is also awesome that there are about 103 women who have made more than 128 pull requests across the 31 million GitHub repositories. They have chosen to avoid identifying any of these 103 women, who have both public GitHub and public Google+ profiles, while they have published the methodology for finding them which is a little strange, but it is great that they exist on projects somewhere. To get a larger sample size to give their measures some context, it would have been appropriate for them to include statistics on both the total number of men/women/undertermined gender committing directly to their own repositories and committing to shared repositories which they are a member of, where they may not use pull requests and just rely on Travis/Jenkins ongoing evaluation to tell them whether they are still up to date. Ie, not everyone uses pull requests and it seems unfair to isolate those actions from them committing directly. It may destroy the basis for their article due to the much larger population that commit directly to repositories, and hence probably why they didn't refer to it, but it would be interesting for someone to do that study. There are also good reasons for putting this research up for pre-publishing peer review, as they can solicit opinions about where they haven't included the ideal metrics and where they could broaded the scope to include those that they are ignoring by focusing on pull requests and their assumptions about Open/Closed/Merged states. Science is nothing if not reviewed by the community, and the current vogue of 2-3 closed reviews before publication, to enable journal impact factors and large publication counts to be used to pay academics more efficiently, before publication doesn't help with that so it is great that they put their research up for public review before publication and hopefully they respond with changes before publication per the usual pre-print iterative style. If you want to discuss this purely as a social issue and not a software engineering discussion, then feel free to use the numbers without questioning the causality assumptions from a software engineers point of view. I have just had too much experience of GitHub/Open Source to take any numbers, including raw number of commits or size, too literally and to use them in individual evaluations, much less population wide statistical evaluations. I have had many difficult to agree on pull requests in my experience, which change very small numbers of lines of code, where much larger pull requests, where the user has already used the standard code formatting tools to reformat their code are accepted straight away based on Travis/Jenkins results. By example, they have a very novel idea for the small difference in the number of pull requests also linked to an issue as having a social cause. I have been on many projects that do not make it a typical open issues before opening a pull request if they have code available already, because pull requests include discussion as a core feature and having another discussion location fragments the discussion, or they have non-GitHub issue trackers (Atlassian Jira Cloud/OnDemand for example are fairly popular) which won't be picked up by their methodology. Many projects accept that it is easiest to simply open a pull request and have the discussion there. If the discussion moves away from acceptance, then the pull requester simply closes the pull request themselves in favour of another separate pull request, still without an issue. Both of those situations will cause significant noise in their causality assumption, and make the numbers meaningless if they don't measure them. As the authors are surely aware, as they come from computer science departments, but there are many software engineers who have a mantra not to get attached to code, because in a lot of cases it is simpler (particularly using Git/etc.) when you hit a wall to commit what you have done so far to a branch (or close a pull request) and open a new branch to test out a new idea from scratch which may mean closing the original pull request. That may be a basis for the distribution for males who go down from 80% to 100% rather than continuously approaching 100% pull request acceptance as for women who may not have the same population density to show the same symptoms of being okay with an 80% pull request acceptance without attaching any social implication to it as the authors have done as the basis for their article. The idea that it is better to soldier on with an approach to aim for 100% pull request acceptance is not time effective in the long term from my experience, although I know coders who will keep making changes without giving up on a branch until they come up with a solution (and may contribute to a high mean number of commits per pull request due to their low numbers in the community) In addition, anyone familiar with CodeGolf would be skeptical of the perception that larger pull requests correlates directly with code quality so the authors are jumping a large chasm by implying that larger code changes are directly related to more important features. It may be the case that there are many large code contributions that are related to important features, I am just skeptical that you can examine them in a population without taking into account the context for each pull request. It would be great if they released some more aggregate data in machine readable form to further continue their analysis and hopefully they follow the PeerJ guidelines to make their data and code available for review without delay so others can further explore parts of the dataset that didn't make it into the publication. If they can't release raw data, the standard deviation, the 95the percentile and the maximum for each case on the number of commits would be very informative and it would have been appropriate for them to determine whether there are a few very large pull requests at the high end (ie, statistical outliers) which are skewing their assumed distribution and hence their results. https://peerj.com/about/preprints/scope-and-instructions/#data-and-materials Cheers, Peter

...

On Fri, Feb 19, 2016 at 6:42 AM, Flöck, Fabian <Fabian.Floeck(a)gesis.org> wrote:

There are several issues with this study, some of which are pointed out here in a useful summary: http://slatestarcodex.com/2016/02/12/before-you-get-too-excited-about-that-… .

Fabian, slatestarcodex is a perennially unreliable source on discussions of gender issues. You cannot take his analysis at face value and have to actually read the paper.

The authors discuss this at length. The result observed held true across all languages and many types of code. And their conclusions are indeed guesses about what shared attributes might lead to the strong statistical observation. "Given that there is no "computer science gene" that occurs more often in women than in men, there has to be a social bias at work." _______________________________________________ Wiki-research-l mailing list Wiki-research-l(a)lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

3020

days inactive

3031

days old

wiki-research-l@lists.wikimedia.org

Manage subscription

7 comments

7 participants

tags (0)

participants (7)

Flöck, Fabian
Jonathan Morgan
Kerry Raymond
Laura Hale
Peter Ansell
Samuel Klein
WereSpielChequers