On 20 February 2016 at 12:44, Samuel Klein meta.sj@gmail.com wrote:
The full paper is very much worth reading.
Peter writes:
One theory may be that outsiders contribute trivial fixes, which are virtually assured to have a 100% acceptance rate by communities that wish to expand.
Did you read the paper? "the changes proposed by women typically included more lines of code than men's, so they weren't just submitting smaller contributions either."
I discounted their conclusion because of problems with their current numbers and statistical analysis methods, which leaves the proposition still valid in the long term and one that could be examined in more detail in the future.
In particular, the raw median (50th percentile) numbers are very close, and applying statistical tools that assume a normal distribution to argue that 5 is significantly more than 4, or 29 significantly more than 24, doesn't establish causal validity for their results. That is, the raw median numbers were 29/5/1/2 versus 24/4/1/2 for commit size, which indicates that a median contribution in both cases is very small indeed, as expected from experience in open source projects. The differences at the mean, however, may come down to any number of factors that say nothing about software quality or software engineering competence, as is true of any line-based software metric that managers over the years have tried to apply to software engineers. All of the statistical measurements behind their conclusion are based on the mean, which gives no approximation of the extent of the long tail, since using the mean together with chi-square assumes a normal distribution.
FWIW, I never imagined that the mean number of commits would get as high as 29, as that is already a large pull request. With the median at 2, a mean that high indicates that somewhere on GitHub there are pull requests large enough to pull the mean from 2 at the 50th percentile up to some very large value (>>29) at the 95th percentile and above. Graphing this may not have helped their hypothesis, but it would have been very informative to see which section of the pull request size distribution most of their supporting evidence comes from.
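To make the skew concrete, here is a minimal sketch in Python; the log-normal and its parameters are my own illustrative assumption, not fitted to their data:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative heavy-tailed "commits per pull request" sample:
    # a log-normal with median 2, with sigma tuned so the mean lands
    # an order of magnitude higher, driven by a few enormous PRs.
    commits = rng.lognormal(mean=np.log(2), sigma=2.3, size=100_000)

    print("median:", np.median(commits))            # ~2
    print("mean:  ", commits.mean())                # ~28, pulled up by the tail
    print("95th pct:", np.percentile(commits, 95))  # ~90
    print("max:   ", commits.max())                 # very large

A mean-based significance test comparing two samples like this responds mostly to differences in the tail, even when the typical (median) contribution is identical.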
They also didn't allow for the relatively common community preference for rebasing/squashing during pull requests, which heavily affects both the number of lines changed and the number of commits, and will produce a median of around 1 commit with a minimal number of lines changed if many communities follow it. The practice is less prevalent now that most projects have stopped mailing patches around on mailing lists and use pull requests with multiple commits instead, but it still exists in places; notably, Linux still requires rebasing/squashing for submitted commits, so a small number of projects still require it for compatibility with mailing list etiquette.
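A sketch of the distortion this can cause, with made-up numbers: two groups whose members write exactly the same commits per pull request will still show different observed means if the projects they target squash at different rates.

    import numpy as np

    rng = np.random.default_rng(1)

    def observed_commits(n_prs, squash_rate):
        # Both groups "really" behave identically; the only difference
        # is what fraction of their target projects squash on merge.
        true_commits = rng.geometric(p=0.3, size=n_prs)
        squashed = rng.random(n_prs) < squash_rate
        return np.where(squashed, 1, true_commits)

    group_a = observed_commits(50_000, squash_rate=0.6)
    group_b = observed_commits(50_000, squash_rate=0.3)

    # The means differ purely because of project convention, not
    # because of any difference in contributor behaviour.
    print("A mean:", group_a.mean(), " B mean:", group_b.mean())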
Don't get me wrong, I am not arguing that women don't contribute high-quality code to open source projects, or that they shouldn't do more of it in future. I am just arguing that the causes the authors have presumed for many of their metrics either don't match their data collection and analysis methodologies, or the metrics themselves are flawed by the assumption that the numbers they were able to get, for the small part of the massive population they are studying at a high level, are meaningful.
It is great that there is publicity that within the 12% of the GitHub community that could be distinguished by gender (1.4 million out of a total of 12 million) there are women who are making valuable contributions. It is also awesome that there are about 103 women who have made more than 128 pull requests across the 31 million GitHub repositories. The authors have chosen not to identify any of these 103 women, who have both public GitHub and public Google+ profiles, while publishing the methodology for finding them, which is a little strange; but it is great that they exist on projects somewhere.
To get a larger sample and give their measures some context, it would have been appropriate to include statistics on the total number of men, women, and people of undetermined gender who commit directly to their own repositories, and who commit to shared repositories of which they are members, where they may not use pull requests at all and instead rely on ongoing Travis/Jenkins evaluation to tell them whether they are still up to date. That is, not everyone uses pull requests, and it seems unfair to isolate those contributions from direct commits. Doing so might undermine the basis for their article, given the much larger population that commits directly to repositories, which is probably why they didn't refer to it, but it would be an interesting study for someone to do.
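For a sense of scale, here is a rough sketch of how one might count both streams for a single repository via the GitHub API; the helper is hypothetical, it reads only the first page of results, and a real study would need pagination, rate-limit handling, and a way to separate PR merge commits from direct ones:

    import requests

    def pr_vs_commit_counts(owner, repo, token=None):
        # Rough first-page counts only: pull requests of any state
        # versus commits on the default branch.
        headers = {"Authorization": "token " + token} if token else {}
        base = "https://api.github.com/repos/%s/%s" % (owner, repo)
        pulls = requests.get(base + "/pulls",
                             params={"state": "all", "per_page": 100},
                             headers=headers).json()
        commits = requests.get(base + "/commits",
                               params={"per_page": 100},
                               headers=headers).json()
        return len(pulls), len(commits)

A repository with many commits but few pull requests is largely invisible to a methodology that only looks at pull requests.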
There are also good reasons for putting this research up for pre-publication peer review, as the authors can solicit opinions about where they haven't included the ideal metrics and where they could broaden the scope beyond what they are ignoring by focusing on pull requests and their assumptions about Open/Closed/Merged states. Science is nothing if not reviewed by the community, and the current vogue of two or three closed reviews before publication, designed so that journal impact factors and large publication counts can be used to pay academics more efficiently, doesn't help with that. So it is great that they put their research up for public review before publication, and hopefully they will respond with changes per the usual pre-print iterative style.
If you want to discuss this purely as a social issue rather than a software engineering one, then feel free to use the numbers without questioning the causality assumptions from a software engineer's point of view. I have just had too much experience of GitHub and open source to take any numbers, including raw commit counts or sizes, too literally, whether in individual evaluations or in population-wide statistical ones. I have seen pull requests that change a very small number of lines prove very difficult to get agreement on, while much larger pull requests, where the contributor has already run the standard code formatting tools over their code, are accepted straight away on the strength of Travis/Jenkins results.
For example, they have a very novel idea that the small difference in the number of pull requests also linked to an issue has a social cause. I have been on many projects where it is not typical to open an issue before opening a pull request if the code is already available, because pull requests include discussion as a core feature and another discussion location would fragment the conversation, or where the project uses a non-GitHub issue tracker (Atlassian Jira Cloud/OnDemand, for example, is fairly popular) that won't be picked up by their methodology. Many projects accept that it is easiest to simply open a pull request and have the discussion there; if the discussion moves away from acceptance, the submitter simply closes the pull request themselves in favour of a separate one, still without an issue. Both of those situations will introduce significant noise into their causality assumption, and make the numbers meaningless if they aren't measured.
As the authors, coming from computer science departments, are surely aware, many software engineers have a mantra not to get attached to code: in a lot of cases it is simpler (particularly using Git and the like) when you hit a wall to commit what you have done so far to a branch, close the pull request, and open a new branch to test out a new idea from scratch. That may be the basis for the male distribution dropping away between 80% and 100% rather than continuously approaching 100% pull request acceptance; women may simply not have the population density to show the same pattern of being okay with an 80% acceptance rate, without the social implication the authors have attached to it as the basis for their article. Soldiering on with one approach to chase 100% pull request acceptance is not time-effective in the long term, in my experience, although I know coders who will keep making changes without giving up on a branch until they come up with a solution (and who may contribute to a high mean number of commits per pull request, given their low numbers in the community).
In addition, anyone familiar with CodeGolf would be skeptical of the notion that larger pull requests correlate directly with code quality, so the authors are jumping a large chasm when they imply that larger code changes are directly related to more important features. It may be that many large code contributions do relate to important features; I am just skeptical that you can assess them across a population without taking the context of each pull request into account.
It would be great if they released more aggregate data in machine-readable form so their analysis can be taken further, and hopefully they follow the PeerJ guidelines and make their data and code available for review without delay, so others can explore parts of the dataset that didn't make it into the publication. If they can't release raw data, the standard deviation, the 95th percentile, and the maximum for each case of the number of commits would be very informative, and it would have been appropriate for them to determine whether a few very large pull requests at the high end (i.e., statistical outliers) are skewing their assumed distribution and hence their results.
https://peerj.com/about/preprints/scope-and-instructions/#data-and-materials
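The summary statistics I am asking for are cheap to compute; a minimal sketch (the function and its input are my own, since we don't have their data):

    import numpy as np

    def tail_summary(commits_per_pr):
        # Percentiles and the maximum expose a long tail that the
        # mean/standard-deviation pair on its own would hide.
        x = np.asarray(commits_per_pr, dtype=float)
        q25, q75 = np.percentile(x, [25, 75])
        return {
            "mean": x.mean(),
            "std": x.std(ddof=1),
            "median": np.median(x),
            "p95": np.percentile(x, 95),
            "max": x.max(),
            # Tukey's rule for upper-end outliers.
            "n_outliers": int((x > q75 + 1.5 * (q75 - q25)).sum()),
        }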
Cheers,
Peter
On Fri, Feb 19, 2016 at 6:42 AM, Flöck, Fabian Fabian.Floeck@gesis.org wrote:
There are several issues with this study, some of which are pointed out here in a useful summary: http://slatestarcodex.com/2016/02/12/before-you-get-too-excited-about-that-g... .
Fabian, slatestarcodex is a perennially unreliable source on discussions of gender issues. You cannot take his analysis at face value; you have to actually read the paper.
Especially attributing the difference to gender, as opposed to other attributes of the users that might merely be linked to gender (maybe the women who join GitHub are just the very best/professional women, contribute only to specific types of code, etc., etc.)
The authors discuss this at length. The result observed held true across all languages and many types of code. And their conclusions are indeed guesses about what shared attributes might lead to the strong statistical observation. "Given that there is no "computer science gene" that occurs more often in women than in men, there has to be a social bias at work."