On 20 February 2016 at 12:44, Samuel Klein meta.sj@gmail.com wrote:
The full paper is very much worth reading.
Peter writes:
One theory may be that outsiders contribute trivial fixes, which are virtually assured to have a 100% acceptance rate by communities that wish to expand.
Did you read the paper? "the changes proposed by women typically included more lines of code than men's, so they weren't just submitting smaller contributions either."
I discounted their conclusion because of problems with their current numbers and statistical analysis methods, which leaves the proposition still valid in the long term and one that could be examined in more detail in the future.
In particular, the raw median (50th percentile) numbers are very close, and applying statistical tools that assume a normal distribution to argue that 5 is significantly more than 4, or 29 significantly more than 24, doesn't establish causal validity for their results. That is, the raw median numbers were 29/5/1/2 versus 24/4/1/2 for commit size, which indicates that a median contribution in both cases is very small indeed, as expected from experience in open source projects. The differences at the mean, however, may come down to any number of factors that say nothing about software quality or software engineering competence, as is true of any line-based software metric that managers over the years have tried to apply to software engineers. All of the statistical measurements behind their conclusion are based on the mean, which gives no approximation of the extent of the long tail, since using the mean together with chi-square assumes a normal distribution.
FWIW, I never imagined that the mean number of commits would get as high as 29, as that is already a large pull request. With the median at 2, a mean that high indicates that somewhere on GitHub there are pull requests large enough to pull the mean from 2 at the 50th percentile up to some very large value (>>29) at the 95th percentile and above. Graphing this may not have helped their hypothesis, but it would have been very informative to see which section of the pull request size distribution most of their supporting evidence comes from.
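To make the skew concrete, here is a minimal sketch in Python; the log-normal and its parameters are my own illustrative assumption, not fitted to their data:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative heavy-tailed "commits per pull request" sample:
    # a log-normal with median 2, with sigma tuned so the mean lands
    # an order of magnitude higher, driven by a few enormous PRs.
    commits = rng.lognormal(mean=np.log(2), sigma=2.3, size=100_000)

    print("median:", np.median(commits))            # ~2
    print("mean:  ", commits.mean())                # ~28, pulled up by the tail
    print("95th pct:", np.percentile(commits, 95))  # ~90
    print("max:   ", commits.max())                 # very large

A mean-based significance test comparing two samples like this responds mostly to differences in the tail, even when the typical (median) contribution is identical.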
They also didn't allow for the relatively common community preference for rebasing/squashing during pull requests, which heavily affects both the number of lines changed and the number of commits, and will produce a median of around 1 commit with a minimal number of lines changed if many communities follow it. The practice is less prevalent now that most projects have stopped mailing patches around on mailing lists and use pull requests with multiple commits instead, but it still exists in places; notably, Linux still requires rebasing/squashing for submitted commits, so a small number of projects still require it for compatibility with mailing list etiquette.
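A sketch of the distortion this can cause, with made-up numbers: two groups whose members write exactly the same commits per pull request will still show different observed means if the projects they target squash at different rates.

    import numpy as np

    rng = np.random.default_rng(1)

    def observed_commits(n_prs, squash_rate):
        # Both groups "really" behave identically; the only difference
        # is what fraction of their target projects squash on merge.
        true_commits = rng.geometric(p=0.3, size=n_prs)
        squashed = rng.random(n_prs) < squash_rate
        return np.where(squashed, 1, true_commits)

    group_a = observed_commits(50_000, squash_rate=0.6)
    group_b = observed_commits(50_000, squash_rate=0.3)

    # The means differ purely because of project convention, not
    # because of any difference in contributor behaviour.
    print("A mean:", group_a.mean(), " B mean:", group_b.mean())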
Don't get me wrong, I am not arguing that women don't contribute high-quality code to open source projects, or that they shouldn't do more of it in future. I am just arguing that the causes the authors have presumed for many of their metrics either don't match their data collection and analysis methodologies, or the metrics themselves are flawed by the assumption that the numbers they were able to get, for the small part of the massive population they are studying at a high level, are meaningful.
It is great that there is publicity that within the 12% of the GitHub community that could be distinguished by gender (1.4 million out of a total of 12 million) there are women who are making valuable contributions. It is also awesome that there are about 103 women who have made more than 128 pull requests across the 31 million GitHub repositories. The authors have chosen not to identify any of these 103 women, who have both public GitHub and public Google+ profiles, while publishing the methodology for finding them, which is a little strange; but it is great that they exist on projects somewhere.
To get a larger sample and give their measures some context, it would have been appropriate to include statistics on the total number of men, women, and people of undetermined gender who commit directly to their own repositories, and who commit to shared repositories of which they are members, where they may not use pull requests at all and instead rely on ongoing Travis/Jenkins evaluation to tell them whether they are still up to date. That is, not everyone uses pull requests, and it seems unfair to isolate those contributions from direct commits. Doing so might undermine the basis for their article, given the much larger population that commits directly to repositories, which is probably why they didn't refer to it, but it would be an interesting study for someone to do.
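For a sense of scale, here is a rough sketch of how one might count both streams for a single repository via the GitHub API; the helper is hypothetical, it reads only the first page of results, and a real study would need pagination, rate-limit handling, and a way to separate PR merge commits from direct ones:

    import requests

    def pr_vs_commit_counts(owner, repo, token=None):
        # Rough first-page counts only: pull requests of any state
        # versus commits on the default branch.
        headers = {"Authorization": "token " + token} if token else {}
        base = "https://api.github.com/repos/%s/%s" % (owner, repo)
        pulls = requests.get(base + "/pulls",
                             params={"state": "all", "per_page": 100},
                             headers=headers).json()
        commits = requests.get(base + "/commits",
                               params={"per_page": 100},
                               headers=headers).json()
        return len(pulls), len(commits)

A repository with many commits but few pull requests is largely invisible to a methodology that only looks at pull requests.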
There are also good reasons for putting this research up for pre-publication peer review, as the authors can solicit opinions about where they haven't included the ideal metrics and where they could broaden the scope beyond what they are ignoring by focusing on pull requests and their assumptions about Open/Closed/Merged states. Science is nothing if not reviewed by the community, and the current vogue of two or three closed reviews before publication, designed so that journal impact factors and large publication counts can be used to pay academics more efficiently, doesn't help with that. So it is great that they put their research up for public review before publication, and hopefully they will respond with changes per the usual pre-print iterative style.
If you want to discuss this purely as a social issue rather than a software engineering one, then feel free to use the numbers without questioning the causality assumptions from a software engineer's point of view. I have just had too much experience of GitHub and open source to take any numbers, including raw commit counts or sizes, too literally, whether in individual evaluations or in population-wide statistical ones. I have seen pull requests that change a very small number of lines prove very difficult to get agreement on, while much larger pull requests, where the contributor has already run the standard code formatting tools over their code, are accepted straight away on the strength of Travis/Jenkins results.
For example, they have a very novel idea that the small difference in the number of pull requests also linked to an issue has a social cause. I have been on many projects where it is not typical to open an issue before opening a pull request if the code is already available, because pull requests include discussion as a core feature and another discussion location would fragment the conversation, or where the project uses a non-GitHub issue tracker (Atlassian Jira Cloud/OnDemand, for example, is fairly popular) that won't be picked up by their methodology. Many projects accept that it is easiest to simply open a pull request and have the discussion there; if the discussion moves away from acceptance, the submitter simply closes the pull request themselves in favour of a separate one, still without an issue. Both of those situations will introduce significant noise into their causality assumption, and make the numbers meaningless if they aren't measured.
As the authors, coming from computer science departments, are surely aware, many software engineers have a mantra not to get attached to code: in a lot of cases it is simpler (particularly using Git and the like) when you hit a wall to commit what you have done so far to a branch, close the pull request, and open a new branch to test out a new idea from scratch. That may be the basis for the male distribution dropping away between 80% and 100% rather than continuously approaching 100% pull request acceptance; women may simply not have the population density to show the same pattern of being okay with an 80% acceptance rate, without the social implication the authors have attached to it as the basis for their article. Soldiering on with one approach to chase 100% pull request acceptance is not time-effective in the long term, in my experience, although I know coders who will keep making changes without giving up on a branch until they come up with a solution (and who may contribute to a high mean number of commits per pull request, given their low numbers in the community).
In addition, anyone familiar with CodeGolf would be skeptical of the notion that larger pull requests correlate directly with code quality, so the authors are jumping a large chasm when they imply that larger code changes are directly related to more important features. It may be that many large code contributions do relate to important features; I am just skeptical that you can assess them across a population without taking the context of each pull request into account.
It would be great if they released more aggregate data in machine-readable form so their analysis can be taken further, and hopefully they follow the PeerJ guidelines and make their data and code available for review without delay, so others can explore parts of the dataset that didn't make it into the publication. If they can't release raw data, the standard deviation, the 95th percentile, and the maximum for each case of the number of commits would be very informative, and it would have been appropriate for them to determine whether a few very large pull requests at the high end (i.e., statistical outliers) are skewing their assumed distribution and hence their results.
https://peerj.com/about/preprints/scope-and-instructions/#data-and-materials
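The summary statistics I am asking for are cheap to compute; a minimal sketch (the function and its input are my own, since we don't have their data):

    import numpy as np

    def tail_summary(commits_per_pr):
        # Percentiles and the maximum expose a long tail that the
        # mean/standard-deviation pair on its own would hide.
        x = np.asarray(commits_per_pr, dtype=float)
        q25, q75 = np.percentile(x, [25, 75])
        return {
            "mean": x.mean(),
            "std": x.std(ddof=1),
            "median": np.median(x),
            "p95": np.percentile(x, 95),
            "max": x.max(),
            # Tukey's rule for upper-end outliers.
            "n_outliers": int((x > q75 + 1.5 * (q75 - q25)).sum()),
        }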
Cheers,
Peter
On Fri, Feb 19, 2016 at 6:42 AM, Flöck, Fabian Fabian.Floeck@gesis.org wrote:
There are several issues with this study, some of which are pointed out here in a useful summary: http://slatestarcodex.com/2016/02/12/before-you-get-too-excited-about-that-g... .
Fabian, slatestarcodex is a perennially unreliable source on discussions of gender issues. You cannot take his analysis at face value; you have to actually read the paper.
Especially attributing the difference to gender, as opposed to other attributes of the users that might merely be linked to gender (maybe the women who join GitHub are just the very best/professional women, contribute only to specific types of code, etc., etc.)
The authors discuss this at length. The result observed held true across all languages and many types of code. And their conclusions are indeed guesses about what shared attributes might lead to the strong statistical observation. "Given that there is no "computer science gene" that occurs more often in women than in men, there has to be a social bias at work."