Forwarding a response from Hawkeye7:
Yes, "the only real conclusion that can be drawn from this research is that articles on Wikipedia use language scored as male by this specific test."
This is always the problem with quantitative analysis. On the one hand, the test is both objective and repeatable; on the other it produces facts without explanation. In particular, without some form of model, it is often hard to tell cause from effect.
This is what we know, but it is only one piece of the larger puzzle.
Hawkeye7
Message: 6
Date: Mon, 7 Nov 2011 11:32:09 +0000
From: Thomas Morton morton.thomas@googlemail.com
Subject: Re: [Foundation-l] New Wikipedia gender gap research posted to Meta
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Message-ID: CAKO2H7-mtBU4KD_=tBLuVePJtFttzRM9+3L3jHr06ogiHoXHrg@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
Hmmm.
Whilst this research has interest, I don't think it really says much about gender. This form of gender coding is extremely hand-wavy, and as the latter part of the report proves, it is also largely defunct when applied to the internet.
They clearly show that the predominant style of writing (in their sample) is male coded, with Geek Feminism matching Wikipedia almost exactly.
The Wikihow formal score seems to be the exception, but we can form a reasonable hypothesis for that (which relates to the words chosen for scoring). Due to the subject matter of Wikihow (i.e. answering a question), personal pronouns ("me", "myself") and query words ("what", "how") - words the report identifies as formal female writing - are going to have a naturally high occurrence. The most obvious example is "your", which is scored quite highly as a female word and which is clearly going to be a common word on Wikihow, but barely used on Wikipedia (we do not address the reader).
I think this explains the anomaly better than gender; and we can test it empirically by testing word occurrence in a larger sample of both sites.
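A minimal sketch of that word-occurrence check, in Python, might look like the following. The marker list contains only the words quoted above; the function names and the two sample strings are hypothetical placeholders, and a real test would need large samples from both sites plus the report's full word list.

```python
import re
from collections import Counter

# Hypothetical marker list: only the words quoted in this message,
# not the report's full scoring vocabulary.
MARKER_WORDS = ["me", "myself", "what", "how", "your"]


def rate_per_thousand(text, words=MARKER_WORDS):
    """Return occurrences of each marker word per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: 1000 * counts[w] / total for w in words}


def compare(sample_a, sample_b, words=MARKER_WORDS):
    """Return (word, rate_a, rate_b) rows for two text samples."""
    a = rate_per_thousand(sample_a, words)
    b = rate_per_thousand(sample_b, words)
    return [(w, a[w], b[w]) for w in words]


if __name__ == "__main__":
    # Placeholder strings, not real data from either site.
    howto_style = "How do you fix your bike? Check your tyres first."
    encyclopedic_style = "A bicycle is a vehicle with two wheels."
    for word, rate_a, rate_b in compare(howto_style, encyclopedic_style):
        print(f"{word:8s} {rate_a:8.1f} {rate_b:8.1f}")
```

Normalising to a rate per 1,000 tokens keeps the comparison fair when the two samples differ in size. Raw rates alone would not show whether a difference is significant; that would need a proper statistical test on the larger samples.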
Ultimately, gender scoring article text is statistically useless because the text being tested is the product of a number of editors. By the numbers, this means only a small female input in the prose is to be expected. Plus, as described, specific writing styles are needed in an encyclopaedia - which preclude several high scoring female words. These factors make an already vague test ineffective :)
If I understand the conclusion, I think the authors are trying to suggest that women on Wikipedia tend to communicate in a male style - and this could lead us into identifying a cause of the gender gap.
I think there are a number of flaws to that hypothesis - and this study does not really contribute to understanding it. Partly because the gender language test is flawed, and partly because the chosen test areas are ones which either prescribe a certain writing style (articles) or are not examples of communication (user pages).
I think the only real conclusion that can be drawn from this research is that articles on Wikipedia use language scored as male by this specific test.
With that said, I found a lot of interesting (and well written) material here which could be built on in other contexts :) As the authors note - looking at actual interactions (i.e. talk pages) is the only way to draw proper conclusions as to whether language interaction is abnormal.
We should combine this with studies into:
- whether women find interacting in a male coded way difficult or
off-putting.
- whether male coded language is more predominant on the internet.
- larger sample sets of Wikipedia-like writing (i.e. formal, informational,
neutral) using an actual blind study (to avoid gender biasing in the test).
Tom