[Foundation-l] New Wikipedia gender gap research posted to Meta

Thomas Morton morton.thomas at googlemail.com
Mon Nov 7 11:32:09 UTC 2011


Hmmm.

Whilst this research has interest, I don't think it really says much about
gender. This form of gender coding is extremely hand wavy and, as the
latter part of the report proves, it is also largely defunct when applied to
the internet.

They clearly show that the predominant style of writing (in their sample)
is male coded, with Geek Feminism matching Wikipedia almost exactly.

Wikihow's formal score seems to be the exception, but we can form a
reasonable hypothesis for that (which relates to the words chosen for
scoring). Due to the subject matter of Wikihow (i.e. answering a question)
the use of personal pronouns ("me", "myself") and queries ("what","how") -
words the report identifies as formal female writing - are going to have a
naturally high occurrence. The most obvious example is "your", which is
scored quite highly as a female word and which is clearly going to be a
common word on Wikihow, but barely used on Wikipedia (we do not address the
reader).

I think this explains the anomaly better than gender; and we can test it
empirically by testing word occurrence in a larger sample of both sites.
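
As a rough sketch of what that test could look like (not the report's
method): count how often a handful of marker words occur per 1,000 tokens
in text from each site and compare the rates. The word list below is only
the examples mentioned above, not the report's full list or weights, and
the two sample strings are toy placeholders I made up; a real run would
need a large sample of pages from both sites.

```python
import re
from collections import Counter

# Assumption: illustrative marker words taken from the examples in this
# message, not the report's actual word list or scoring weights.
MARKER_WORDS = ("your", "me", "myself", "what", "how")


def rate_per_thousand(text, words=MARKER_WORDS):
    """Return occurrences of each marker word per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: 1000.0 * counts[w] / total for w in words}


# Toy placeholder text; replace with large samples from each site.
wikihow_sample = "Wash your hands. Then dry your hands. How long? Ask me."
wikipedia_sample = "Hand washing is the act of cleaning hands with soap and water."

wikihow_rates = rate_per_thousand(wikihow_sample)
wikipedia_rates = rate_per_thousand(wikipedia_sample)

for w in MARKER_WORDS:
    print(f"{w:8s} wikihow={wikihow_rates[w]:7.1f}  wikipedia={wikipedia_rates[w]:7.1f}")
```

If the rates for words like "your" stay much higher on Wikihow across a
large sample, that supports the subject-matter explanation over a gender
one.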

Ultimately gender scoring article text is statistically useless because the
text being tested is the product of a number of editors. By numbers this
means only a small female input in the prose is to be expected. Plus, as
described, specific writing styles are needed in an encyclopaedia - which
preclude several high scoring female words. These factors make an already
vague test ineffective :)

If I understand the conclusion I think the authors are trying to suggest
that women on Wikipedia tend to communicate in a male style - and this
could lead us into identifying a cause of the gender gap.

I think there are a number of flaws to that hypothesis - and this study
does not really contribute to understanding it, partly because the gender
language test is flawed and partly because the chosen test areas are ones
which either prescribe a certain writing style (articles) or are not
examples of communication (user pages).

I think the only real conclusion that can be drawn from this research is
that articles on Wikipedia use language scored as male by this specific
test.

With that said, I found a lot of interesting (and well written) material
here which could be built on in other contexts :) As the authors note -
looking at actual interactions (i.e. talk pages) is the only way to draw
proper conclusions as to whether language interaction is abnormal.

We should combine this with studies into:
- whether women find interacting in a male coded way difficult or
off-putting.
- whether male coded language is more predominant on the internet.
- larger sample sets of Wikipedia-like writing (i.e. formal, informational,
neutral) using an actual blind study (to avoid gender biasing in the test).

Tom

