Yes, that was my thought. It would be difficult to know the sex (or the gender) of an
author name on a paper. There would inevitably be a lot that you could not determine. And
certainly in the sciences multi-author papers are the norm and even where you did know the
sex/gender of all, do you assign some part-score? E.g. 0 for all male, 1 for all female,
0.6 for 3 women and 2 men.
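A minimal sketch of that part-score idea, purely to make the arithmetic concrete. The function name and the "F"/"M"/None labels are illustrative assumptions, not an established scoring method, and the labels would need to come from reliable sources rather than guesses:

```python
from typing import Optional, Sequence


def paper_score(author_genders: Sequence[Optional[str]]) -> Optional[float]:
    """Fraction of women among a paper's authors, or None if undeterminable.

    Each entry is "F", "M", or None (unknown). These labels are an
    illustrative assumption. Any label other than "F" counts only toward
    the denominator.
    """
    # If the list is empty or any author is unknown, assign no score at all.
    if not author_genders or any(g is None for g in author_genders):
        return None
    women = sum(1 for g in author_genders if g == "F")
    return women / len(author_genders)


print(paper_score(["M", "M"]))                 # all male
print(paper_score(["F", "F"]))                 # all female
print(paper_score(["F", "F", "F", "M", "M"]))  # 3 women and 2 men
print(paper_score(["F", None]))                # one author unknown
```

Returning None instead of a partial score keeps the undeterminable papers visible, so you can report how many citations had to be excluded.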
But I am curious why you are asking the question. Is your hypothesis that the writing/research of women is
under-represented in Wikipedia citations? If so, without conducting any research, I'd
say "yes it is under-represented". But my reason would be because women are
under-represented as writers/researchers in the first place. And certainly the older the
source, the more likely it is to be written by a man. So to investigate gender bias in
citations in Wikipedia, you would have to estimate the proportion of men/women (or at
least their outputs) over time in a given discipline and then ask the question,
"taking into account the time of publication of a citation and the proportion of
men/women active in this discipline at that time, do Wikipedia citations show a sex/gender
bias?". Hmm ... very tricky.
I'd be inclined to suggest starting with a much simpler task. Pick a discipline
(preferably one with a professional society who can tell you their estimate of current
male/female ratio over (say) the past 5 years), limit the Wikipedia articles to topics in
that discipline, and limit the citations to those published within the last 5 years.
Indeed, perhaps also limit it to publications that are principally from the same country(s)
as the professional society from which you get the data (as clearly men's/women's
participation in any discipline can vary between countries for cultural reasons).
Then you have some way to gauge whether Wikipedia is showing more or less gender bias in
its citations than the discipline itself exhibits through publication. Quite a challenge!
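One way the comparison against the discipline's own ratio could be sketched is an exact two-sided binomial test of the observed share of women among cited authors against the society's baseline share. This is only an assumption about how one might do it: the function name and the figures below are hypothetical, and the test treats authors as independent, which co-authors on the same paper are not.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p0: float) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities, under baseline share p0, of every outcome that
    is no more likely than observing k women among n cited authors.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    # Small tolerance so outcomes tied with pmf[k] are included despite rounding.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Hypothetical figures for illustration only, not real data.
observed_women, total_authors = 30, 100
baseline_share = 0.40  # assumed share reported by the professional society
p_value = binomial_two_sided_p(observed_women, total_authors, baseline_share)
print(
    f"observed share {observed_women / total_authors:.2f} "
    f"vs baseline {baseline_share:.2f}, p = {p_value:.3f}"
)
```

A small p-value would only say the citation share differs from the baseline. It would not say why, which is where the contributor behaviour discussed below comes in.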
And of course, it is not Wikipedia that adds citations. It is individual contributors who
add citations. Does the sex/gender of the contributor have any correlation with any observed
bias? Again, the task is made more difficult because a lot of Wikipedians don't
identify their sex/gender.
The other thing to be alert to is the difference in how (I believe) Wikipedians cite
compared to researchers. As a researcher, I will of course be reading papers in my field
all the time and what I read will influence my subsequent work. Therefore when I write
about my research, my citations are referring to papers that I have already read and whose
authors may be familiar to me from their other work, from having met them at conferences,
private correspondence, etc. However as a Wikipedian, I am only partially operating that
way (mostly when I write new articles or significantly expand them, that is, when I am
doing the research). A lot of the time I am adding citations relating to content other
people (often new users) have added/changed without citation. These come up on my
watchlist all the time. What do I do? Of course I could revert saying "no citation
provided", but that's not the way to encourage new contributors nor to grow the
encyclopedia, so if the information seems plausible (not obviously vandalism), I will
attempt to find a citation for it (using tools like Google and other topic-specialised
search tools). This is what I call the "lucky dip" mode of citing, as obviously I
have no idea what the source was for the original contributor. The sources I find from my
search may not already be known to me (frequently they are not). Or to summarise, IMHO,
researchers (or Wikipedians in "new content mode") cite a source already known
to them and whose authors may be known to them and could consciously or unconsciously
engage in some discrimination in citation based on sex/gender or other criteria, whereas
Wikipedians in "updating mode" are likely to be citing a source not previously
known to them and may be happy just to have found a source and are unlikely to be spending
a lot of their time researching the authors of that source to the extent they could then
consciously or unconsciously exercise discrimination on sex/gender. If I invest any extra
effort in such situations, it's probably because the wording of the source is a
close match to the Wikipedia article, which raises the question of copyright violation (which
needs to be dealt with by deletion or rewriting) or of the source being a Wikipedia mirror (which is
obviously not an acceptable citation).
So I suspect whether a citation was added by the same contributor as the content it
supports or a subsequent contributor probably makes a difference to the likelihood of
conscious/unconscious discrimination.
Also, finally, often Wikipedia cites web pages and other sources that do not have any
individual authorship, e.g. government websites. Remember that Wikipedia prefers open
citations over paywalled citations and a lot of the publications behind paywalls are
individually authored.
Your proposed research has a lot of interesting challenges and a number of limitations.
I'm not saying don't do it, but I am saying start very small and see if you can
find any evidence to support your hypothesis before embarking on a larger study. Because
contributor behaviour is what you are trying to study, you probably need to do both
quantitative and qualitative experiments. E.g. I have described the two modes of citation
I do, but I cannot say how typical my behaviour is.
Kerry
-----Original Message-----
From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of
Leila Zia
Sent: Friday, 23 August 2019 3:44 AM
To: Research into Wikimedia content and communities
<wiki-research-l(a)lists.wikimedia.org>
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
Hi Greg,
A few comments if you're going to go with "proportion of male vs female authors
of the source material used as citations in arbitrary articles":
* Please differentiate between sex (female, male, ...) and gender (woman, man, ...). My
understanding from your initial email is that you want to stay focused on gender, not
sex.
* Unless you have reliable sources about the gender of an author, I would not recommend
trying to predict what the gender is. (As you may know, it is not uncommon in social
media studies, for example, to predict the gender of an author based on their image or
their name. These approaches introduce biases and social challenges.)
* Re your question about whether WMF has resources to look into this question in-house: I
can't speak for the whole of WMF, however, I can share more about the Research
team's direction. As part of our future work, we would like to "help contributors
monitor violations of core content policies and assess information reliability and bias
both granularly and at scale". [1] The question you proposed can fall under assessing
bias in content (considering citations as part of the content). I expect us to focus first
on the piece about violations of core content policies and information reliability and
come back to the bias question later. As a result, we won't have bandwidth to do your
proposal in-house at the moment. Sorry about that.
I hope this helps.
Best,
Leila
[1] Section 2 of our Knowledge Integrity whitepaper:
https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_W…
On Thu, Aug 22, 2019 at 9:57 AM Greg <thenatureprogram(a)gmail.com> wrote:
Hi Kerry,
Those are all very interesting ways to look at this. I was thinking
mostly along the lines of your first bullet point, but I'd be
interested in research in any of those areas.
Thanks,
Greg
On Thu, Aug 22, 2019 at 5:00 AM
<wiki-research-l-request(a)lists.wikimedia.org>
wrote:
Today's Topics:
1. gender balance of wikipedia citations (Greg)
2. Re: gender balance of wikipedia citations (Kerry Raymond)
----------------------------------------------------------------------
Message: 1
Date: Wed, 21 Aug 2019 20:19:18 -0700
From: Greg <thenatureprogram(a)gmail.com>
To: wiki-research-l(a)lists.wikimedia.org
Subject: [Wiki-research-l] gender balance of wikipedia citations
Message-ID: <CAOO9DNtY+oDO5oQrMZeG1NZE-kYNYLWnTD6acHeYTbYeGk8k2Q(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Greetings!
I was looking for information about the gender balance of Wikipedia
citations and no one I've asked knows of any work on this topic. Do you?
I think this is an important question.
Here's what I've learned so far:
Wikipedia citations are currently in the form of text strings. There
is also an initiative to place citations in an annotated structured
repository (wikicite). I do not know the current status of wikicite
or if/when this could be used for this inquiry--either to examine
all, or a sensible subset of the citations.
My perspective is that understanding the gender balance is
necessary and urgent. The balance could be better, the same, or
worse than the citation balances we already know, and the scale of the effect is quite
large.
Is this a line of inquiry that the wikimedia/wikicite community is
interested in pursuing? If so, what is the best way to get started?
Does the WMF have the resources and interest to look into this matter in-house?
Thanks for your thoughts.
Greg
------------------------------
Message: 2
Date: Thu, 22 Aug 2019 13:53:45 +1000
From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
To: "'Research into Wikimedia content and communities'"
<wiki-research-l(a)lists.wikimedia.org>
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
Message-ID: <00ed01d5589d$33e31ed0$9ba95c70$(a)gmail.com>
Content-Type: text/plain; charset="UTF-8"
Could you elaborate a bit more on what you mean by the gender
balance of citations?
Are you talking about:
* proportion of male vs female authors of the source material used
as citations in arbitrary articles?
* the quality/quantity of citations in biography articles of men vs women?
* the quality/quantity of citations in articles that are gendered by
some other criteria (e.g. reader interest, romantic comedy vs action film)?
Kerry
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
------------------------------
End of Wiki-research-l Digest, Vol 168, Issue 11
************************************************