Thanks again, Kerry. I am hoping that someone with access to more resources
(knowledge, support, etc) than I have will look into this.
A few more thoughts/questions:
1. The link to the citation dataset from the Medium article ("What are the
ten most cited sources on Wikipedia? Let’s ask the data.") is broken.
2. As far as I can tell, every named author in the top ten most cited
sources on Wikipedia is male. One piece is by a working group.
3. This line from the Medium piece struck me: "Many of these publications
have been cited by Wikipedians across large series of articles using
powerful bots and automated tools."
Are citations being added by bots? I'm not sure that I understand that line
correctly.
Greg
On Sat, Aug 24, 2019 at 1:51 AM <wiki-research-l-request(a)lists.wikimedia.org>
wrote:
Send Wiki-research-l mailing list submissions to
wiki-research-l(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
or, via email, send a message with subject or body 'help' to
wiki-research-l-request(a)lists.wikimedia.org
You can reach the person managing the list at
wiki-research-l-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wiki-research-l digest..."
Today's Topics:
1. Re: sockpuppets and how to find them sooner (Timothy Wood)
2. Re: gender balance of Wikipedia citations (Kerry Raymond)
----------------------------------------------------------------------
Message: 1
Date: Fri, 23 Aug 2019 17:51:41 -0400
From: Timothy Wood <timothyjosephwood(a)gmail.com>
To: Kerry Raymond <kerry.raymond(a)gmail.com>
Cc: Research into Wikimedia content and communities
<wiki-research-l(a)lists.wikimedia.org>
Subject: Re: [Wiki-research-l] sockpuppets and how to find them sooner
Message-ID:
<CAMy3BEJ8=E1FgifdqgY+=
vygN3ULqOmJghKm_29qWuY3P-Fd+g(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Then again, apparently the Foundation has a PR team whose only job is to
compile the latest marketing buzzwords, and they seem to really love AI.
You might get some buy-in. You never know.
V/r
TJW/GMG
On Fri, Aug 23, 2019, 11:23 Kerry Raymond <kerry.raymond(a)gmail.com> wrote:
That's why I think we need "signatures", which is my shorthand for things
like a hash function or a bounding box: a means by which many non-matching
accounts can be eliminated at low cost, reserving the high-cost comparisons
(machine or human) for high-probability candidates only. A signature is
machine-computed and *stored* when a user is banned/blocked. When a suspect
user is presented, we calculate their signature and then compare it against
the pre-calculated signatures of the bad users. I don't think it is too
expensive if we can find the right "signature". CPU cycles are pretty fast.
I only have an average laptop CPU-wise, but I burn through loads of
comparisons of geographic boundaries (complex polygons with many points)
thanks to bounding boxes, which reduce the complex shape to the smallest
rectangle that contains it. Testing the intersection of polygons is
expensive; testing the intersection of rectangles is trivial.
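Kerry's bounding-box analogy can be sketched in a few lines of Python. This is a minimal illustration, assuming polygons are given as lists of (x, y) vertex tuples; the function names are hypothetical, and the expensive exact polygon-intersection test is deliberately left out, since the point is only that a stored, cheap-to-compare summary rules out most pairs before any costly comparison runs.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_box(polygon: List[Point]) -> Box:
    """Smallest axis-aligned rectangle containing every vertex."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a: Box, b: Box) -> bool:
    """Constant-time test: two rectangles are disjoint iff separated on some axis."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(polygons: List[List[Point]]) -> List[Tuple[int, int]]:
    """Index pairs whose boxes overlap; only these would need the costly exact test."""
    # Boxes are computed once and kept, much like a stored signature.
    boxes = [bounding_box(p) for p in polygons]
    return [(i, j) for i, j in combinations(range(len(boxes)), 2)
            if boxes_intersect(boxes[i], boxes[j])]


if __name__ == "__main__":
    shapes = [
        [(0, 0), (2, 0), (1, 2)],          # triangle near the origin
        [(1, 1), (3, 1), (3, 3), (1, 3)],  # square whose box overlaps the triangle's
        [(10, 10), (12, 10), (11, 12)],    # far-away triangle
    ]
    print(candidate_pairs(shapes))  # prints [(0, 1)]
```

Touching rectangles count as overlapping, which keeps the filter conservative: it may pass a pair that later fails the exact test, but it never discards a pair whose polygons actually intersect. A behavioural signature for accounts would play the role the box plays here; what such a signature should contain is the open question in this thread.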
I think we can probably ignore the myriad of trivial bad guys for the
purposes of signature collecting, e.g. those blocked for vandalism after
their first few edits. Sock puppets or their masters don't immediately
appear as bad guys on individual edits. It's often more about long-term
behaviours like POV pushing, refusal to engage in consensus building,
slow-burning edit wars, etc., that do not show on individual edits.
Kerry
Sent from my iPad
On 23 Aug 2019, at 11:42 pm, Timothy Wood <timothyjosephwood(a)gmail.com>
wrote:
You are correct that in all but the most obvious cases, filing an SPI can
be exceptionally time-consuming. I'm afraid there is no obvious technical
solution there that would not involve a complicated AI that is probably
beyond the ability of the Foundation to produce.
There is quite a bit of data available in the form of years of SPIs, but
it seems like you're talking about Facebook or Google levels of machine
learning, and even years of SPIs is tiny compared to the amount of data
they work with.
On a separate note, frequently changing IP addresses is most often an
indicator of nothing more than someone who is editing on a mobile
connection. This can usually be easily verified with an online IP lookup.
V/r
TJW/GMG
On Fri, Aug 23, 2019, 02:44 RhinosF1 <rhinosf1(a)gmail.com> wrote:
> Just a note that you can still go through warnings for vandalism etc.
> and report to AIV.
>
> Or, at that edit speed, you may have a chance reporting at AN for
> bot-like edits, which will draw attention to the account.
>
> If you ever need help, things like #wikipedia-en-help on Freenode IRC
> exist so you can ask other users.
>
> RhinosF1
> Miraheze Volunteer
>
> On Fri, 23 Aug 2019 at 06:57, Kerry Raymond <kerry.raymond(a)gmail.com>
> wrote:
>
> > Currently, to open a sockpuppet investigation, you must name the two (or
> > more) accounts that you believe to be sockpuppets with "clear,
> > behavioural evidence of sock puppetry", which is typically in the form of
> > pairs of edits that demonstrate similar edit behaviours that are unlikely
> > to occur naturally. Now, if you spend enough time on-wiki, you develop an
> > intuition about behaviours you see on your watchlist and in article edit
> > histories. Often I am highly suspicious that an account is a sockpuppet,
> > but I cannot report it because I don't know which other account is
> > involved.
> >
> >
> >
> > As an example, I recently encountered User:Shelati, an account about 1
> > day old at that time with nearly 100 edits in that day, all about 1-2
> > minutes apart, mostly making a similar change to a large number of
> > Australian place infoboxes.
> >
> > https://en.wikipedia.org/w/index.php?title=Special:Contributions/Shelati
> > <https://en.wikipedia.org/w/index.php?title=Special:Contributions/Shelati&offset=20190728053057&limit=100&target=Shelati>
> >
> > Genuine new users do not edit that quickly, do not use templates and do
> > not mess structurally with infoboxes (at most they try to change the
> > values). It "smelled" like a sockpuppet. However, as I did not recognise
> > that pattern of edit behaviour as being that of any other user I was
> > familiar with, it wasn't something I could report for sockpuppet
> > investigation. Anyhow, after about 2 weeks, the user was blocked as a
> > sockpuppet. Someone must have noticed and figured out the other account:
> >
> >
> > https://en.wikipedia.org/wiki/Wikipedia:Sockpuppet_investigations/Meganesia/Archive
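The edit pattern described above (nearly 100 edits in a day, 1-2 minutes apart) is the kind of cheap signal that can be computed automatically. A minimal Python sketch, assuming the account's edit timestamps have already been fetched (for example from its contributions list) as ISO 8601 strings; the function names and thresholds are hypothetical, chosen for illustration, and are not part of any Wikipedia tool.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple


def edit_rate_profile(timestamps: List[str]) -> Tuple[int, float]:
    """Return (number of edits, median minutes between consecutive edits).

    Timestamps are ISO 8601 strings without a trailing "Z",
    e.g. "2019-07-27T10:15:00", as accepted by datetime.fromisoformat.
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    gaps = [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]
    return len(times), (median(gaps) if gaps else float("inf"))


def looks_bot_like(timestamps: List[str], min_edits: int = 50,
                   max_median_gap: float = 2.0) -> bool:
    """Hypothetical heuristic: many edits with a median gap of a couple of minutes or less.

    The thresholds are illustrative only; a positive result is a reason to look
    closer, not evidence of sockpuppetry.
    """
    count, gap = edit_rate_profile(timestamps)
    return count >= min_edits and gap <= max_median_gap


if __name__ == "__main__":
    # Synthetic example: 100 edits, one every 90 seconds.
    start = datetime(2019, 7, 27, 9, 0)
    burst = [(start + timedelta(seconds=90 * i)).isoformat() for i in range(100)]
    print(edit_rate_profile(burst), looks_bot_like(burst))  # prints (100, 1.5) True
```

The median gap is used rather than the mean so that a single long break (overnight, say) does not hide a burst of rapid edits.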
> >
> >
> >
> > Two weeks and 1,279 edits later: that's over 1,000 possibly problematic
> > edits after I first suspected them. But that's nothing compared with
> > another ongoing situation, in which a very large number of different IPs
> > are engaged in a pattern of problem edits, mostly on Australian articles
> > (a few different types of edits, but an obvious "quack like a duck"
> > situation). The IP address changes frequently (and, one assumes,
> > deliberately). The edits potentially go back to 2013 but appear to have
> > intensified in 2018/2019. Given that many thousands of edits are
> > involved, here is one user's summary of all the IP addresses involved
> > and the extent to which they have been cleaned up:
> >
> >
> >
> >
https://en.wikipedia.org/wiki/User:IamNotU/History_cleanup
> >
> >
> >
> > As well as the damage done to the content (which harms the readers),
> > these IP sockpuppets consume enormous amounts of effort to track down
> > and revert, effort that could be more productively spent improving the
> > content. We need better tools to foil these pests. So I want to put that
> > challenge out to this list.
Kerry
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
------------------------------
Message: 2
Date: Sat, 24 Aug 2019 18:29:36 +1000
From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
To: "'Research into Wikimedia content and communities'"
<wiki-research-l(a)lists.wikimedia.org>
Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
Message-ID: <004801d55a56$11f4a270$35dde750$(a)gmail.com>
Content-Type: text/plain; charset="utf-8"
I am inclined to think that political science has more point of view in it
than, say, chemistry. I also suspect it has fewer authors per book/paper.
So I can imagine that people citing political science literature may be
more inclined to cherry-pick the sources that support their own POV, which
may involve some gender bias in some way. I would think cherry-picking of
sources less likely in chemistry (which is not to say there are no divided
schools of thought in chemistry, but it is a more experimental discipline
with a strong commitment to factual data and less to opinion).
But having said all that, whether, and in what circumstances, the
selection of sources in Wikipedia might be sex/gender biased, I honestly
don't know. But if it manifests outside of Wikipedia as you suggest, then I
would be very surprised if it wasn't replicated in Wikipedia to some
extent. But I guess your question is whether Wikipedia merely reflects
the society it lives in (similar levels of gender bias) or whether there is
something about Wikipedia which exacerbates or ameliorates the situation? I
am genuinely curious what a small study would discover, and agree that
replicating, as much as possible, the existing study (done outside of
Wikipedia) provides a good starting point. You might approach the authors
of that study to see if they are willing to collaborate on such a project,
either in design, data sharing or more fully. I look forward to seeing the
results.
Kerry
-----Original Message-----
From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org]
On Behalf Of Greg
Sent: Friday, 23 August 2019 5:01 PM
To: wiki-research-l(a)lists.wikimedia.org
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
Wow, Kerry! Thank you for taking the time to write all these thoughts out.
I'm asking the question because I'm concerned that the gender balance of
the authors being cited on wikipedia is different from the already quite
bad patterns in academia. My fear is that the citation gender imbalance on
Wikipedia is more pronounced. If so, it is not just perpetuating the
problem, but making it worse by surfacing certain authors and ideas even
more frequently, or hardly at all. I would like to know if this is the
case, and if so, how big the effect is.
In my last message, I mention a study about a set of award-winning
political science books (the researchers study the citation gender
imbalance for that set). I just saw this study today, but I began to think
that it/the set of works--or some similar set of titles--could possibly be
a good place to begin, especially if the original researchers were willing
to share the list of titles/authors/gender/etc that they put
together/worked with. Then it seems it would mostly be a matter of figuring
out how those titles are cited on Wikipedia--through
either the citation dataset or wikicite--to see if/how the citation
patterns differ (i.e., if the works by women/men are cited more
frequently/at the same rate/less frequently on Wikipedia than what the
researchers found in the original study).
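Greg's proposed comparison could start as something like the following minimal Python sketch. It assumes, hypothetically, a shared list of (title, author gender) pairs, with gender as recorded by the original researchers rather than inferred from names, and Wikipedia citations available as plain text strings. Case-insensitive substring matching on titles is a naive stand-in for proper matching through the citation dataset or WikiCite (it will miss variant titles and can over-match short ones); the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def wikipedia_citation_counts(titles: Iterable[str], citations: List[str]) -> Dict[str, int]:
    """Count the citation strings that mention each title (naive, case-insensitive)."""
    lowered = [c.lower() for c in citations]
    return {title: sum(title.lower() in c for c in lowered) for title in titles}


def mean_citations_by_gender(books: List[Tuple[str, str]],
                             citations: List[str]) -> Dict[str, float]:
    """Average Wikipedia citation count per book, grouped by recorded author gender."""
    counts = wikipedia_citation_counts((title for title, _ in books), citations)
    by_gender: Dict[str, List[int]] = defaultdict(list)
    for title, gender in books:
        by_gender[gender].append(counts[title])
    return {gender: sum(n) / len(n) for gender, n in by_gender.items()}


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the APSA study.
    books = [("Book A", "woman"), ("Book B", "man"), ("Book C", "man")]
    citations = ["Smith, J. Book B. Press, 2010.", "Doe, R. Book A. 2012.",
                 "book b, p. 4", "Book C, ch. 2"]
    print(mean_citations_by_gender(books, citations))  # prints {'woman': 1.0, 'man': 1.5}
```

A figure directly comparable to the study's would also need the same normalisation it used (for example, citations per year), which this sketch does not attempt.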
This seems like it would be easier to do than what you propose, but
perhaps the idea is not sound. Until very recently, I thought I could find
the answer in an existing paper! I honestly don't know the best way to get
the answer, but I would like to know the answer and think it's important to
look at.
All of the things you bring up--from the gender of the editor, to the type
of editing being done, to the issues around multiple authors/paywalls/year
of publication/field--complicate the inquiry, particularly a larger one. I
agree with what you say about doing something small first to see
what's there.
Thanks again for all your thoughts.
Greg
On Thu, Aug 22, 2019 at 9:41 PM <wiki-research-l-request(a)lists.wikimedia.org>
wrote:
Today's Topics:
1. Re: gender balance of wikipedia citations (Greg)
2. Re: gender balance of wikipedia citations (Kerry Raymond)
----------------------------------------------------------------------
Message: 1
Date: Thu, 22 Aug 2019 18:47:48 -0700
From: Greg <thenatureprogram(a)gmail.com>
To: wiki-research-l(a)lists.wikimedia.org
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
Message-ID:
<CAOO9DNvBrw_aLkRUp5kYFLdaLJUEK+ddiz-A09MZwiotAdAmUw(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hi Leila,
Thanks for your thoughts.
Having just read Troy Vettese's very powerful essay, "Sexism in the
Academy" (https://nplusonemag.com/issue-34/essays/sexism-in-the-academy/),
I wish this were a top priority.
I stumbled upon a study today--it came up in the Washington Post's
excellent series on gender bias in political science. The authors look
at a set of award winning political science books and the gender
imbalance in the citations drawn from google scholar. I'm linking the
piece here in case anyone on this list is interested now, or in the
future, in how the patterns on Wikipedia compare.
Washington Post piece: "There’s a gender gap in who wins political
science book awards – and in how widely they’re cited"
https://www.washingtonpost.com/politics/2019/08/22/theres-gender-gap-who-wins-political-science-book-awards-how-widely-theyre-cited/
"Just as significantly, women’s award-winning books receive fewer
scholarly citations than men’s award-winning volumes — and this
disparity has grown, rather than shrunk, in recent years. Over the
entire period, APSA award-winning volumes by women averaged 43 percent
fewer citations per year than those by male authors."
Paper: "Winning awards and gaining recognition: An impact analysis of
APSA section book prizes"
https://www.sciencedirect.com/science/article/abs/pii/S0362331918300867
Best,
Greg
On Thu, Aug 22, 2019 at 3:44 PM <wiki-research-l-request(a)lists.wikimedia.org>
wrote:
Today's Topics:
1. Re: gender balance of wikipedia citations (Greg)
2. Re: gender balance of wikipedia citations (Leila Zia)
3. Wikimania 2019 disinformation meetup follow-up (Leila Zia)
4. Upcoming Research Newsletter (special issue on gender gap
research): New papers open for review (Mohammed Sadat Abdulai)
----------------------------------------------------------------------
Message: 1
Date: Thu, 22 Aug 2019 09:57:15 -0700
From: Greg <thenatureprogram(a)gmail.com>
To: wiki-research-l(a)lists.wikimedia.org
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
Message-ID:
<CAOO9DNuSYzzaVwcdqiWA7pj671z3N43XOSwv6DtW0SxWg=
L8GQ(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hi Kerry,
Those are all very interesting ways to look at this. I was thinking
mostly along the lines of your first bullet point, but I'd be interested
in research in any of those areas.
Thanks,
Greg
On Thu, Aug 22, 2019 at 5:00 AM <wiki-research-l-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. gender balance of wikipedia citations (Greg)
> 2. Re: gender balance of wikipedia citations (Kerry Raymond)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 21 Aug 2019 20:19:18 -0700
> From: Greg <thenatureprogram(a)gmail.com>
> To: wiki-research-l(a)lists.wikimedia.org
> Subject: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID:
> <CAOO9DNtY+oDO5oQrMZeG1NZE-kYNYLWnTD6acHeYTbYeGk8k2Q(a)mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Greetings!
>
> I was looking for information about the gender balance of Wikipedia
> citations and no one I've asked knows of any work on this topic. Do you?
>
> I think this is an important question.
>
> Here's what I've learned so far:
>
> Wikipedia citations are currently in the form of text strings. There is
> also an initiative to place citations in an annotated structured
> repository (wikicite). I do not know the current status of wikicite or
> if/when this could be used for this inquiry--either to examine all, or a
> sensible subset of the citations.
>
> My perspective is that understanding the gender balance is necessary and
> urgent. The balance could be better, the same, or worse than the citation
> balances we already know, and the scale of the effect is quite large.
>
> Is this a line of inquiry that the wikimedia/wikicite community is
> interested in pursuing? If so, what is the best way to get started? Does
> the WMF have the resources and interest to look into this matter
> in-house?
>
> Thanks for your thoughts.
>
> Greg
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 22 Aug 2019 13:53:45 +1000
> From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
> To: "'Research into Wikimedia content and communities'"
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] gender balance of wikipedia
> citations
> Message-ID: <00ed01d5589d$33e31ed0$9ba95c70$(a)gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Could you elaborate a bit more on what you mean by the gender balance of
> citations?
>
> Are you talking about:
>
> * the proportion of male vs female authors of the source material used
>   as citations in arbitrary articles?
> * the quality/quantity of citations in biography articles of men vs
>   women?
> * the quality/quantity of citations in articles that are gendered by some
>   other criteria (e.g. reader interest, romantic comedy vs action film)?
>
> Kerry
------------------------------
End of Wiki-research-l Digest, Vol 168, Issue 11
************************************************
------------------------------
Message: 2
Date: Thu, 22 Aug 2019 10:43:51 -0700
From: Leila Zia <leila(a)wikimedia.org>
To: Research into Wikimedia content and communities
<wiki-research-l(a)lists.wikimedia.org>
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
Message-ID:
<CAK0Oe2uCo70_=ma2b=2d+fvr4GseEVxOP0sh=
ELNOpKdCuUfqA(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hi Greg,
A few comments if you're going to go with "proportion of male vs
female authors of the source material used as citations in arbitrary
articles":
* Please differentiate between sex (female, male, ...) and gender
(woman, man, ...). My understanding from your initial email is that
you want to stay focused on gender, not sex.
* Unless you have reliable sources about the gender of an author, I
would not recommend trying to predict what the gender is. (As you
may know, this is not uncommon in social media studies, for example,
to predict the gender of the author based on their image or their name.
These approaches introduce biases and social challenges.)
* Re your question about whether WMF has resources to look into this
question in-house: I can't speak for the whole of WMF; however, I
can share more about the Research team's direction. As part of our
future work, we would like to "help contributors monitor violations
of core content policies and assess information reliability and bias
both granularly and at scale". [1] The question you proposed can
fall under assessing bias in content (considering citations as part
of the content). I expect us to focus first on the piece about
violations of core content policies and information reliability and
come back to the bias question later. As a result, we won't have
bandwidth to do your proposal in-house at the moment. Sorry about that.
I hope this helps.
Best,
Leila
[1] Section 2 of our Knowledge Integrity whitepaper:
https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_Wikimedia_Research_2030.pdf
------------------------------
Message: 3
Date: Thu, 22 Aug 2019 13:36:17 -0700
From: Leila Zia <leila(a)wikimedia.org>
To: Research into Wikimedia content and communities
<wiki-research-l(a)lists.wikimedia.org>
Subject: [Wiki-research-l] Wikimania 2019 disinformation meetup
follow-up
Message-ID:
<CAK0Oe2sodYJpkuhSqgo3dtfDr=
NQ5EK1TdH16F6BOkTyFho9Rg(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hi,
This message is for those of you who attended the disinformation
meet-up [0] at Wikimania 2019 [1], or others who may be interested.
* The notes from our meet-up are now posted at the bottom of the page [0].
* I was tasked to see if space.wmflabs.org is the place for us to
continue conversations about this topic. The answer is yes. Thanks
to the help of Elena Lappen, we now have a dedicated subcategory for
disinformation:
https://discuss-space.wmflabs.org/c/research/disinformation . Feel
free to subscribe, watch, and/or post new topics if you're involved
in this space.
* If you are new to this conversation, please read the purpose of
the subcategory at
https://discuss-space.wmflabs.org/t/about-the-disinformation-category/949
and welcome! :)
Best,
Leila
[0]
https://wikimania.wikimedia.org/wiki/2019:Meetups/Disinformation
[1]
https://wikimania.wikimedia.org/wiki/2019:Program
------------------------------
Message: 4
Date: Thu, 22 Aug 2019 22:43:53 +0000 (UTC)
From: Mohammed Sadat Abdulai <masssly(a)ymail.com>
To: Research Into Wikimedia Content and Communities
<wiki-research-l(a)lists.wikimedia.org>
Subject: [Wiki-research-l] Upcoming Research Newsletter (special issue
on gender gap research): New papers open for review
Message-ID: <1625269943.668598.1566513833343(a)mail.yahoo.com>
Content-Type: text/plain; charset=UTF-8
Hi everyone,
We’re preparing for the August 2019 research newsletter and looking
for contributors. Please take a look at
https://etherpad.wikimedia.org/p/WRN201908 and add your name next to
any paper you are interested in covering. Our target publication
date is 31 August, 11:59 UTC. As usual, short notes and one-paragraph
reviews are most welcome.
For the August edition, we are planning a special issue focusing
mainly on recent gender gap/gender bias research. (Topics for upcoming
special issues may include health and education.) There are about 20
papers from this area on our to-do list, which will all be covered in
the August issue, either as a mere list item or, with your help, in the
form of a more informative writeup or review. They include:
- Analyzing Gender Stereotyping in Bollywood Movies
- Breaking the glass ceiling on Wikipedia| journal
- Breastfeeding, Authority, and Genre: Women's Ethos in Wikipedia and Blogs
- Cyberfeminism on Wikipedia: Visibility and deliberation in feminist Wikiprojects
- Gender and deletion on Wikipedia
- Gender imbalance and Wikipedia
- Gender Markers in Wikipedia Usernames
- How do students trust Wikipedia? An examination across genders
- Investigating the Gender Pronoun Gap in Wikipedia
- It’s Not What You Think: Gender Bias in Information about Fortune 1000 CEOs on Wikipedia
- Mapping and Bridging the Gender Gap: An Ethnographic Study of Indian Wikipedians and Their Motivations to Contribute
- People Who Can Take It: How Women Wikipedians Negotiate and Navigate Safety
- Redressing Gender Inequities on Wikipedia Through an Editathon
- Similar Gaps, Different Origins? Women Readers and Editors at Greek Wikipedia
- Simulation Experiments on (the Absence of) Ratings Bias in Reputation Systems
- The Gendered Presentation of Professions on Wikipedia
- Who Counts as a Notable Sociologist on Wikipedia? Gender, Race, and the “Professor Test”
- Who Wants to Read This?: A Method for Measuring Topical Representativeness in User Generated Content Systems
- Women and Wikipedia. Diversifying Editors and Enhancing Content through Library Edit-a-Thons
Masssly and Tilman Bayer
[1] Research:Newsletter - Meta
[2] WikiResearch (@WikiResearch) on Twitter
------------------------------
Subject: Digest Footer
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
------------------------------
End of Wiki-research-l Digest, Vol 168, Issue 12
************************************************
------------------------------
Message: 2
Date: Fri, 23 Aug 2019 14:41:09 +1000
From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
To: "'Research into Wikimedia content and communities'"
<wiki-research-l(a)lists.wikimedia.org>
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
Message-ID: <001001d5596c$fe22a100$fa67e300$(a)gmail.com>
Content-Type: text/plain; charset="utf-8"
Yes, that was my thought. It would be difficult to know the sex (or the
gender) of the person behind an author name on a paper. There would
inevitably be a lot that you could not determine. And certainly in the
sciences multi-author papers are the norm, and even where you did know
the sex/gender of all the authors, do you assign some part-score? E.g.
0 for all male, 1 for all female, 0.6 for 3 women and 2 men.
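The part-score above is simply the fraction of a paper's authors who are women. A minimal sketch, assuming each author's gender is already known from a reliable source (not inferred from names, per Leila's caution further down this thread) and that authors of unknown gender are dropped; the function name and the "woman"/"man"/None encoding are hypothetical.

```python
from fractions import Fraction
from typing import Optional, Sequence


def female_share(genders: Sequence[Optional[str]]) -> Optional[Fraction]:
    """Fraction of a paper's authors known to be women.

    `genders` holds one entry per author: "woman", "man" (or any other
    known value), or None when no reliable source states the author's
    gender. Unknown authors are dropped; returns None if none are known.
    """
    known = [g for g in genders if g is not None]
    if not known:
        return None
    return Fraction(sum(1 for g in known if g == "woman"), len(known))


# The examples from the message: all men, all women, 3 women and 2 men
print(female_share(["man", "man"]))                      # 0
print(female_share(["woman", "woman"]))                  # 1
print(float(female_share(["woman"] * 3 + ["man"] * 2)))  # 0.6
```

Using `Fraction` keeps scores exact, so they can be summed or averaged across papers without rounding drift.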
But I am curious why you are asking the question. Is it that you suspect
the writing/research of women is under-represented in Wikipedia
citations? If so, without conducting any research, I'd say "yes, it is
under-represented". But my reason would be that women are
under-represented as writers/researchers in the first place. And
certainly the older the source, the more likely it is to be written by
a man. So to investigate gender bias in citations in Wikipedia, you
would have to estimate the proportion of men/women (or at least their
outputs) over time in a given discipline and then ask the question,
"taking into account the time of publication of a citation and the
proportion of men/women active in this discipline at that time, do
Wikipedia citations show a sex/gender bias?"
Hmm ... very tricky.
I'd be inclined to suggest starting with a much simpler task. Pick a
discipline (preferably one with a professional society who can tell
you their estimate of the male/female ratio over, say, the past 5
years), limit the Wikipedia articles to topics in that discipline, and
limit the citations to those published within the last 5 years.
Indeed, perhaps limit it further to publications principally from the
same country (or countries) as the professional society from which you
get the data, as men's and women's participation in any discipline
clearly varies between countries for cultural reasons. Then you have
some way to gauge whether Wikipedia shows more or less gender bias in
its citations than the discipline itself exhibits through publication.
Quite a challenge!
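The comparison in this simpler task amounts to dividing an observed share by a baseline share. A minimal sketch, assuming author genders are reliably known (unknowns dropped), that all authors of all cited papers are pooled, and that the baseline share of women in the discipline is a number supplied by the professional society; the function name and the inputs in the usage line are hypothetical, and no significance test is included.

```python
from typing import Optional, Sequence


def citation_bias_ratio(
    cited_papers: Sequence[Sequence[Optional[str]]],
    baseline_women_share: float,
) -> Optional[float]:
    """Share of women among authors of cited papers, divided by the
    discipline-wide baseline share of women.

    Each paper is a list of author genders ("woman", "man", or None for
    unknown). 1.0 means citations mirror the discipline; below 1.0 means
    women are cited less than their share of the discipline; above 1.0
    means more. Returns None if no author's gender is known.
    """
    if not 0 < baseline_women_share <= 1:
        raise ValueError("baseline_women_share must be in (0, 1]")
    # Pool every known author across all cited papers
    known = [g for paper in cited_papers for g in paper if g is not None]
    if not known:
        return None
    observed = sum(1 for g in known if g == "woman") / len(known)
    return observed / baseline_women_share


# Hypothetical inputs: three cited papers, baseline of 40% women
papers = [["woman", "man"], ["man", "man", "man"], ["woman", None]]
print(round(citation_bias_ratio(papers, baseline_women_share=0.4), 2))  # 0.83
```

Pooling authors gives large-team papers more weight; averaging a per-paper part-score instead would weight each paper equally. Either way, a raw ratio says nothing about whether a difference is statistically meaningful.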
And of course, it is not Wikipedia that adds citations. It is
individual contributors who add citations. Does the sex/gender of the
contributor have any correlation with any observed bias? Again, the task
is made more difficult because a lot of Wikipedians don't identify their
sex/gender.
The other thing to be alert to is the difference in how (I believe)
Wikipedians cite compared to researchers. As a researcher, I will of
course be reading papers in my field all the time, and what I read will
influence my subsequent work. Therefore when I write about my
research, my citations refer to papers that I have already read and
whose authors may be familiar to me from their other work, from having
met them at conferences, from private correspondence, etc. However, as
a Wikipedian, I am only partially operating that way (mostly when I
write new articles or significantly expand them, that is, when I am
doing the research). A lot of the time I am adding citations for
content that other people (often new users) have added or changed
without citation. These come up on my watchlist all the time.
What do I do? Of course I could revert, saying "no citation provided",
but that's not the way to encourage new contributors nor to grow the
encyclopedia, so if the information seems plausible (not obviously
vandalism), I will attempt to find a citation for it (using tools like
Google and other topic-specialised search tools). This is what I call
"lucky dip" mode of citing, as obviously I have no idea what the source
was for the original contributor. The sources I find from my search
may not already be known to me (frequently they are not). To
summarise, IMHO, researchers (or Wikipedians in "new content mode")
cite a source already known to them, whose authors may also be known to
them, and so could consciously or unconsciously discriminate in their
citations based on sex/gender or other criteria. Wikipedians in
"updating mode", by contrast, are likely to be citing a source not
previously known to them, may be happy just to have found a source, and
are unlikely to spend much time researching the authors of that source
to the extent that they could then consciously or unconsciously
discriminate on sex/gender. If I invest any extra effort in such
situations, it's probably because the wording of the source is a close
match to the Wikipedia article, which raises the question of whether
the article is a copyright violation (which needs to be dealt with by
deletion or rewriting) or the source is a Wikipedia mirror (which is
obviously not an acceptable citation).
So I suspect that whether a citation was added by the same contributor
as the content it supports, or by a subsequent contributor, makes a
difference to the likelihood of conscious/unconscious discrimination.
Also, finally, often Wikipedia cites web pages and other sources that
do not have any individual authorship, e.g. government websites.
Remember that Wikipedia prefers open citations over paywalled
citations and a lot of the publications behind paywalls are individually
authored.
Your proposed research has a lot of interesting challenges and a
number of limitations. I'm not saying don't do it, but I am saying
start very small and see if you can find any evidence to support your
hypothesis before embarking on a larger study. Because contributor
behaviour is what you are trying to study, you probably need to do
both quantitative and qualitative experiments. E.g. I have described
the two modes of citation I do, but I cannot say how typical my
behaviour is.
Kerry
-----Original Message-----
From: Wiki-research-l
[mailto:wiki-research-l-bounces@lists.wikimedia.org]
On Behalf Of Leila Zia
Sent: Friday, 23 August 2019 3:44 AM
To: Research into Wikimedia content and communities <
wiki-research-l(a)lists.wikimedia.org>
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
Hi Greg,
A few comments if you're going to go with "proportion of male vs
female authors of the source material used as citations in arbitrary
articles":
* Please differentiate between sex (female, male, ...) and gender
(woman, man, ...). My understanding from your initial email is that
you want to stay focused on gender, not sex.
* Unless you have reliable sources about the gender of an author, I
would not recommend trying to predict what the gender is. (As you may
know, this is not uncommon in social media studies, for example, to
predict the gender of the author based on their image or their name.
These approaches introduce biases and social challenges.)
* Re your question about whether WMF has resources to look into this
question in-house: I can't speak for the whole of WMF; however, I can
share more about the Research team's direction. As part of our future
work, we would like to "help contributors monitor violations of core
content policies and assess information reliability and bias both
granularly and at scale". [1] The question you proposed can fall under
assessing bias in content (considering citations as part of the
content). I expect us to focus first on the piece about violations of
core content policies and information reliability and come back to the
bias question later. As a result, we won't have bandwidth to do your
proposal in-house at the moment. Sorry about that.
I hope this helps.
Best,
Leila
[1] Section 2 of our Knowledge Integrity whitepaper:
https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_Wikimedia_Research_2030.pdf
On Thu, Aug 22, 2019 at 9:57 AM Greg <thenatureprogram(a)gmail.com> wrote:
Hi Kerry,
Those are all very interesting ways to look at this. I was thinking
mostly along the lines of your first bullet point, but I'd be
interested in research in any of those areas.
Thanks,
Greg
On Thu, Aug 22, 2019 at 5:00 AM
<wiki-research-l-request(a)lists.wikimedia.org>
wrote:
>
> Today's Topics:
>
> 1. gender balance of wikipedia citations (Greg)
> 2. Re: gender balance of wikipedia citations (Kerry Raymond)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 21 Aug 2019 20:19:18 -0700
> From: Greg <thenatureprogram(a)gmail.com>
> To: wiki-research-l(a)lists.wikimedia.org
> Subject: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID:
> <CAOO9DNtY+oDO5oQrMZeG1NZE-kYNYLWnTD6acHeYTbYeGk8k2Q(a)mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Greetings!
>
> I was looking for information about the gender balance of
> Wikipedia citations and no one I've asked knows of any work on
> this topic. Do you?
>
> I think this is an important question.
>
> Here's what I've learned so far:
>
> Wikipedia citations are currently in the form of text strings.
> There is also an initiative to place citations in an annotated
> structured repository (wikicite). I do not know the current status
> of wikicite or if/when this could be used for this inquiry--either
> to examine all, or a sensible subset of the citations.
>
> My perspective is that understanding the gender balance is
> necessary and urgent. The balance could be better, the same, or
> worse than the citation balances we already know, and the scale of
> the effect is quite large.
>
> Is this a line of inquiry that the wikimedia/wikicite community is
> interested in pursuing? If so, what is the best way to get started?
> Does the WMF have the resources and interest to look into this
> matter in-house?
>
> Thanks for your thoughts.
>
> Greg
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 22 Aug 2019 13:53:45 +1000
> From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
> To: "'Research into Wikimedia content and communities'"
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] gender balance of wikipedia
> citations
> Message-ID: <00ed01d5589d$33e31ed0$9ba95c70$(a)gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Could you elaborate a bit more on what you mean by the gender
> balance of citations?
>
> Are you talking about:
>
> * proportion of male vs female authors of the source material used
> as citations in arbitrary articles?
> * the quality/quantity of citations in biography articles of men
> vs women?
> * the quality/quantity of citations in articles that are gendered
> by some other criteria (e.g. reader interest, romantic comedy vs
> action film)?
>
> Kerry
>
> -----Original Message-----
> From: Wiki-research-l
> [mailto:wiki-research-l-bounces@lists.wikimedia.org]
> On Behalf Of Greg
> Sent: Thursday, 22 August 2019 1:19 PM
> To: wiki-research-l(a)lists.wikimedia.org
> Subject: [Wiki-research-l] gender balance of wikipedia citations
>
> Greetings!
>
> I was looking for information about the gender balance of
> Wikipedia citations and no one I've asked knows of any work on
> this topic. Do you?
>
> I think this is an important question.
>
> Here's what I've learned so far:
>
> Wikipedia citations are currently in the form of text strings.
> There is also an initiative to place citations in an annotated
> structured repository (wikicite). I do not know the current status
> of wikicite or if/when this could be used for this inquiry--either
> to examine all, or a sensible subset of the citations.
>
> My perspective is that understanding the gender balance is
> necessary and urgent. The balance could be better, the same, or
> worse than the citation balances we already know, and the scale of
> the effect is quite large.
>
> Is this a line of inquiry that the wikimedia/wikicite community is
> interested in pursuing? If so, what is the best way to get started?
> Does the WMF have the resources and interest to look into this
> matter in-house?
Thanks for your thoughts.
Greg
------------------------------
End of Wiki-research-l Digest, Vol 168, Issue 11
************************************************
------------------------------
End of Wiki-research-l Digest, Vol 168, Issue 13
************************************************
------------------------------
End of Wiki-research-l Digest, Vol 168, Issue 17
************************************************