Wow, Kerry! Thank you for taking the time to write all these thoughts out.
I'm asking the question because I'm concerned that the gender balance of
the authors cited on Wikipedia differs from the already quite bad patterns
in academia. My fear is that the citation gender imbalance on Wikipedia is
more pronounced. If so, Wikipedia is not just perpetuating the problem but
making it worse, by surfacing some authors and ideas even more frequently
and others hardly at all. I would like to know whether this is the case
and, if so, how big the effect is.
In my last message, I mentioned a study of a set of award-winning
political science books (the researchers examine the citation gender
imbalance for that set). I only saw this study today, but I began to think
that this set of works, or some similar set of titles, could be a good
place to begin, especially if the original researchers were willing to
share the list of titles, authors, and genders that they worked with. Then
it would mostly be a matter of working out how those titles are cited on
Wikipedia, through either the citation dataset or wikicite, to see whether
and how the citation patterns differ (i.e., whether the works by women,
relative to those by men, are cited more frequently, at the same rate, or
less frequently on Wikipedia than the researchers found in the original
study).
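A minimal sketch of that comparison, assuming a hypothetical list of records that holds, for each title, the author's gender as recorded by the original researchers and two citation counts (one from the study, one counted on Wikipedia). The field names and the numbers are invented for illustration and are not data from the study, which also reports citations per year rather than raw totals:

```python
from statistics import mean

# Hypothetical records: field names and counts are illustrative only.
books = [
    {"title": "Book A", "gender": "woman", "scholar_cites": 120, "wikipedia_cites": 3},
    {"title": "Book B", "gender": "man", "scholar_cites": 200, "wikipedia_cites": 8},
    {"title": "Book C", "gender": "woman", "scholar_cites": 80, "wikipedia_cites": 1},
    {"title": "Book D", "gender": "man", "scholar_cites": 100, "wikipedia_cites": 4},
]


def women_to_men_ratio(records, field):
    """Mean citations for books by women divided by the mean for books by men."""
    women = [r[field] for r in records if r["gender"] == "woman"]
    men = [r[field] for r in records if r["gender"] == "man"]
    return mean(women) / mean(men)


scholar_ratio = women_to_men_ratio(books, "scholar_cites")
wikipedia_ratio = women_to_men_ratio(books, "wikipedia_cites")

# A Wikipedia ratio below the scholarly ratio would suggest a wider gap on Wikipedia.
gap_is_wider_on_wikipedia = wikipedia_ratio < scholar_ratio

print(f"scholarly ratio: {scholar_ratio:.2f}")
print(f"Wikipedia ratio: {wikipedia_ratio:.2f}")
print(f"gap wider on Wikipedia: {gap_is_wider_on_wikipedia}")
```

With a real list, a ratio of 1.0 would mean parity, and the interesting quantity is how far the Wikipedia ratio sits from the scholarly one.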
This seems like it would be easier to do than what you propose, but perhaps
the idea is not sound. Until very recently, I thought I could find the
answer in an existing paper! I honestly don't know the best way to get the
answer, but I would like to know the answer and think it's important to
look at.
All of the things you bring up--from the gender of the editor, to the type
of editing being done, to the issues around multiple authors/paywalls/year
of publication/field--complicate the inquiry, and in particular a larger
one. I agree with what you say about doing something small first to see
what's there.
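On the multiple-author issue, the part-score Kerry describes (0 for all male, 1 for all female, 0.6 for 3 women and 2 men) could be sketched roughly as below. The function name and the gender labels are assumptions for illustration; a work with any author of unknown gender is skipped rather than guessed:

```python
def share_of_women(author_genders):
    """Share of a work's authors recorded as women.

    Returns None when the list is empty or any author's gender is unknown,
    so that such works can be excluded instead of being guessed.
    """
    if not author_genders or any(g is None for g in author_genders):
        return None
    women = sum(1 for g in author_genders if g == "woman")
    return women / len(author_genders)


all_men = share_of_women(["man", "man"])
all_women = share_of_women(["woman", "woman", "woman"])
mixed = share_of_women(["woman", "woman", "woman", "man", "man"])
unknown = share_of_women(["woman", None])

print(all_men, all_women, mixed, unknown)
```

Averaging this score over all cited works in a set would give one number per set to compare.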
Thanks again for all your thoughts.
Greg
On Thu, Aug 22, 2019 at 9:41 PM <wiki-research-l-request(a)lists.wikimedia.org>
wrote:
> Send Wiki-research-l mailing list submissions to
> wiki-research-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
> wiki-research-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wiki-research-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
>
> Today's Topics:
>
> 1. Re: gender balance of wikipedia citations (Greg)
> 2. Re: gender balance of wikipedia citations (Kerry Raymond)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 22 Aug 2019 18:47:48 -0700
> From: Greg <thenatureprogram(a)gmail.com>
> To: wiki-research-l(a)lists.wikimedia.org
> Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID:
> <
> CAOO9DNvBrw_aLkRUp5kYFLdaLJUEK+ddiz-A09MZwiotAdAmUw(a)mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Leila,
>
> Thanks for your thoughts.
>
> Having just read Troy Vettese's very powerful essay, Sexism in the Academy
> (
> https://nplusonemag.com/issue-34/essays/sexism-in-the-academy/), I wish
> this were a top priority.
>
> I stumbled upon a study today--it came up in the Washington Post's
> excellent series on gender bias in political science. The authors look at a
> set of award-winning political science books and the gender imbalance in
> the citations drawn from Google Scholar. I'm linking the piece here in
> case anyone on this list is interested, now or in the future, in how the
> patterns on Wikipedia compare.
>
> Washington Post piece: "There’s a gender gap in who wins political science
> book awards – and in how widely they’re cited"
>
> https://www.washingtonpost.com/politics/2019/08/22/theres-gender-gap-who-wi…
> "Just as significantly, women’s award-winning books receive fewer scholarly
> citations than men’s award-winning volumes — and this disparity has grown,
> rather than shrunk, in recent years. Over the entire period, APSA
> award-winning volumes by women averaged 43 percent fewer citations per year
> than those by male authors."
>
> Paper: "Winning awards and gaining recognition: An impact analysis of APSA
> section book prizes"
> https://www.sciencedirect.com/science/article/abs/pii/S0362331918300867
>
>
> Best,
> Greg
>
> On Thu, Aug 22, 2019 at 3:44 PM <
> wiki-research-l-request(a)lists.wikimedia.org>
> wrote:
>
> > Today's Topics:
> >
> > 1. Re: gender balance of wikipedia citations (Greg)
> > 2. Re: gender balance of wikipedia citations (Leila Zia)
> > 3. Wikimania 2019 disinformation meetup follow-up (Leila Zia)
> > 4. Upcoming Research Newsletter (special issue on gender gap
> > research): New papers open for review (Mohammed Sadat Abdulai)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 22 Aug 2019 09:57:15 -0700
> > From: Greg <thenatureprogram(a)gmail.com>
> > To: wiki-research-l(a)lists.wikimedia.org
> > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID:
> > <CAOO9DNuSYzzaVwcdqiWA7pj671z3N43XOSwv6DtW0SxWg=
> > L8GQ(a)mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi Kerry,
> > Those are all very interesting ways to look at this. I was thinking mostly
> > along the lines of your first bullet point, but I'd be interested in
> > research in any of those areas.
> >
> > Thanks,
> > Greg
> >
> > On Thu, Aug 22, 2019 at 5:00 AM <
> > wiki-research-l-request(a)lists.wikimedia.org>
> > wrote:
> >
> > > Today's Topics:
> > >
> > > 1. gender balance of wikipedia citations (Greg)
> > > 2. Re: gender balance of wikipedia citations (Kerry Raymond)
> > >
> > >
> > > ----------------------------------------------------------------------
> > >
> > > Message: 1
> > > Date: Wed, 21 Aug 2019 20:19:18 -0700
> > > From: Greg <thenatureprogram(a)gmail.com>
> > > To: wiki-research-l(a)lists.wikimedia.org
> > > Subject: [Wiki-research-l] gender balance of wikipedia citations
> > > Message-ID:
> > > <
> > > CAOO9DNtY+oDO5oQrMZeG1NZE-kYNYLWnTD6acHeYTbYeGk8k2Q(a)mail.gmail.com>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Greetings!
> > >
> > > I was looking for information about the gender balance of Wikipedia
> > > citations and no one I've asked knows of any work on this topic. Do you?
> > >
> > > I think this is an important question.
> > >
> > > Here's what I've learned so far:
> > >
> > > Wikipedia citations are currently in the form of text strings. There is
> > > also an initiative to place citations in an annotated structured repository
> > > (wikicite). I do not know the current status of wikicite or if/when this
> > > could be used for this inquiry--either to examine all, or a sensible subset
> > > of the citations.
> > >
> > > My perspective is that understanding the gender balance is necessary and
> > > urgent. The balance could be better, the same, or worse than the citation
> > > balances we already know, and the scale of the effect is quite large.
> > >
> > > Is this a line of inquiry that the wikimedia/wikicite community is
> > > interested in pursuing? If so, what is the best way to get started? Does
> > > the WMF have the resources and interest to look into this matter in-house?
> > > Thanks for your thoughts.
> > >
> > > Greg
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 2
> > > Date: Thu, 22 Aug 2019 13:53:45 +1000
> > > From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
> > > To: "'Research into Wikimedia content and communities'"
> > > <wiki-research-l(a)lists.wikimedia.org>
> > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > Message-ID: <00ed01d5589d$33e31ed0$9ba95c70$(a)gmail.com>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Could you elaborate a bit more on what you mean by the gender balance of
> > > citations?
> > >
> > > Are you talking about:
> > >
> > > * proportion of male vs female authors of the source material used as
> > > citations in arbitrary articles?
> > > * the quality/quantity of citations in biography articles of men vs women?
> > > * the quality/quantity of citations in articles that are gendered by some
> > > other criteria (e.g. reader interest, romantic comedy vs action film)?
> > >
> > > Kerry
> > >
> > > ------------------------------
> > >
> > > Subject: Digest Footer
> > >
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > Wiki-research-l(a)lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> > >
> > > ------------------------------
> > >
> > > End of Wiki-research-l Digest, Vol 168, Issue 11
> > > ************************************************
> > >
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Thu, 22 Aug 2019 10:43:51 -0700
> > From: Leila Zia <leila(a)wikimedia.org>
> > To: Research into Wikimedia content and communities
> > <wiki-research-l(a)lists.wikimedia.org>
> > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID:
> > <CAK0Oe2uCo70_=ma2b=2d+fvr4GseEVxOP0sh=
> > ELNOpKdCuUfqA(a)mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi Greg,
> >
> > A few comments if you're going to go with "proportion of male vs
> > female authors of the source material used as citations in arbitrary
> > articles":
> >
> > * Please differentiate between sex (female, male, ...) and gender
> > (woman, man, ...). My understanding from your initial email is that
> > you want to stay focused on gender, not sex.
> >
> > * Unless you have reliable sources about the gender of an author, I
> > would not recommend trying to predict what the gender is. (As you may
> > know, this is not uncommon in social media studies, for example, to
> > predict the gender of the author based on their image or their name.
> > These approaches introduce biases and social challenges.)
> >
> > * Re your question about whether WMF has resources to look into this
> > question in-house: I can't speak for the whole of WMF, however, I can
> > share more about the Research team's direction. As part of our future
> > work, we would like to "help contributors monitor violations of core
> > content policies and assess information reliability and bias both
> > granularly and at scale". [1] The question you proposed can fall under
> > assessing bias in content (considering citations as part of the
> > content). I expect us to focus first on the piece about violations of
> > core content policies and information reliability and come back to the
> > bias question later. As a result, we won't have bandwidth to do your
> > proposal in-house at the moment. Sorry about that.
> >
> > I hope this helps.
> >
> > Best,
> > Leila
> >
> > [1] Section 2 of our Knowledge Integrity whitepaper:
> > https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_W…
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Thu, 22 Aug 2019 13:36:17 -0700
> > From: Leila Zia <leila(a)wikimedia.org>
> > To: Research into Wikimedia content and communities
> > <wiki-research-l(a)lists.wikimedia.org>
> > Subject: [Wiki-research-l] Wikimania 2019 disinformation meetup
> > follow-up
> > Message-ID:
> > <CAK0Oe2sodYJpkuhSqgo3dtfDr=
> > NQ5EK1TdH16F6BOkTyFho9Rg(a)mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi,
> >
> > This message is for those of you who attended the disinformation
> > meet-up [0] at Wikimania 2019 [1] or others who may be interested.
> >
> > * The notes from our meet-up are now posted at the bottom of the page [0].
> >
> > * I was tasked to see if space.wmflabs.org is the place for us to
> > continue conversations about this topic. The answer is yes. Thanks to
> > the help of Elena Lappen, we now have a dedicated subcategory for
> > disinformation:
> > https://discuss-space.wmflabs.org/c/research/disinformation . Feel
> > free to subscribe, watch, and/or post new topics if you're involved in
> > this space.
> >
> > * If you are new to this conversation, please read the purpose of the
> > subcategory at
> > https://discuss-space.wmflabs.org/t/about-the-disinformation-category/949
> > and welcome! :)
> >
> > Best,
> > Leila
> >
> > [0] https://wikimania.wikimedia.org/wiki/2019:Meetups/Disinformation
> > [1] https://wikimania.wikimedia.org/wiki/2019:Program
> >
> >
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Thu, 22 Aug 2019 22:43:53 +0000 (UTC)
> > From: Mohammed Sadat Abdulai <masssly(a)ymail.com>
> > To: Research Into Wikimedia Content and Communities
> > <wiki-research-l(a)lists.wikimedia.org>
> > Subject: [Wiki-research-l] Upcoming Research Newsletter (special issue
> > on gender gap research): New papers open for review
> > Message-ID: <1625269943.668598.1566513833343(a)mail.yahoo.com>
> > Content-Type: text/plain; charset=UTF-8
> >
> > Hi everyone,
> > We’re preparing for the August 2019 research newsletter and looking for
> > contributors. Please take a look at
> > https://etherpad.wikimedia.org/p/WRN201908 and add your name next to any
> > paper you are interested in covering. Our target publication date is
> > 31 August, 11:59 UTC. As usual, short notes and one-paragraph reviews are
> > most welcome.
> > For the August edition, we are planning a special issue focusing mainly
> > on recent gender gap/gender bias research. (Upcoming special issue topics
> > may include health and education.) There are about 20 papers from this area
> > on our to-do list, all of which will be covered in the August issue, either
> > as a mere list item or - with your help - in the form of a more informative
> > writeup or review. They include:
> > - Analyzing Gender Stereotyping in Bollywood Movies
> >
> > - Breaking the glass ceiling on Wikipedia
> >
> > - Breastfeeding, Authority, and Genre: Women's Ethos in Wikipedia and
> > Blogs
> >
> > - Cyberfeminism on Wikipedia: Visibility and deliberation in feminist
> > Wikiprojects
> >
> > - Gender and deletion on Wikipedia
> >
> > - Gender imbalance and Wikipedia
> >
> > - Gender Markers in Wikipedia Usernames
> >
> > - How do students trust Wikipedia? An examination across genders
> >
> > - Investigating the Gender Pronoun Gap in Wikipedia
> >
> > - It’s Not What You Think: Gender Bias in Information about Fortune
> > 1000 CEOs on Wikipedia
> >
> > - Mapping and Bridging the Gender Gap: An Ethnographic Study of Indian
> > Wikipedians and Their Motivations to Contribute
> >
> > - People Who Can Take It: How Women Wikipedians Negotiate and Navigate
> > Safety
> >
> > - Redressing Gender Inequities on Wikipedia Through an Editathon
> >
> > - Similar Gaps, Different Origins? Women Readers and Editors at Greek
> > Wikipedia
> >
> > - Simulation Experiments on (the Absence of) Ratings Bias in Reputation
> > Systems
> >
> > - The Gendered Presentation of Professions on Wikipedia
> >
> > - Who Counts as a Notable Sociologist on Wikipedia? Gender, Race, and
> > the “Professor Test”
> >
> > - Who Wants to Read This?: A Method for Measuring Topical
> > Representativeness in User Generated Content Systems
> >
> > - Women and Wikipedia. Diversifying Editors and Enhancing Content
> > through Library Edit-a-Thons
> >
> > Masssly and Tilman Bayer
> >
> > [1] Research:Newsletter - Meta
> > [2] WikiResearch (@WikiResearch) on Twitter
> >
> >
> > ------------------------------
> >
> > End of Wiki-research-l Digest, Vol 168, Issue 12
> > ************************************************
> >
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 23 Aug 2019 14:41:09 +1000
> From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
> To: "'Research into Wikimedia content and communities'"
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID: <001001d5596c$fe22a100$fa67e300$(a)gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Yes, that was my thought. It would be difficult to know the sex (or the
> gender) of an author name on a paper. There would inevitably be a lot that
> you could not determine. And certainly in the sciences multi-author papers
> are the norm and even where you did know the sex/gender of all, do you
> assign some part-score? E.g. 0 for all male, 1 for all female, 0.6 for 3
> women and 2 men.
>
> But I am curious why you are asking the question? That the
> writing/research of women is under-represented in Wikipedia citations? If
> so, without conducting any research, I'd say "yes it is under-represented".
> But my reason would be because women are under-represented as
> writers/researchers in the first place. And certainly the older the
> source, the more likely it is to be written by a man. So to investigate
> gender bias in citations in Wikipedia, you would have to estimate the
> proportion of men/women (or at least their outputs) over time in a given
> discipline and then ask the question, "taking into account the time of
> publication of a citation and the proportion of men/women active in this
> discipline at that time, do Wikipedia citations show a sex/gender bias?".
> Hmm ... very tricky.
>
> I'd be inclined to suggest starting with a much simpler task. Pick a
> discipline (preferably one with a professional society that can tell you
> their estimate of the male/female ratio over (say) the past 5 years),
> limit the Wikipedia articles to topics in that discipline, and limit the
> citations to those published within the last 5 years. Indeed, perhaps
> limiting it to publications that are principally from the same country(s)
> as the professional society from which you get the data (as clearly
> men/women's participation in any discipline can vary with different
> countries for cultural reasons). Then you have some way to gauge whether
> Wikipedia is showing more or less gender bias in its citations than the
> discipline itself exhibits through publication. Quite a challenge!
>
> And of course, it is not Wikipedia that adds citations. It is individual
> contributors who add citations. Does the sex/gender of the contributor have
> any correlation to any observed bias? Again, the task is made more
> difficult because a lot of Wikipedians don't identify their sex/gender.
>
> The other thing to be alert to is the difference in how (I believe)
> Wikipedians cite compared to researchers. As a researcher, I will of course
> be reading papers in my field all the time and what I read will influence
> my subsequent work. Therefore when I write about my research, my citations
> are referring to papers that I have already read and whose authors may be
> familiar to me from their other work, having met them at conferences,
> private correspondence, etc. However as a Wikipedian, I am only partially
> operating that way (mostly when I write new articles or significantly
> expand them, that is, when I am doing the research). A lot of the time I am
> adding citations relating to content other people (often new users) have
> added/changed without citation. These come up on my watchlist all the time.
> What do I do? Of course I could revert saying "no citation provided", but
> that's not the way to encourage new contributors nor to grow the
> encyclopedia, so if the information seems plausible (not obviously
> vandalism), I will attempt to find a citation for it (using tools like
> Google and other topic-specialised search tools). This is what I call "lucky
> dip" mode of citing as obviously I have no idea what the source was for the
> original contributor. The sources I find from my search may not already be
> known to me (frequently they are not). Or to summarise, IMHO, researchers
> (or Wikipedians in "new content mode") cite a source already known to them
> and whose authors may be known to them and could consciously or
> unconsciously engage in some discrimination in citation based on sex/gender
> or other criteria, whereas Wikipedians in "updating mode" are likely to be
> citing a source not previously known to them and may be happy just to have
> found a source and are unlikely to be spending a lot of their time
> researching the authors of that source to the extent they could then
> consciously or unconsciously exercise discrimination on sex/gender. If I
> invest any extra effort in such situations, it's probably because the
> wording of the source is a close match to the Wikipedia article, which raises
> the question of copyright violation (which needs to be dealt with by
> deletion or rewriting) or being a Wikipedia mirror (which is obviously not
> an acceptable citation).
>
> So I suspect whether a citation was added by the same contributor as the
> content it supports or a subsequent contributor probably makes a difference
> to the likelihood of conscious/unconscious discrimination.
>
> Also, finally, often Wikipedia cites web pages and other sources that do
> not have any individual authorship, e.g. government websites. Remember that
> Wikipedia prefers open citations over paywalled citations and a lot of the
> publications behind paywalls are individually authored.
>
> Your proposed research has a lot of interesting challenges and a number of
> limitations. I'm not saying don't do it, but I am saying start very small
> and see if you can find any evidence to support your hypothesis before
> embarking on a larger study. Because contributor behaviour is what you are
> trying to study, you probably need to do both quantitative and qualitative
> experiments. E.g. I have described the two modes of citation I do, but I
> cannot say how typical my behaviour is.
>
> Kerry
>
> -----Original Message-----
> From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org]
> On Behalf Of Leila Zia
> Sent: Friday, 23 August 2019 3:44 AM
> To: Research into Wikimedia content and communities <
> wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
>
> Hi Greg,
>
> A few comments if you're going to go with "proportion of male vs female
> authors of the source material used as citations in arbitrary
> articles":
>
> * Please differentiate between sex (female, male, ...) and gender (woman,
> man, ...). My understanding from your initial email is that you want to
> stay focused on gender, not sex.
>
> * Unless you have reliable sources about the gender of an author, I would
> not recommend trying to predict what the gender is. (As you may know, it
> is not uncommon in social media studies, for example, to predict the gender
> of the author based on their image or their name. These approaches
> introduce biases and social challenges.)
>
> * Re your question about whether WMF has resources to look into this
> question in-house: I can't speak for the whole of WMF, however, I can share
> more about the Research team's direction. As part of our future work, we
> would like to "help contributors monitor violations of core content
> policies and assess information reliability and bias both granularly and at
> scale". [1] The question you proposed can fall under assessing bias in
> content (considering citations as part of the content). I expect us to
> focus first on the piece about violations of core content policies and
> information reliability and come back to the bias question later. As a
> result, we won't have bandwidth to do your proposal in-house at the moment.
> Sorry about that.
>
> I hope this helps.
>
> Best,
> Leila
>
> [1] Section 2 of our Knowledge Integrity whitepaper:
>
> https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_W…
>
>
> On Thu, Aug 22, 2019 at 9:57 AM Greg <thenatureprogram(a)gmail.com> wrote:
> >
> > Hi Kerry,
> > Those are all very interesting ways to look at this. I was thinking
> > mostly along the lines of your first bullet point, but I'd be
> > interested in research in any of those areas.
> >
> > Thanks,
> > Greg
> >
> > On Thu, Aug 22, 2019 at 5:00 AM
> > <wiki-research-l-request(a)lists.wikimedia.org>
> > wrote:
> >
> > > Send Wiki-research-l mailing list submissions to
> > > wiki-research-l(a)lists.wikimedia.org
> > >
> > > To subscribe or unsubscribe via the World Wide Web, visit
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > > or, via email, send a message with subject or body 'help' to
> > > wiki-research-l-request(a)lists.wikimedia.org
> > >
> > > You can reach the person managing the list at
> > > wiki-research-l-owner(a)lists.wikimedia.org
> > >
> > > When replying, please edit your Subject line so it is more specific
> > > than "Re: Contents of Wiki-research-l digest..."
> > >
> > >
> > > Today's Topics:
> > >
> > > 1. gender balance of wikipedia citations (Greg)
> > > 2. Re: gender balance of wikipedia citations (Kerry Raymond)
> > >
> > >
> > > --------------------------------------------------------------------
> > > --
> > >
> > > Message: 1
> > > Date: Wed, 21 Aug 2019 20:19:18 -0700
> > > From: Greg <thenatureprogram(a)gmail.com>
> > > To: wiki-research-l(a)lists.wikimedia.org
> > > Subject: [Wiki-research-l] gender balance of wikipedia citations
> > > Message-ID:
> > > <
> > > CAOO9DNtY+oDO5oQrMZeG1NZE-kYNYLWnTD6acHeYTbYeGk8k2Q(a)mail.gmail.com>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Greetings!
> > >
> > > > I was looking for information about the gender balance of Wikipedia
> > > > citations and no one I've asked knows of any work on this topic. Do
> > > > you?
> > >
> > > I think this is an important question.
> > >
> > > Here's what I've learned so far:
> > >
> > > Wikipedia citations are currently in the form of text strings. There
> > > is also an initiative to place citations in an annotated structured
> > > repository (wikicite). I do not know the current status of wikicite
> > > or if/when this could be used for this inquiry--either to examine
> > > all, or a sensible subset of the citations.
> > >
> > > My perspective is that understanding the gender balance is
> > > necessary and urgent. The balance could be better, the same, or
> > > > worse than the citation balances we already know, and the scale of the
> > > > effect is quite large.
> > >
> > > Is this a line of inquiry that the wikimedia/wikicite community is
> > > interested in pursuing? If so, what is the best way to get started?
> > > > Does the WMF have the resources and interest to look into this matter
> > > > in-house?
> > >
> > > Thanks for your thoughts.
> > >
> > > Greg
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 2
> > > Date: Thu, 22 Aug 2019 13:53:45 +1000
> > > From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
> > > To: "'Research into Wikimedia content and communities'"
> > > <wiki-research-l(a)lists.wikimedia.org>
> > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > Message-ID: <00ed01d5589d$33e31ed0$9ba95c70$(a)gmail.com>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Could you elaborate a bit more on what you mean by the gender
> > > balance of citations?
> > >
> > > Are you talking about:
> > >
> > > > * proportion of male vs female authors of the source material used
> > > > as citations in arbitrary articles?
> > > > * the quality/quantity of citations in biography articles of men vs
> > > > women?
> > > > * the quality/quantity of citations in articles that are gendered by
> > > > some other criteria (e.g. reader interest, romantic comedy vs action
> > > > film)?
> > >
> > > Kerry
> > >
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > Wiki-research-l(a)lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> > >
> > >
> > >
> > > ------------------------------
> > >
> > > Subject: Digest Footer
> > >
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > Wiki-research-l(a)lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> > >
> > > ------------------------------
> > >
> > > End of Wiki-research-l Digest, Vol 168, Issue 11
> > > ************************************************
> > >
> > _______________________________________________
> > Wiki-research-l mailing list
> > Wiki-research-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
> ------------------------------
>
> End of Wiki-research-l Digest, Vol 168, Issue 13
> ************************************************
>
Thanks once again for writing up your thoughts, Kerry. All very interesting.
Your comment about 'reflection of the real world' caught my eye. I believe
that the real world is moving towards acknowledging that bias exists and
that it won't just go away on its own. I see web-based tools for assessing
the gender balance of citations; I see people studying bias in things like
hiring and promotion, as well as different strategies for addressing it; I
see organizations like VIDA (https://www.vidaweb.org/) counting the number
of female writers in different journals, and journals responding because
the imbalance is known and public. I think the real world is moving towards
acknowledging and proactively addressing inequity. If the Wikipedia
community is not studying its biases and designing tools and strategies for
addressing them, it is not reflecting the world, but lagging behind it.
Frankly, if a few shouting misogynists have a problem with such initiatives,
I don't mind :) It sounds like my citation-tag idea doesn't make too much
sense, but I'd love to hear any other thoughts.
Greg
On Wed, Aug 28, 2019 at 3:27 PM <wiki-research-l-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. Re: gender balance of Wikipedia citations (Kerry Raymond)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 29 Aug 2019 08:26:45 +1000
> From: "Kerry Raymond" <kerry.raymond(a)gmail.com>
> To: "'Research into Wikimedia content and communities'"
> <wiki-research-l(a)lists.wikimedia.org>, <jane023(a)gmail.com>
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID: <006701d55def$aea84d50$0bf8e7f0$(a)gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> FWIW, I think there would be pushback against a quality tag that
> highlighted little/no citation of women's work (whether we are talking
> first author or not) in an article. There are two reasons for this. One is
> the misogyny that really does exist within the English Wikipedia
> "community" (those who do most of the shouting and hence decision making);
> they will argue that firstly gender balance of citations doesn't matter,
> secondly it is a reflection of the real world and thirdly that Wikipedia
> has a policy that it is not there to Right Great Wrongs.
>
> More practically, we know that whole-of-article quality tagging doesn't
> tend to have a lot of impact in terms of getting people to fix anything,
> compared to more specific tags like "citation needed", "dubious", "says
> who" and so on placed on specific pieces of text. People are much more
> likely to fix a specific problem and then remove the specific tag. Even
> when a person does respond to a generic tag like "more references needed"
> and add in some more references, they rarely remove the generic tag
> thinking "well, there's still plenty of scope here to add more references".
> Who among us is willing to declare "that article is 100% fully referenced
> by reliable sources"? Nobody it seems, it's a tag that lingers forever ...
>
> So I think a specific tag to encourage the expansion of "Bloggs et al"
> citations to full author listings might work. It's a somewhat boring and
> mechanical task to expand "et al" but we do have people who are happy to
> contribute in that way. It might even be possible to build a tool to assist
> them which looks up the paper in WikiCite or Google Scholar etc to extract
> the full author list as published (just as we have tools to make it easier
> to fix typos and spelling, disambiguate links and so forth). That would
> address the problem of women authors not being cited first and being lost in
> the mists of "et al". However, it is unlikely to be obvious to the average
> contributor whether a paper with the full author list of A.B. Brown, C.D.
> Jones, E.F. Smith and G.H. Walker has any female authors,
> so I can't see that it's going to be easy to motivate people to try to find
> additional citations which do have more female authors.
>
> And, as much as gender inequity is a wrong I'd like to see righted, I
> don't want to see campaigns just to "add in more female authored citations"
> (I call this "citation sprinkling") on Wikipedia. A citation has to be
> there because it verifies the information in the article and not to meet a
> gender quota. Remember that for a lot of Wikipedia contributors, academic
> literature is mostly behind a paywall so they can't actually read more than
> the title and abstract at best. A "sprinkling" campaign is likely to see
> citations based only on title and abstract ("well, it sounds like this
> paper which includes a woman author is talking about this topic") but the
> paper may not support the specific claim made in the text (indeed, it might
> say the exact opposite). A sprinkling campaign should only target the
> Further Reading section whose role is:
>
> "The Further reading section of an article contains a bulleted list of a
> reasonable number of works which a reader may consult for additional and
> more detailed coverage of the subject of the article. In articles with
> numerous footnotes, it probably is not obvious which ones are suitable for
> further reading. The "Further reading" section can help the readers by
> listing selected titles without worrying about duplications."
>
> which would avoid the risk of adding a citation that doesn't support the
> specific claims being made in the article. So maybe it would be possible to
> add a "skewed gender balance" tag onto the Further reading section and/or
> External links section whose role is
>
> "Some acceptable links include those that contain further research that is
> accurate and on-topic, information that could not be added to the article
> for reasons such as copyright or amount of detail, or other meaningful,
> relevant content that is not suitable for inclusion in an article for
> reasons unrelated to its accuracy."
>
> The downside of this idea of adding female authors to the Further
> Reading and External Links sections is that it is unclear whether anyone
> ever looks at them. Over 50% of Wikipedia hits are now via mobile device.
> The mobile render of a Wikipedia article is not the whole article as you see on
> desktop and laptop but rather you select the sections you want to read, so
> for mobile readers we do know precisely what sections they are opening from
> which we have learned that people in developed countries are not generally
> reading whole articles but specific sections (suggesting seeking answers to
> a specific need rather than a desire to fully appreciate the topic), and
> they don't tend to open anything after the References as a rule, so they
> aren't looking at Further Reading and External links anyway. Are
> desktop/laptop readers looking at them either? We don't really know as they
> get the whole article rendered as a single result and it would really only
> be eye-tracking studies (an expensive type of experiment) that would give
> us this insight with the same accuracy as our mobile data.
>
> Aside, in less developed countries, readers are more likely to read whole
> articles on a mobile device. While the reasons for this difference are not
> proven, I'd be prepared to guess at two interlinked hypotheses. Firstly,
> such countries have poorer standards of education so people may be using
> Wikipedia to supplement their limited formal education. Also such countries
> are more likely to be using rote learning in their education system
> (valuing the ability to memorise and reproduce) rather than the more
> problem-solving learning approaches increasingly in use in the education
> systems of more developed countries. That would also explain
> whole-of-article viewing rather than selecting specific sub-sections.
>
> In some ways, I think a better solution might be to try to get Google
> Scholar interested in the issue of gender. What if articles listed on
> Google Scholar came with a little gender balance score (a bit like hotel
> ratings)? One blue star (or some other symbol) for one male author, two
> blue stars (two male authors), one pink and one blue star (first author
> female, second author male), etc. Why I like the idea is that it is a
> simple-to-understand visual aid to draw attention to gender imbalance more
> widely but without a specific call to action (which as I outline above may
> backfire if citations get added for gender balance rather than content). It
> potentially helps address the real world problem which would hopefully flow
> through to Wikipedia. Also Google Scholar is probably a lot better
> resourced to build the tools to do the legwork of determining gender (I
> guess a white star is used when it can't). The risk, though, which Leila has
> previously mentioned, is that automated tools don't do a great job of
> getting gender correct, particularly as the tools are often trained on
> limited data sets such as mostly white people, making the automated gender
> guessing of non-white people more likely to be incorrect. However,
> authors can establish their own Google Scholar profile (if the author's
> name is underlined, it's a link to their profile); that's a place where they
> could disclose their gender if they desired, correct Google Scholar's
> mistaken guess, or demand that Google Scholar not show their gender
> (whatever should be their choice). Hmm, might it lead to catfishing?
> Authors passing themselves off as a different gender? Hmm ...
>
> Another place where we might explore marking gender in some easily visible
> way is WikiCite, but frankly I know little about that project so cannot
> comment on it nor on the merits of doing it there rather than on Google
> Scholar. I don't think traditional journal publishers are likely to be keen
> to show gender balance on their own websites as I think they would realise
> it would enable webscraping to reveal their overall gender balance profile,
> leading to some adverse headlines about "Brandname journals worst for
> gender equity". But Google Scholar has less to fear unless it was
> demonstrated that they exhibited stronger gender bias than the journals
> themselves. I would think that Google Scholar aggregates papers without
> any regard to the gender of the authors, but I guess it might not aggregate
> all topic areas equally. For example, if they didn't make much effort to
> include (say) nursing publications (a more female academic discipline) but
> went hard on engineering publications (a more male academic discipline), I
> guess it would skew their author gender balance towards men.
>
> Kerry
>
> -----Original Message-----
> From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org]
> On Behalf Of Greg
> Sent: Thursday, 29 August 2019 4:06 AM
> To: Research into Wikimedia content and communities <
> wiki-research-l(a)lists.wikimedia.org>; jane023(a)gmail.com
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
>
> Hi Jane,
>
> Thanks for the link. It's clear that there is a lot of work being done,
> and even more left to do.
>
> I've been thinking about what you said about second authors and was
> wondering if instead of fixing it (or in addition to fixing it), it would
> make sense to put some sort of tag on the page itself (like the ones I see
> questioning notability or requests for additional citations). Something
> along the lines of authors missing from a particular citation and how to
> fix that, or no work by women cited in this article (if this is the case).
> It strikes me that by fixing it yourself, you are doing great work, but
> that maybe it also makes sense to spread awareness about these issues to
> the broader editing community so more people are thinking about it/doing
> it. At any rate, I thought I'd float the idea. Such a tag/the response (if
> any), could also be interesting to study, though perhaps something like
> this already exists and I'm just not aware of it, or perhaps there is good
> reason not to do it.
>
> All best,
> Greg
>
> On Tue, Aug 27, 2019 at 5:00 AM <
> wiki-research-l-request(a)lists.wikimedia.org>
> wrote:
>
> > Today's Topics:
> >
> > 1. Re: gender balance of wikipedia citations (Greg)
> > 2. Re: gender balance of Wikipedia citations (Jane Darnell)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 26 Aug 2019 18:56:12 -0700
> > From: Greg <thenatureprogram(a)gmail.com>
> > To: Isaac Johnson <isaac(a)wikimedia.org>
> > Cc: Research into Wikimedia content and communities
> > <wiki-research-l(a)lists.wikimedia.org>
> > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID:
> > <
> > CAOO9DNv92bVR2COT2XmpHDU5kJOvD0yD3bahG+6Fkuma+HYDEg(a)mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Thanks, Isaac and Federico. These notes and links are very
> > helpful--and will require some time to process. As for how many years
> > I have to work on this, I'm retired! In truth, I keep hoping that
> > someone on this list will express interest in working on these
> > matters. The questions are all very interesting and quite relevant.
> > The idea of studying removed citations is both complex and compelling.
> >
> > Greg
> >
> > On Mon, Aug 26, 2019 at 6:49 AM Isaac Johnson <isaac(a)wikimedia.org>
> > wrote:
> >
> > > Regarding data, I have not been a part of these projects but I think
> > > that I can help a bit with working links:
> > > * The (I believe) original dataset can also be found here:
> > >
> > https://analytics.wikimedia.org/datasets/archive/public-datasets/all/m
> > wrefs/
> > > * A newer version of this dataset was produced that also included
> > > information about whether the source was openly available and its
> topic:
> > > ** Meta page:
> > >
> > https://meta.wikimedia.org/wiki/Research:Towards_Modeling_Citation_Qua
> > lity
> > > ** Figshare:
> > >
> > https://figshare.com/articles/Accessibility_and_topics_of_citations_wi
> > th_identifiers_in_Wikipedia/6819710
> > >
> > > On Mon, Aug 26, 2019 at 3:53 AM Federico Leva (Nemo)
> > > <nemowiki(a)gmail.com> wrote:
> > >
> > >> Greg, 22/08/19 06:19:
> > >> > I do not know the current status of wikicite or if/when this
> > >> > could be used for this inquiry--either to examine all, or a
> > >> > sensible subset of the citations.
> > >>
> > >> If I see correctly, you still did not receive an answer on the data
> > >> available.
> > >>
> > >> It's true that the Figshare item for
> > >> <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia>
> > >> was deleted (I've asked about it on the talk page), but it's
> > >> trivial to run https://pypi.org/project/mwcites/ and extract the
> > >> data yourself, at least for citations which use an identifier.
> > >>
> > >> Some example datasets produced this way:
> > >> https://zenodo.org/record/15871
> > >> https://zenodo.org/record/55004
> > >> https://zenodo.org/record/54799
> > >>
> > >> Once you extract the list of works, the fun begins. You'll need to
> > >> intersect with other data sources (Wikidata, ORCID, other?) and
> > >> account for a number of factors until you manage to find a subset
> > >> of the data which has a sufficiently high signal:noise ratio. For
> > >> instance you might need to filter or normalise by
> > >> * year of publication (some year recent enough to have good data
> > >> but old enough to allow the work to be cited elsewhere, be archived
> > >> after embargos);
> > >> * country or institution (some probably have better ORCID
> > >> coverage);
> > >> * field/discipline and language;
> > >> * open access status (per Unpaywall);
> > >> * number of expected pageviews and clicks (for instance using
> > >> <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
> > >> <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases>;
> > >> a link from 10k articles on asteroids or proteins is not the same
> > >> as being the lone link from a popular article which is not the same
> > >> as a link buried among a thousand others on a big article);
> > >> * time or duration of the addition (with one of the various diff
> > >> extraction libraries, content persistence data or possibly
> > >> historical eventstream if such a thing is available).
> > >>
> > >> To avoid having to invent everything yourself, maybe you can reuse
> > >> the method of some similar study, for instance the one on the open
> > >> access citation advantage or one of the many which studied the
> > >> gender imbalance of citations and peer review in journals.
> > >>
> > >> However, it's very possible that the noise is just too much for a
> > >> general computational method. You might consider a more manual
> > >> approach on a sample of relevant events, for instance the *removal*
> > >> of citations, which is in my opinion more significant than the
> > >> addition.* You might extract all the diffs which removed a citation
> > >> from an article in the last N years (probably they'll be in the
> > >> order of 10^5 rather than 10^6), remove some massive events or
> > >> outliers, sample 500-1000 of them randomly and verify the required
> > >> data manually.
> > >>
> > >> As usual it will be impossible to have an objective assessment of
> > >> whether that citation was really (in)appropriate in that context
> > >> according to the (English or whatever) Wikipedia guidelines. To
> > >> test that too, you should replicate one of the various studies of
> > >> the gender imbalance of peer review, perhaps one of those which
> > >> tried to assess the impact of a double blind peer review system on
> > >> the gender imbalance.
> > >> However, because the sources are already published, you'd need to
> > >> provide the agendered information yourself and make sure the
> > >> participants perform their assessment in some controlled
> > >> environment where they don't have access to any gendered
> > >> information (i.e. where you cut them off the internet).
> > >>
> > >> How many years do you have to work on this project? :-)
> > >>
> > >> Federico
> > >>
> > >> (*) I might add a citation just because it's the first result a
> > >> popular search engine gives me, after glancing at the abstract and
> > >> maybe the journal home page; but if I remove an existing citation,
> > >> hopefully I've at least assessed its content and made a judgement
> > >> about it, apart from cases of mass removals for specific problems
> > >> with certain articles or publication venues.
> > >>
> > >
> > >
> > > --
> > > Isaac Johnson -- Research Scientist -- Wikimedia Foundation
> > >
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Tue, 27 Aug 2019 08:00:45 +0200
> > From: Jane Darnell <jane023(a)gmail.com>
> > To: Research into Wikimedia content and communities
> > <wiki-research-l(a)lists.wikimedia.org>
> > Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> > Message-ID:
> > <CAFVcA-HqVicR0k65J4iox0PD=
> > oc3HBPMZLfXVO5zqkFD+EnSxQ(a)mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Greg,
> > Yes that's what I meant. On Wikipedia you get what you measure, so
> > many Wikipedians are page-creators and page-hit junkies because we can
> > measure that. The trick to motivating editors is giving them other
> > measurements for progress. Here is the link to the Women writers
> > Wikiproject and as you scroll down you can see what is measured.
> > https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women_writers
> > Jane
> >
> > On Tue, Aug 27, 2019 at 3:39 AM Greg <thenatureprogram(a)gmail.com> wrote:
> >
> > > Thanks for sharing your experience and thoughts, Jane. I did not know
> > > this was happening--I'm hardly an expert, so that's not surprising, and
> > > yet it's still very troubling to hear. I'm not sure what you mean by
> > > setting up a Wikiproject. Do you mean as a way to study this
> > > gap--i.e., the ideas that have been floated in this thread to this
> > > point? Or are you thinking of something else?
> > >
> > > Greg
> > >
> > > On Mon, Aug 26, 2019 at 5:00 AM <
> > > wiki-research-l-request(a)lists.wikimedia.org>
> > > wrote:
> > >
> > > > Today's Topics:
> > > >
> > > > 1. Re: gender balance of Wikipedia citations (WereSpielChequers)
> > > > 2. Re: gender balance of Wikipedia citations (Greg)
> > > > 3. Re: sockpuppets and how to find them sooner (Federico Leva (Nemo))
> > > > 4. Re: gender balance of Wikipedia citations (Jane Darnell)
> > > > 5. Re: gender balance of wikipedia citations (Federico Leva (Nemo))
> > > >
> > > >
> > > > ------------------------------------------------------------------
> > > > ----
> > > >
> > > > Message: 1
> > > > Date: Sun, 25 Aug 2019 14:28:25 +0100
> > > > From: WereSpielChequers <werespielchequers(a)gmail.com>
> > > > To: Research into Wikimedia content and communities
> > > > <wiki-research-l(a)lists.wikimedia.org>
> > > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia
> > > > citations
> > > > Message-ID:
> > > > <CAAanWP3qJnMpLB4tr9Eqt4EJLg2kCihkb50UY-d8=
> > > > ShNONhSAA(a)mail.gmail.com>
> > > > Content-Type: text/plain; charset="UTF-8"
> > > >
> > > > Hi Greg,
> > > >
> > > > One of the major step changes in the early growth of the English
> > > > Wikipedia was when a bot called RamBot created stub articles on US
> > > > places. I think they were cited to the census. Others have created
> > > > articles on rivers in countries and various other topics by similar
> > > > programmatic means. Nowadays such article creation is unlikely to get
> > > > consensus on the English Wikipedia, but there are some languages which
> > > > are very open to such creations and have them by the million.
> > > >
> > > > I'm not sure if the fastest updating of existing articles is automated
> > > > or just semiautomated. But looking at the bot requests page, it
> > > > certainly looks like some people are running such maintenance bots:
> > > > "updating GDP by country" is a current bot request.
> > > > https://en.wikipedia.org/wiki/Wikipedia:Bot_requests
> > > >
> > > > I'm not sure how "the ease of a source for purposes of converting into
> > > > a table and generating a separate article for each row" relates to
> > > > gender. But I suspect "number of times cited in wikipedia" deserves
> > > > less kudos than "number of times cited in academia".
> > > >
> > > > WSC
> > > >
> > > > On Sun, 25 Aug 2019 at 05:22, Greg <thenatureprogram(a)gmail.com> wrote:
> > > >
> > > > > Thanks again, Kerry. I am hoping that someone with access to more
> > > > > resources (knowledge, support, etc.) than I have will look into this.
> > > > >
> > > > > A few more thoughts/questions:
> > > > >
> > > > > 1. The link to the citation dataset from the Medium article ("What
> > > > > are the ten most cited sources on Wikipedia? Let’s ask the data.")
> > > > > is broken.
> > > > > 2. As far as I can tell, every named author in the top ten most
> > > > > cited sources on Wikipedia is male. One piece is by a working group.
> > > > > 3. This line from the Medium piece struck me: "Many of these
> > > > > publications have been cited by Wikipedians across large series of
> > > > > articles using powerful bots and automated tools."
> > > > >
> > > > > Are citations being added by bots? I'm not sure that I understand
> > > > > that line correctly.
> > > > >
> > > > > Greg
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 2
> > > > Date: Sun, 25 Aug 2019 21:16:25 -0700
> > > > From: Greg <thenatureprogram(a)gmail.com>
> > > > To: wiki-research-l(a)lists.wikimedia.org
> > > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia
> > > > citations
> > > > Message-ID:
> > > > <CAOO9DNvGyfvJkzyRq60cSQi-T80mAkUa=
> > > > vCPkzFbEysfGQqnVg(a)mail.gmail.com>
> > > > Content-Type: text/plain; charset="UTF-8"
> > > >
> > > > Thanks, WSC. All very interesting.
> > > >
> > > > I've been thinking about Wikipedia citations less in terms of kudos
> > > > and more in terms of a feedback loop. The cited sources get a
> > > > significant amount of attention (1 click per 200 pageviews is the
> > > > number I saw recently). When I imagine total Wikipedia traffic, that's
> > > > huge. How many students are finding sources this way? How many
> > > > academics? And how many of these citations are finding their way back
> > > > into academic publications via this mechanism?
> > > >
> > > > Assuming this is happening to some degree, the gender imbalance of the
> > > > citations is also reflected. If the Wikipedia imbalance is the same as
> > > > the one in academia, that's one thing; if it is better on Wikipedia
> > > > than it is in academia, that's reason to celebrate; if the balance is
> > > > worse, that's concerning. In fact, if the gender imbalance conforms to
> > > > my fears instead of my hopes, and is magnified by the massive website
> > > > traffic, I imagine it could even explain the growth in the citation
> > > > disparity researchers note in their study of political science texts.
> > > > (I link to that study in a previous post; it was mentioned in the
> > > > Washington Post recently.)
> > > >
> > > > There is a very real possibility that Wikipedia is making the citation
> > > > gender gap worse. I think we need to understand what is happening and
> > > > take immediate action if the news is not good.
> > > >
> > > > Greg
> > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 3
> > > > Date: Mon, 26 Aug 2019 10:59:07 +0300
> > > > From: "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
> > > > To: Research into Wikimedia content and communities
> > > > <wiki-research-l(a)lists.wikimedia.org>, Aaron Halfaker
> > > > <ahalfaker(a)wikimedia.org>, Kerry Raymond <
> > > kerry.raymond(a)gmail.com>
> > > > Subject: Re: [Wiki-research-l] sockpuppets and how to find them
> > > > sooner
> > > > Message-ID: <cf2734ff-d2cf-3108-691f-8ecf46125ed7(a)gmail.com>
> > > > Content-Type: text/plain; charset=utf-8; format=flowed
> > > >
> > > > Please everyone avoid using jargon specific to the English
> > > > Wikipedia on this cross-language and cross-wiki mailing list.
> > > >
> > > > Aaron Halfaker, 23/08/19 17:36:
> > > > > I think embeddings[1] would be a nice way to create a signature.
> > > >
> > > > There is some discussion of acceptable user fingerprinting
> > > > (presumably to be available to CheckUsers only), other than the
> > > > usual over-reliance on IP addresses, in particular at
> > > > <https://meta.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_Enhancement_and_Ab…>.
> > > >
> > > > Federico
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 4
> > > > Date: Mon, 26 Aug 2019 10:17:46 +0200
> > > > From: Jane Darnell <jane023(a)gmail.com>
> > > > To: Research into Wikimedia content and communities
> > > > <wiki-research-l(a)lists.wikimedia.org>
> > > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> > > > Message-ID:
> > > > <CAFVcA-G87k26nBMr=-e-+C8o6eG0KQvVihH=
> > > > f4M40faVNbKkqw(a)mail.gmail.com>
> > > > Content-Type: text/plain; charset="UTF-8"
> > > >
> > > > Greg,
> > > > Thanks for worrying. This is a known problem and yes, Wikipedia
> > > > contributes to the gender gap in citations, and no, it's not an easy
> > > > fix, since it is the fault of systemic bias in academia. Fewer women
> > > > are lead author on scientific publications, and it is generally only
> > > > the lead author who gets cited on Wikipedia. This is not just a
> > > > problem with written works in the field of politics. I spend most of
> > > > my time working on paintings and their documented catalogs, so
> > > > generally I only notice and fix this problem in art catalogs. Women
> > > > rarely appear as the lead author mentioned. I will always add them to
> > > > descriptions when I add items for their works on Wikidata, but I
> > > > cannot always find them! Sometimes I can't even create items for them
> > > > because all I have is a name and a work and nothing else is available
> > > > online anywhere. You see this most often with women who spent entire
> > > > careers working at a single institution, where the institution doesn't
> > > > bother to promote their work or even list them in exhibition catalogs.
> > > > With luck there might be a local obituary, but not always. If you have
> > > > suggestions on how to set up a Wikiproject to tackle this, it would be
> > > > a good idea. In my on-wiki experience the Women-in-Red community can
> > > > be very positive in its response to gender-gap-related issues for
> > > > women writers.
> > > > Jane
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 5
> > > > Date: Mon, 26 Aug 2019 11:45:09 +0300
> > > > From: "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
> > > > To: Research into Wikimedia content and communities
> > > > <wiki-research-l(a)lists.wikimedia.org>, Greg
> > > > <thenatureprogram(a)gmail.com>
> > > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > > Message-ID: <835202af-4653-641e-782e-c619458bdd7f(a)gmail.com>
> > > > Content-Type: text/plain; charset=utf-8; format=flowed
> > > >
> > > > Greg, 22/08/19 06:19:
> > > > > I do not know the current status of wikicite or if/when this
> > > > > could be used for this inquiry--either to examine all, or a
> > > > > sensible subset of the citations.
> > > >
> > > > If I see correctly, you still did not receive an answer on the data
> > > > available.
> > > >
> > > > It's true that the Figshare item for
> > > > <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wik…>
> > > > was deleted (I've asked about it on the talk page), but it's trivial
> > > > to run https://pypi.org/project/mwcites/ and extract the data
> > > > yourself, at least for citations which use an identifier.
> > > >
> > > > Some example datasets produced this way:
> > > > https://zenodo.org/record/15871
> > > > https://zenodo.org/record/55004
> > > > https://zenodo.org/record/54799
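
As a rough illustration of what can be done with such an extract, the sketch below counts how many distinct pages cite each identifier. It assumes a tab-separated file with the columns page_id, page_title, rev_id, timestamp, type and id; that layout is an assumption made here, so check the header of the file you actually download. The function name count_cited_ids and the sample rows are made up for the example.

```python
import csv
import io
from collections import Counter

# Assumed column layout of the extract (verify against the real file header).
# The rows below are invented sample data, not real citations.
SAMPLE_TSV = (
    "page_id\tpage_title\trev_id\ttimestamp\ttype\tid\n"
    "1\tAlpha\t100\t2015-01-01T00:00:00Z\tdoi\t10.1000/example.1\n"
    "2\tBeta\t101\t2015-01-02T00:00:00Z\tdoi\t10.1000/EXAMPLE.1\n"
    "2\tBeta\t102\t2015-01-03T00:00:00Z\tdoi\t10.1000/example.1\n"
    "3\tGamma\t103\t2015-01-04T00:00:00Z\tpmid\t12345\n"
    "3\tGamma\t104\t2015-01-05T00:00:00Z\tdoi\t10.1000/example.2\n"
)


def count_cited_ids(tsv_text, id_type="doi"):
    """Return a Counter mapping identifier -> number of distinct citing pages."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pages_per_id = {}
    for row in reader:
        if row["type"].lower() != id_type:
            continue
        # DOIs are case-insensitive, so normalise before counting.
        key = row["id"].strip().lower()
        pages_per_id.setdefault(key, set()).add(row["page_id"])
    return Counter({key: len(pages) for key, pages in pages_per_id.items()})


if __name__ == "__main__":
    for identifier, n_pages in count_cited_ids(SAMPLE_TSV).most_common():
        print(identifier, n_pages)
```

Counting distinct pages rather than rows keeps a work from being counted twice when the same page cites it in several revisions.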
> > > >
> > > > Once you extract the list of works, the fun begins. You'll need to
> > > > intersect with other data sources (Wikidata, ORCID, other?) and
> > > > account for a number of factors until you manage to find a subset of
> > > > the data which has a sufficiently high signal:noise ratio. For
> > > > instance you might need to filter or normalise by
> > > > * year of publication (some year recent enough to have good data but
> > > > old enough to allow the work to be cited elsewhere, be archived after
> > > > embargos);
> > > > * country or institution (some probably have better ORCID coverage);
> > > > * field/discipline and language;
> > > > * open access status (per Unpaywall);
> > > > * number of expected pageviews and clicks (for instance using
> > > > <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
> > > > <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases>;
> > > > a link from 10k articles on asteroids or proteins is not the same as
> > > > being the lone link from a popular article which is not the same as a
> > > > link buried among a thousand others on a big article);
> > > > * time or duration of the addition (with one of the various diff
> > > > extraction libraries, content persistence data or possibly historical
> > > > eventstream if such a thing is available).
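
One possible way to combine the year filter and the pageview normalisation from the list above is sketched here. The weighting (page views of the citing article divided by the number of references it carries) is only an assumption chosen for illustration, not an established metric. The Citation fields and the name exposure_by_work are hypothetical, and the numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    work_id: str       # identifier of the cited work, e.g. a DOI
    year: int          # publication year of the cited work
    page_views: int    # views of the citing article over some fixed period
    refs_on_page: int  # number of references the citing article carries


def exposure_by_work(citations, min_year, max_year):
    """Sum a crude per-citation exposure score for works in a year window.

    Assumption: each reference on a page gets an equal share of that
    page's views, so a lone link on a popular page outweighs a link
    buried among a thousand others.
    """
    totals = {}
    for c in citations:
        if not (min_year <= c.year <= max_year):
            continue
        share = c.page_views / max(c.refs_on_page, 1)
        totals[c.work_id] = totals.get(c.work_id, 0.0) + share
    return totals


if __name__ == "__main__":
    sample = [
        Citation("w1", 2010, 1000, 10),
        Citation("w1", 2010, 10, 1),
        Citation("w2", 2010, 100000, 1000),
        Citation("w3", 1990, 500, 5),  # outside the year window below
    ]
    print(exposure_by_work(sample, 2005, 2015))
```

Any real analysis would also need the other factors in the list (field, language, open access status), which this sketch leaves out.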
> > > >
> > > > To avoid having to invent everything yourself, maybe you can reuse
> > > > the method of some similar study, for instance the one on the open
> > > > access citation advantage or one of the many which studied the gender
> > > > imbalance of citations and peer review in journals.
> > > >
> > > > However, it's very possible that the noise is just too much for a
> > > > general computational method. You might consider a more manual
> > > > approach on a sample of relevant events, for instance the *removal* of
> > > > citations, which is in my opinion more significant than the
> > > > addition.* You might extract all the diffs which removed a citation
> > > > from an article in the last N years (probably they'll be in the order
> > > > of 10^5 rather than 10^6), remove some massive events or outliers,
> > > > sample 500-1000 of them randomly and verify the required data manually.
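
The sampling step described above could look roughly like the following. This is a minimal sketch that assumes the removal events have already been extracted as (rev_id, citation_id) pairs; the threshold of 20 removals per revision for a "massive event" is an arbitrary assumption, and sample_removals is a hypothetical name.

```python
import random
from collections import Counter


def sample_removals(removals, max_per_revision=20, sample_size=500, seed=0):
    """Drop mass-removal revisions, then draw a reproducible random sample.

    removals: list of (rev_id, citation_id) pairs, one per removed citation.
    """
    per_revision = Counter(rev_id for rev_id, _ in removals)
    # Revisions that removed many citations at once are treated as outliers.
    kept = [pair for pair in removals if per_revision[pair[0]] <= max_per_revision]
    if len(kept) <= sample_size:
        return kept
    # A seeded generator makes the sample repeatable for later verification.
    return random.Random(seed).sample(kept, sample_size)


if __name__ == "__main__":
    # Invented data: revision 1 removes 50 citations, 1000 others remove one each.
    events = [(1, f"c{i}") for i in range(50)] + [(r, "x") for r in range(2, 1002)]
    chosen = sample_removals(events)
    print(len(chosen), all(rev_id != 1 for rev_id, _ in chosen))
```

Fixing the seed means the same 500 events can be handed to several people for manual checking and their judgements compared.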
> > > >
> > > > As usual it will be impossible to have an objective assessment of
> > > > whether that citation was really (in)appropriate in that context
> > > > according to the (English or whatever) Wikipedia guidelines. To test
> > > > that too, you should replicate one of the various studies of the
> > > > gender imbalance of peer review, perhaps one of those which tried to
> > > > assess the impact of a double blind peer review system on the gender
> > > > imbalance. However, because the sources are already published, you'd
> > > > need to provide the agendered information yourself and make sure the
> > > > participants perform their assessment in some controlled environment
> > > > where they don't have access to any gendered information (i.e. where
> > > > you cut them off the internet).
> > > >
> > > > How many years do you have to work on this project? :-)
> > > >
> > > > Federico
> > > >
> > > > (*) I might add a citation just because it's the first result a
> > > > popular search engine gives me, after glancing at the abstract and
> > > > maybe the journal home page; but if I remove an existing citation,
> > > > hopefully I've at least assessed its content and made a judgement
> > > > about it, apart from cases of mass removals for specific problems with
> > > > certain articles or publication venues.
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Subject: Digest Footer
> > > >
> > > > _______________________________________________
> > > > Wiki-research-l mailing list
> > > > Wiki-research-l(a)lists.wikimedia.org
> > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > End of Wiki-research-l Digest, Vol 168, Issue 20
> > > > ************************************************
> > > >
> > End of Wiki-research-l Digest, Vol 168, Issue 22
> > ************************************************
> >
> End of Wiki-research-l Digest, Vol 168, Issue 25
> ************************************************
>
Dear all,
Thank you for your efforts. I invite you to read my Wikimania report at https://meta.m.wikimedia.org/wiki/Wikimedia_France/Micro-financement/Wikima…. I am still waiting for the video of my session, entitled "Wikidata and Health: Current situation and perspectives".
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
____________________
+21629499418
Dear all,
Thank you for your efforts. I am planning to write a survey of wiki sites and publish it in an appropriate journal. The survey will cover the software used (MediaWiki or other), the size, reference support, topic, and so on. I am looking for contributors to this project. Anyone who has published two papers about wikis in a research journal is invited to join the initiative.
Yours Sincerely,
Houcemeddine Turki (he/him)
Hi Jane,
Thanks for the link. It's clear that there is a lot of work being done, and
even more left to do.
I've been thinking about what you said about second authors and was
wondering if instead of fixing it (or in addition to fixing it), it would
make sense to put some sort of tag on the page itself (like the ones I see
questioning notability or requests for additional citations). Something
along the lines of authors missing from a particular citation and how to
fix that, or no work by women cited in this article (if this is the case).
It strikes me that by fixing it yourself, you are doing great work, but
that maybe it also makes sense to spread awareness about these issues to
the broader editing community so more people are thinking about it/doing
it. At any rate, I thought I'd float the idea. Such a tag/the response (if
any), could also be interesting to study, though perhaps something like
this already exists and I'm just not aware of it, or perhaps there is good
reason not to do it.
All best,
Greg
On Tue, Aug 27, 2019 at 5:00 AM <wiki-research-l-request(a)lists.wikimedia.org>
wrote:
>
> Today's Topics:
>
> 1. Re: gender balance of wikipedia citations (Greg)
> 2. Re: gender balance of Wikipedia citations (Jane Darnell)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 26 Aug 2019 18:56:12 -0700
> From: Greg <thenatureprogram(a)gmail.com>
> To: Isaac Johnson <isaac(a)wikimedia.org>
> Cc: Research into Wikimedia content and communities
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID:
> <
> CAOO9DNv92bVR2COT2XmpHDU5kJOvD0yD3bahG+6Fkuma+HYDEg(a)mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Thanks, Isaac and Federico. These notes and links are very helpful--and
> will require some time to process. As for how many years I have to work on
> this, I'm retired! In truth, I keep hoping that someone on this list will
> express interest in working on these matters. The questions are all very
> interesting and quite relevant. The idea of studying removed citations is
> both complex and compelling.
>
> Greg
>
> On Mon, Aug 26, 2019 at 6:49 AM Isaac Johnson <isaac(a)wikimedia.org> wrote:
>
> > Regarding data, I have not been a part of these projects but I think that
> > I can help a bit with working links:
> > * The (I believe) original dataset can also be found here:
> > https://analytics.wikimedia.org/datasets/archive/public-datasets/all/mwrefs/
> > * A newer version of this dataset was produced that also included
> > information about whether the source was openly available and its topic:
> > ** Meta page:
> > https://meta.wikimedia.org/wiki/Research:Towards_Modeling_Citation_Quality
> > ** Figshare:
> > https://figshare.com/articles/Accessibility_and_topics_of_citations_with_id…
> >
> >
> >
> > --
> > Isaac Johnson -- Research Scientist -- Wikimedia Foundation
> >
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Aug 2019 08:00:45 +0200
> From: Jane Darnell <jane023(a)gmail.com>
> To: Research into Wikimedia content and communities
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID:
> <CAFVcA-HqVicR0k65J4iox0PD=
> oc3HBPMZLfXVO5zqkFD+EnSxQ(a)mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Greg,
> Yes that's what I meant. On Wikipedia you get what you measure, so many
> Wikipedians are page-creators and page-hit junkies because we can measure
> that. The trick to motivating editors is giving them other measurements for
> progress. Here is the link to the Women writers Wikiproject and as you
> scroll down you can see what is measured.
> https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women_writers
> Jane
>
> On Tue, Aug 27, 2019 at 3:39 AM Greg <thenatureprogram(a)gmail.com> wrote:
>
> > Thanks for sharing your experience and thoughts, Jane. I did not know
> > this was happening--I'm hardly an expert, so that's not surprising, and
> > yet it's still very troubling to hear. I'm not sure what you mean by
> > setting up a Wikiproject. Do you mean ways to study this gap--i.e., the
> > ideas that have been floated in this thread to this point? Or are you
> > thinking of something else?
> >
> > Greg
> >
> > On Mon, Aug 26, 2019 at 5:00 AM <
> > wiki-research-l-request(a)lists.wikimedia.org>
> > wrote:
> >
> > > ------------------------------
> > >
> > > Message: 3
> > > Date: Mon, 26 Aug 2019 10:59:07 +0300
> > > From: "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
> > > To: Research into Wikimedia content and communities
> > > <wiki-research-l(a)lists.wikimedia.org>, Aaron Halfaker
> > > <ahalfaker(a)wikimedia.org>, Kerry Raymond <
> > kerry.raymond(a)gmail.com>
> > > Subject: Re: [Wiki-research-l] sockpuppets and how to find them sooner
> > > Message-ID: <cf2734ff-d2cf-3108-691f-8ecf46125ed7(a)gmail.com>
> > > Content-Type: text/plain; charset=utf-8; format=flowed
> > >
> > > Please everyone avoid using jargon specific to the English Wikipedia on
> > > this cross-language and cross-wiki mailing list.
> > >
> > > Aaron Halfaker, 23/08/19 17:36:
> > > > I think embeddings[1] would be a nice way to create a signature.
> > >
> > > There is some discussion of acceptable user fingerprinting (presumably
> > > to be available to CheckUsers only), other than the usual over-reliance
> > > on IP addresses, in particular at
> > > <https://meta.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_Enhancement_and_Ab…>.
> > >
> > > Federico
> > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 4
> > > Date: Mon, 26 Aug 2019 10:17:46 +0200
> > > From: Jane Darnell <jane023(a)gmail.com>
> > > To: Research into Wikimedia content and communities
> > > <wiki-research-l(a)lists.wikimedia.org>
> > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> > > Message-ID:
> > > <CAFVcA-G87k26nBMr=-e-+C8o6eG0KQvVihH=
> > > f4M40faVNbKkqw(a)mail.gmail.com>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Greg,
> > > Thanks for worrying. This is a known problem and yes, Wikipedia
> > > contributes to the gender gap in citations and no, it's not an easy fix,
> > > since it is the fault of systemic bias in academia. Fewer women are head
> > > author on scientific publications, and it is generally only the head
> > > author that gets cited on Wikipedia. This is not just a problem with
> > > written works in the field of politics. I spend most of my time working
> > > on paintings and their documented catalogs, so generally I only notice
> > > and fix this problem in art catalogs. Women are rarely mentioned as lead
> > > author. I will always add them to descriptions when I add items for
> > > their works on Wikidata, but I cannot always find them! Sometimes I
> > > can't even create items for them because all I have is a name and a work
> > > and nothing else available online anywhere. You see this most often with
> > > women who spent entire careers working at a single institution and the
> > > institution doesn't bother to promote their work or even list them in
> > > exhibition catalogs. With luck there might be a local obituary, but not
> > > always. If you have suggestions for how to set up a WikiProject to
> > > tackle this, it would be a good idea. In my on-wiki experience the
> > > Women-in-Red community can be very positive in their response to
> > > gender-gap-related issues for women writers.
> > > Jane
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 5
> > > Date: Mon, 26 Aug 2019 11:45:09 +0300
> > > From: "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
> > > To: Research into Wikimedia content and communities
> > > <wiki-research-l(a)lists.wikimedia.org>, Greg
> > > <thenatureprogram(a)gmail.com>
> > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > Message-ID: <835202af-4653-641e-782e-c619458bdd7f(a)gmail.com>
> > > Content-Type: text/plain; charset=utf-8; format=flowed
> > >
> > > Greg, 22/08/19 06:19:
> > > > I do not know the current status of wikicite or if/when this
> > > > could be used for this inquiry--either to examine all, or a sensible
> > > > subset of the citations.
> > >
> > > If I see correctly, you still did not receive an answer on the data
> > > available.
> > >
> > > It's true that the Figshare item for
> > > <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wik…>
> > > was deleted (I've asked about it on the talk page), but it's trivial to
> > > run https://pypi.org/project/mwcites/ and extract the data yourself, at
> > > least for citations which use an identifier.
> > >
> > > Some example datasets produced this way:
> > > https://zenodo.org/record/15871
> > > https://zenodo.org/record/55004
> > > https://zenodo.org/record/54799
> > >
> > > Once you extract the list of works, the fun begins. You'll need to
> > > intersect with other data sources (Wikidata, ORCID, other?) and account
> > > for a number of factors until you manage to find a subset of the data
> > > which has a sufficiently high signal:noise ratio. For instance you might
> > > need to filter or normalise by
> > > * year of publication (some year recent enough to have good data but old
> > > enough to allow the work to be cited elsewhere, be archived after
> > > embargos);
> > > * country or institution (some probably have better ORCID coverage);
> > > * field/discipline and language;
> > > * open access status (per Unpaywall);
> > > * number of expected pageviews and clicks (for instance using
> > > <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
> > > <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases>;
> > > a link from 10k articles on asteroids or proteins is not the same as
> > > being the lone link from a popular article which is not the same as a
> > > link buried among a thousand others on a big article);
> > > * time or duration of the addition (with one of the various diff
> > > extraction libraries, content persistence data or possibly historical
> > > eventstream if such a thing is available).
> > >
> > > To avoid having to invent everything yourself, maybe you can reuse the
> > > method of some similar study, for instance the one on the open access
> > > citation advantage or one of the many which studied the gender
> > > imbalance of citations and peer review in journals.
> > >
> > > However, it's very possible that the noise is just too much for a
> > > general computational method. You might consider a more manual approach
> > > on a sample of relevant events, for instance the *removal* of citations,
> > > which is in my opinion more significant than the addition.* You might
> > > extract all the diffs which removed a citation from an article in the
> > > last N years (probably they'll be in the order of 10^5 rather than
> > > 10^6), remove some massive events or outliers, sample 500-1000 of them
> > > randomly and verify the required data manually.
> > >
> > > As usual it will be impossible to have an objective assessment of
> > > whether that citation was really (in)appropriate in that context
> > > according to the (English or whatever) Wikipedia guidelines. To test
> > > that too, you should replicate one of the various studies of the gender
> > > imbalance of peer review, perhaps one of those which tried to assess
> > > the impact of a double blind peer review system on the gender imbalance.
> > > However, because the sources are already published, you'd need to
> > > provide the agendered information yourself and make sure the
> > > participants perform their assessment in some controlled environment
> > > where they don't have access to any gendered information (i.e. where
> > > you cut them off the internet).
> > >
> > > How many years do you have to work on this project? :-)
> > >
> > > Federico
> > >
> > > (*) I might add a citation just because it's the first result a popular
> > > search engine gives me, after glancing at the abstract and maybe the
> > > journal home page; but if I remove an existing citation, hopefully I've
> > > at least assessed its content and made a judgement about it, apart from
> > > cases of mass removals for specific problems with certain articles or
> > > publication venues.
> > >
> > >
> > >
> > > ------------------------------
> > >
> > > Subject: Digest Footer
> > >
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > Wiki-research-l(a)lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> > >
> > > ------------------------------
> > >
> > > End of Wiki-research-l Digest, Vol 168, Issue 20
> > > ************************************************
> > >
> ------------------------------
>
> End of Wiki-research-l Digest, Vol 168, Issue 22
> ************************************************
>
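A rough sketch of the manual-sampling idea quoted above (drop mass-removal events, then draw a random sample of 500-1000 citation removals for manual checking), in Python. The RemovalEvent record, its fields and the max_per_revision threshold are illustrative assumptions, not part of mwcites or of any Wikimedia dataset, and the step that extracts the removal diffs is assumed to have been done already.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RemovalEvent:
    """Hypothetical record: one citation removed in one revision."""
    rev_id: int        # revision in which the citation disappeared
    page_title: str    # article the citation was removed from
    identifier: str    # identifier of the removed work, e.g. a DOI


def sample_removals(events, max_per_revision=5, sample_size=500, seed=0):
    """Drop revisions that removed many citations at once, then sample.

    Revisions removing more than max_per_revision citations are treated
    as mass events (an assumed cut-off) and excluded before sampling.
    A fixed seed keeps the sample reproducible.
    """
    per_revision = Counter(e.rev_id for e in events)
    kept = [e for e in events if per_revision[e.rev_id] <= max_per_revision]
    rng = random.Random(seed)
    return rng.sample(kept, min(sample_size, len(kept)))
```

The sampled events would then be checked by hand (author gender, reason for removal and so on); nothing here automates that judgement.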
[You can skip this message if you've already read it on the Wikidata
mailing list, and sorry for the disturbance.]
Hi everyone,
Wearing the soweego project lead hat, I'm pleased to announce that the
Wikimedia Foundation has approved the *soweego 1.1* proposal:
https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Hjfocs/soweego_1.1
The main goal is to put together different machine learning algorithms
and get the highest-quality links between Wikidata and large external
catalogs.
Stay tuned for more rock'n'roll at:
https://github.com/Wikidata/soweego
And while you're there, why don't you give it a star?
Cheers,
Marco
In a nutshell:
We are asking for your input to help us learn how to release the
historical edit data of Wikimedia projects in a more efficient way.
Please provide your feedback via
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3E…
by 2019-09-03.
******
Dear researchers,
The Analytics team at Wikimedia Foundation [1] has been working on
building a data lake [2] for Wikimedia edits [3] to enable the
research and analysis of Wikimedia's edit data in a more efficient
way. This data is a history of activity on Wikimedia projects as
complete and research-friendly as possible. Edits have context, such
as whether they were reverted, in the same line as the edit itself. So
you can focus more on what you want to find out instead of writing
code to wrestle the data. Each line of the data released will include
the following and more (see full specification [3a], [3b], [3c]):
* editor edit count, groups, blocks, bot status, name, current and
historical (time of edit)
* seconds since this editor's last edit
* page context, current and historical (namespace, seconds since last
revision, etc.)
* seconds to identity revert or deletion, if applicable
* revision tags (mobile edit, ve edit, etc.)
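To illustrate what "context in the same line as the edit" could make possible, here is a minimal Python sketch. The EditRecord class and all of its field names are hypothetical stand-ins chosen for readability; they are not the actual column names of the released data (see the specification links for those). The one-day revert window is likewise an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EditRecord:
    """Hypothetical one-line-per-edit record (field names are assumptions)."""
    editor_name: str
    editor_is_bot: bool
    editor_edit_count: int
    seconds_since_editors_last_edit: Optional[int]
    page_namespace: int
    seconds_to_identity_revert: Optional[int]  # None if never reverted
    tags: List[str] = field(default_factory=list)


def surviving_human_edits(records, revert_window_seconds=86400):
    """Keep article-namespace edits by non-bots not reverted within the window."""
    return [
        r for r in records
        if not r.editor_is_bot
        and r.page_namespace == 0
        and (r.seconds_to_identity_revert is None
             or r.seconds_to_identity_revert > revert_window_seconds)
    ]


records = [
    EditRecord("Alice", False, 120, 3600, 0, None, ["mobile edit"]),
    EditRecord("ExampleBot", True, 90000, 2, 0, None),
    EditRecord("Bob", False, 3, None, 0, 40, ["ve edit"]),
    EditRecord("Carol", False, 15, 500, 1, None),
]
print([r.editor_name for r in surviving_human_edits(records)])
```

Because the revert and bot information sits on the same record as the edit, the filter needs no joins against other tables.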
The first instance of this data will be released in the coming months
and to make this release as useful as possible for you all, the users
of the data, the team needs to hear your thoughts on how to slice and
dice the data at publishing time. You can provide your input at
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3E…
.
Please provide your input to this survey no later than 2019-09-03.
Best,
Leila
[1] https://wikitech.wikimedia.org/wiki/Analytics
[2] https://en.wikipedia.org/wiki/Data_lake
[3] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits
a) https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_his…
b) https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_use…
c) https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_pag…
--
Leila Zia
Principal Research Scientist, Head of Research
Wikimedia Foundation
Hello everyone,
We are looking for a senior full-stack developer to work with us as a
contractor on Scribe.
Scribe is a Wikimedia Foundation funded software and research project. We
are building a tool to support newcomer Wikipedia editors in creating new
articles in under-resourced languages (e.g. Arabic, Hindi). The project is
open-source and will be integrated on low-resource Wikipedias so you will
be working on a great cause of making knowledge accessible to everyone.
Plus your work will have great visibility and impact on the community.
*You:*
We are looking for a senior software engineer contractor to realize the
Scribe project in a part-time position for 9 months starting on the 14th of
October.
You will work remotely with us. We are looking for someone with *3+ years
of experience in Full-stack development and community-focused applications*.
*Your future team:*
The project leads are Lucie and Hady, researchers in Computer Science. So
you will be part of our three-person team and able to influence the project
itself.
Lucie is a PhD researcher at the University of Southampton and TIB
Hannover. She previously worked at Wikimedia Germany. Hady holds a PhD in
Natural Language Processing and Machine Learning and currently works as a
researcher at Naver Labs Europe. We live in France and Germany/UK
respectively, so we are used to working remotely and communicating
effectively.
*How to apply: *
*Make sure to read the full job description here:*
https://meta.wikimedia.org/wiki/Scribe/Job
Then drop us an email (scribe.wikimedia(a)gmail.com) showing your interest in
the project with your *CV* and *portfolio attached*.
We also appreciate *any other kind of information in any format* that shows
your relevant experience, creativity and problem-solving skills.
Note: Interviews will be scheduled on a first-come, first-served basis until
the position is filled, so please consider submitting your profile as soon
as possible.
Looking forward to hearing from you,
Lucie-Aimée Kaffee and Hady Elsahar
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
Thanks for sharing your experience and thoughts, Jane. I did not know this
was happening--I'm hardly an expert, so that's not surprising, and yet it's
still very troubling to hear. I'm not sure what you mean by setting up a
Wikiproject. Do you mean ways to study this gap--i.e., the ideas
that have been floated in this thread so far? Or are you thinking of
something else?
Greg
Greetings!
I was looking for information about the gender balance of Wikipedia
citations and no one I've asked knows of any work on this topic. Do you?
I think this is an important question.
Here's what I've learned so far:
Wikipedia citations are currently in the form of text strings. There is
also an initiative to place citations in an annotated structured repository
(wikicite). I do not know the current status of wikicite or if/when this
could be used for this inquiry--either to examine all, or a sensible subset
of the citations.
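As a small illustration of what "citations as text strings" means in practice, the Python sketch below pulls DOI values out of citation templates in a piece of wikitext. It is a simplified assumption of how such extraction could work: the regular expression only handles the common "|doi=" template parameter, the sample wikitext is made up, and dedicated citation extractors cover many more cases (other identifiers, bare URLs, free-text references).

```python
import re

# Matches a "doi" parameter inside a citation template, e.g. "|doi=10.1000/xyz".
# Parameters such as "doi-access" are not matched because "=" must follow "doi".
DOI_PARAM = re.compile(r"\|\s*doi\s*=\s*([^|}\s]+)", re.IGNORECASE)


def extract_dois(wikitext):
    """Return the DOI values found in citation template parameters."""
    return DOI_PARAM.findall(wikitext)


# Made-up wikitext with two references, for illustration only.
sample = (
    "Some claim.<ref>{{cite journal |last=Doe |title=An example "
    "|doi=10.1000/xyz123 |doi-access=free |year=2001}}</ref> Another claim."
    "<ref>{{Cite journal | title=Second | DOI = 10.1234/abc.def}}</ref>"
)
print(extract_dois(sample))
```

Once works are identified this way, author names still have to be looked up elsewhere; the citation string alone says nothing reliable about gender.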
My perspective is that understanding the gender balance is necessary and
urgent. The balance could be better, the same, or worse than the citation
balances we already know, and the scale of the effect is quite large.
Is this a line of inquiry that the wikimedia/wikicite community is
interested in pursuing? If so, what is the best way to get started? Does
the WMF have the resources and interest to look into this matter in-house?
Thanks for your thoughts.
Greg