Hi everyone,
We’re preparing for the June 2020 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN202006 and add your name next to any paper you are interested in covering. Our target publication date is 28 June, 15:59 UTC. If you can't make this deadline but would like to cover a particular paper in the subsequent issue, leave a note next to the paper's entry below. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
- Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
- RuBQ: A Russian Dataset for Question Answering over Wikidata
- SchemaTree: Maximum-Likelihood Property Recommendation for Wikidata
- The effects of algorithmic flagging on fairness: quasi-experimental evidence from Wikipedia
- The impact of event type and geographical proximity on threat appraisal and emotional reactions to Wikipedia articles
- A protocol for adding knowledge to Wikidata, a case report
- A Quantitative Portrait of Wikipedia's High-Tempo Collaborations during the 2020 Coronavirus Pandemic
- Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia
- COVID-19 research in Wikipedia
- Sudden Attention Shifts on Wikipedia Following COVID-19 Mobility Restrictions
- How do academic topics shift across altmetric sources? A case study of the research area of Big Data
- Language Models as FactCheckers?
- The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic
- Wikidata as a knowledge graph for the life sciences
- Wikipedia in Vascular Surgery Medical Education: Comparative Study
Masssly and Tilman Bayer
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
[2] WikiResearch (@WikiResearch) | Twitter
Hi all,
I’m wondering if anybody here can help me. I'm looking for some sort of
tool that can take a batch of users (in this case, users who
participated in a Wiki Loves Monuments contest) and determine whether they are
still active, or the date of the last time they used their account.
Does anybody have any suggestions, please?
Thanks!
Neville
--
Wikimedia Community Malta <http://www.wikimalta.org>
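[The lookup Neville describes can be done with the standard MediaWiki Action API: request each user's single most recent contribution via list=usercontribs, and the timestamp of that edit is the last time the account was used. A minimal Python sketch, with the network call mocked out for illustration; the endpoint and username are placeholders:]

```python
API = "https://commons.wikimedia.org/w/api.php"  # any wiki's Action API endpoint

def last_edit_params(username):
    """Build query parameters asking for the user's single most recent edit."""
    return {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "uclimit": 1,           # newest edit only
        "ucprop": "timestamp",
        "format": "json",
    }

def parse_last_edit(response):
    """Return the ISO timestamp of the latest edit, or None if the account never edited."""
    contribs = response.get("query", {}).get("usercontribs", [])
    return contribs[0]["timestamp"] if contribs else None

# A real call would be, e.g. with the `requests` library:
#   resp = requests.get(API, params=last_edit_params("SomeUser")).json()
# Mocked responses for illustration:
active = {"query": {"usercontribs": [{"timestamp": "2020-05-30T12:34:56Z"}]}}
never = {"query": {"usercontribs": []}}
print(parse_last_edit(active))  # 2020-05-30T12:34:56Z
print(parse_last_edit(never))   # None
```

[Looping this over the batch of contest participants, with a small delay between requests, yields a last-activity date per account.]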
Hi all,
The next Research Showcase will be live-streamed on Wednesday, June 17, at
9:30 AM PDT/16:30 UTC.
In the era of 'information explosion,' we strive to stay informed and
relevant, often too quickly, and hence run the risk of consuming false
or distorted facts. This month, our invited speakers will help us
understand these dynamics, especially in the context of Wikipedia's content
and readership. First, Connie will talk about an initiative she's been
leading to source and rank credible information from the news, and its
overlap with Wikipedia. In the second talk, Tiziano will present his recent
work on quantifying and understanding how the readers of Wikipedia interact
with an article's citations to verify specific claims.
YouTube stream: https://www.youtube.com/watch?v=GS9Jc3IFhVQ
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Today’s News, Tomorrow’s Reference, and The Problem of Information
Reliability - An Introduction to NewsQ
By: Connie Moon Sehat, NewsQ, Hacks/Hackers
The effort to make Wikipedia more reliable is related to the larger
challenges facing the information ecosystem overall. These challenges
include the discovery of and accessibility to reliable news amid the
transformation of news distribution through platform and social media
products. Connie will present some of the challenges related to the ranking
and recommendation of news that are addressed by the NewsQ Initiative, a
collaboration between the Tow-Knight Center for Entrepreneurial Journalism
at the Craig Newmark Graduate School of Journalism and Hacks/Hackers. In
addition, she’ll share some of the ways that the project intersects with
Wikipedia, such as supporting research around the US Perennial Sources list
(https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources
).
Related resources
- NewsQ Initiative site (https://newsq.net/)
- DUE JUNE 15 (Please apply if interested!): Social Science Research
Council Call for Papers, “News Quality in the Platform Era”
https://www.ssrc.org/programs/component/media-democracy/news-quality-in-the…
- M. Bhuiyan, A. Zhang, C. Sehat, T. Mitra, 2020. Investigating "Who" in
the Crowdsourcing of News Credibility, C+J 2020
(https://cpb-us-w2.wpmucdn.com/express.northeastern.edu/dist/d/53/files/2020…)
Quantifying Engagement with Citations on Wikipedia
By: Tiziano Piccardi, EPFL
Wikipedia, the free online encyclopedia that anyone can edit, is one of the
most visited sites on the Web and a common source of information for many
users. As an encyclopedia, Wikipedia is not a source of original
information, but was conceived as a gateway to secondary sources: according
to Wikipedia's guidelines, facts must be backed up by reliable sources that
reflect the full spectrum of views on the topic. Although citations lie at
the very heart of Wikipedia, little is known about how users interact with
them. To close this gap, we built client-side instrumentation for logging
all interactions with links leading from English Wikipedia articles to
cited references for one month and conducted the first analysis of readers'
interaction with citations on Wikipedia. We find that overall engagement
with citations is low: about one in 300 page views results in a reference
click (0.29% overall; 0.56% on desktop; 0.13% on mobile). Matched
observational studies of the factors associated with reference clicking
reveal that clicks occur more frequently on shorter pages and on pages of
lower quality, suggesting that references are consulted more commonly when
Wikipedia itself does not contain the information sought by the user.
Moreover, we observe that recent content, open access sources, and
references about life events (births, deaths, marriages, etc.) are
particularly popular. Taken together, our findings open the door to a
deeper understanding of Wikipedia's role in a global information economy
where reliability is ever less certain, and source attribution ever more
vital.
Paper: https://arxiv.org/abs/2001.08614
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
I was wondering what's the best way to get a mapping between Wikidata QIDs
and Google Knowledge Graph (ex-Freebase) MIDs.
We tried to extract the mapping from Wikidata, using the property
https://www.wikidata.org/wiki/Property:P646, among others, but it doesn't
seem complete, with mappings for only 1.3 million entities -- much less
than the number of Wikidata entities.
Is there maybe a dedicated dataset or API for this mapping?
Alternatively, does anyone know of a way of querying the Google KG API [1]
directly with the name of a Wikipedia article (or with a Wikidata QID),
rather than an arbitrary plain-text string?
Thanks!
Bob
[1] https://developers.google.com/knowledge-graph
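[One bulk route for the first part of the question is SPARQL on the Wikidata Query Service: selecting all items with a P646 statement returns the full QID-to-MID mapping in one query (subject, of course, to the same incompleteness of P646 noted above). A Python sketch with the WDQS call mocked out; the Q42/MID pair in the mock is illustrative:]

```python
WDQS = "https://query.wikidata.org/sparql"
SPARQL = "SELECT ?item ?mid WHERE { ?item wdt:P646 ?mid . }"

def to_mapping(bindings):
    """Turn WDQS JSON result bindings into a {QID: Freebase MID} dict."""
    mapping = {}
    for b in bindings:
        qid = b["item"]["value"].rsplit("/", 1)[-1]  # strip the entity URI prefix
        mapping[qid] = b["mid"]["value"]
    return mapping

# A real call (requires `requests`; adding a LIMIT clause is advisable for testing):
#   r = requests.get(WDQS, params={"query": SPARQL, "format": "json"})
#   mapping = to_mapping(r.json()["results"]["bindings"])
# Mocked single-row result for illustration:
mock = [{"item": {"value": "http://www.wikidata.org/entity/Q42"},
         "mid": {"value": "/m/0282x"}}]
print(to_mapping(mock))  # {'Q42': '/m/0282x'}
```

[This still only covers entities that actually carry a P646 statement, so it won't be more complete than the extraction already attempted; it just avoids parsing the dumps.]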
The May 2020 issue of the Wikimedia Research Newsletter is out:
https://meta.wikimedia.org/wiki/Research:Newsletter/2020/May
In this issue:
1 Automatic detection of undisclosed paid editing
2 Wikiworkshop 2020
2.1 "A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search Engines"
2.2 "Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph"
2.3 "WikiHist.html: English Wikipedia's Full Revision History in HTML Format"
2.4 "Collaboration of Open Content News in Wikipedia: The Role and Impact of Gatekeepers"
2.5 "Domain-Specific Automatic Scholar Profiling Based on Wikipedia"
2.6 "Matching Ukrainian Wikipedia Red Links with English Wikipedia’s Articles"
2.7 "Beyond Performing Arts: Network Composition and Collaboration Patterns"
2.8 "Content Growth and Attention Contagion in Information Networks: Addressing Information Poverty on Wikipedia"
2.9 "The Positioning Matters: Estimating Geographical Bias in the Multilingual Record of Biographies on Wikipedia"
2.10 "Citation Detective: a Public Dataset to Improve and Quantify Wikipedia Citation Quality at Scale"
3 Briefly
- *** 12 recent publications were covered or listed in this issue ***
Masssly and Tilman Bayer
---
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/
* Follow us on Twitter: @WikiResearch
* Like us on Facebook: Facebook.com/WikiResearch/
* Receive this newsletter by mail: Research-newsletter Mailing List - Wikimedia
Dear all,
As several Wikipedia researchers are working toward understanding
Wikipedia's role during this pandemic, I thought I'd share our recent paper
(abstract below):
"Sudden Attention Shifts on Wikipedia Following COVID-19 Mobility
Restrictions"
https://arxiv.org/abs/2005.08505
Stay safe, everyone on this list!
Bob
------------------
Authors:
Manoel Horta Ribeiro, Kristina Gligorić, Maxime Peyrard, Florian Lemmerich,
Markus Strohmaier, Robert West
Abstract:
We study how the coronavirus disease 2019 (COVID-19) pandemic, alongside
the severe mobility restrictions that ensued, has impacted information
access on Wikipedia, the world's largest online encyclopedia. A
longitudinal analysis that combines pageview statistics for 12 Wikipedia
language editions with mobility reports published by Apple and Google
reveals a massive increase in access volume, accompanied by a stark shift
in topical interests. Health- and entertainment-related topics are found
to have gained attention, and sports- and transportation-related topics
to have lost it. Interestingly, while the interest in health-related topics
was transient, that in entertainment topics is lingering and even
increasing. These changes began at the time when mobility was restricted
and are most pronounced for language editions associated with countries in
which the most severe mobility restrictions were implemented, indicating
that the interest shift might be caused by people spending more time at
home. Our results highlight the utility of Wikipedia for studying reactions
to the pandemic across the globe, and illustrate how the disease is
rippling through society.
Hello,
Due to a lot of free time these days I started a personal research project
on gender bias in contributors to the French-language Wikipedia.
My goal is to explore the relation between contributor genders and the
people they create articles about. The hypotheses are:
1- contributors predominantly write biographies of people with the same
gender. Simplistically: men write about men; women write about women.
2- there are a lot fewer female contributors than male ones. This has been
studied in the past but AFAIK we don’t have recent numbers and they are
all on the English-language WP.
If these two hypotheses are true, this could explain part of the problem
with gender bias in biographies.
What I’m struggling with (and I guess some people before me did as well on
the English-language WP) is the very low level of information we have on
contributors’ genders: on WP:FR, 60-70% of contributors have not set
their gender in their user settings.
Does anyone have any pointers on this?
More insights below:
Looking at the contributors with ≥500 edits, 2.4% are auto-declared as
female; 27.4% as male; 70.2% as 'unknown' (undeclared).
By definition, there is no apparent way to know the approximate gender
distribution of the undeclared-gender accounts.
The French-language Wikipedia shows male- and unknown-gender user pages
with the 'Utilisateur:' prefix while the female-gender user pages use the
'Utilisatrice:' prefix. Based on this, one would assume that women would
be more inclined toward declaring their gender so that the interface would
stop misgendering them. However, we know that female users tend to
under-declare their gender to protect themselves.
I assumed that older accounts would be more inclined toward having a
declared gender but that’s not the case: >60% of accounts of all ages
(except the very old ones but the sample is very small) have not declared
their gender, see:
https://commons.wikimedia.org/wiki/File:Gender_repartition_of_Le_Bistro_WP-…
Some users have user boxes on their user page with various info. Some of
them declare their gender. Surprisingly however, most of the users with
these boxes have not declared their gender in their preferences.
Out of the 434 users with a "I’m a woman" user box on their page, only
32% are auto-declared as female. Same ratio for the 2773 "I’m a man" users:
only 34% are auto-declared as male. It goes up to 36% for the "I’m a
lesbian" box (N=14) and 40% for the "I’m gay" one (N=86).
As I expected, predominantly-male professions have a larger male population
in their box usage, but still an even larger 'unknown' population:
Out of the 640 "I’m an engineer" box users, 24% self-declared as 'male' and
1% as 'female'. For the 714 "I’m a computers person", that’s 27.7% and 0.6%.
However some boxes where I wouldn’t expect a large bias have one as well.
The Babel Italian users are 18% male and 2% female (N=2885). The Esperanto
ones are 24.5% male and 0.8% female (N=493).
There is certainly a bias in box usage: newer users tend to use them a lot
less than older users, and I would assume users who talk about themselves
with boxes don’t have the same profile as the average contributor.
Thanks,
--
Baptiste Fontaine
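[For readers wondering where "auto-declared" numbers like Baptiste's come from: the preference-declared gender is exposed through the MediaWiki Action API via list=users with usprop=gender, which returns "male", "female", or "unknown" when undeclared. A minimal Python sketch of tallying a batch, with the network call mocked out; the endpoint and usernames are placeholders:]

```python
from collections import Counter

def tally_genders(users):
    """Count declared genders in a list=users API result."""
    return Counter(u.get("gender", "unknown") for u in users)

# A real call (requires `requests`), batching up to 50 names per request:
#   r = requests.get("https://fr.wikipedia.org/w/api.php",
#                    params={"action": "query", "list": "users",
#                            "ususers": "|".join(batch),
#                            "usprop": "gender", "format": "json"})
#   counts = tally_genders(r.json()["query"]["users"])
# Mocked result for illustration:
mock = [{"name": "A", "gender": "female"},
        {"name": "B", "gender": "unknown"},
        {"name": "C", "gender": "male"},
        {"name": "D", "gender": "unknown"}]
print(tally_genders(mock))
```

[Note this only ever recovers the self-declared preference, so the large 'unknown' share described above is a hard limit of this method.]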
This might be of interest to some Research and Education folks too.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Mon, May 25, 2020 at 7:22 PM
Subject: [Wikimedia-l] Language Showcase, May 2020
To: wikimedia-l <wikimedia-l(a)lists.wikimedia.org>
Hello,
This is an announcement about a new installment of the Language Showcase, a
series of presentations about various aspects of language diversity and its
connection to Wikimedia Projects.
This new installment will deal with the latest design research about the
upcoming section translation feature for Content Translation.
This session is going to be broadcast over Zoom, and a recording will be
published for later viewing. You can also participate in the conversation
on IRC or with us on the Zoom meeting.
Please read below for the event details, including local time, joining
links and do let us know if you have any questions.
Thank you!
Amir
== Details ==
# Event: Language Showcase #5
# When: May 27, 2020 (Wednesday) at 13:00 UTC (check local time
https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200527T1300 )
# Where:
Join Zoom Meeting
https://wikimedia.zoom.us/j/97081030000
Meeting ID: 970 8103 0000
IRC - #wikimedia-office (on Freenode)
# Agenda:
The latest design research about the upcoming section translation feature
for Content Translation.
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
Hi all,
The Research team at the Wikimedia Foundation has officially started a
new Formal Collaboration
<https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations>
with the *Institute for Basic Science* (IBS) from South Korea to work
collaboratively on *Discovering content inconsistencies between
Wikidata and Wikipedia *
<https://meta.wikimedia.org/wiki/Research:Discovering_content_inconsistencie…>
as part of the *Knowledge Integrity program*
<https://research.wikimedia.org/knowledge-integrity.html>.
Here are a few pieces of information about this collaboration that we
would like to share with you:
* We aim to keep the research documentation for this project in the
corresponding research page on meta
<https://meta.wikimedia.org/wiki/Research:Discovering_content_inconsistencie…>.
* Meeyoung Cha from IBS & KAIST and her collaborators Cheng-Te Li and
Yi-Ju Lu from the National Cheng Kung University (Taiwan) and Jing Ma
from Hong Kong Baptist University, will be contributing to this
project. We are thankful to them for agreeing to spend their time and
expertise on this project in the coming 3 months and to those of you
who have already worked with us as we were shaping the proposal for
this project and are planning to continue your contributions to this
program.
* I act as the point of contact for this research in the Wikimedia
Foundation. Please feel free to reach out to me (directly, if it
cannot be shared publicly) if you have comments or questions about the
project.
Best,
Diego Sáez Trumper
Research Scientist
User:Diego_(WMF) <https://meta.wikimedia.org/wiki/User:Diego_(WMF)>