We are delighted to announce that Wiki Workshop 2020 will be held in
Taipei on April 20 or 21, 2020 (the date will be finalized soon) as
part of the Web Conference 2020. In past years, Wiki Workshop
has traveled to Oxford, Montreal, Cologne, Perth, Lyon, and San
You can read more about the call for papers and the workshops at
http://wikiworkshop.org/2020/#call. Please note that the deadline for
the submissions to be considered for proceedings is January 17. All
other submissions should be received by February 21.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing you in Taipei.
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
The Analytics team is going to enable Kerberos authentication for Hadoop on
Monday, December 2nd. The procedure will start around 10 AM CET and is
expected to last 3 to 4 hours, but since this is an invasive change it
may take longer. If you have anything important
that requires Hadoop on this date, please let us know in advance.
The most visible change from the user's point of view is the introduction
of a new account/password to be able to use the Hadoop services (like
Hive/HDFS/Spark/Oozie). We created a user guide about what will change with
There is also an open task to track any doubts, questions, or special use cases
during the next two weeks: https://phabricator.wikimedia.org/T238560.
Feel free to reach out on IRC in #wikimedia-analytics on Freenode too!
Luca (on behalf of the Analytics team)
We’re preparing for the November 2019 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN201911 and add your name next to any paper you are interested in covering. Our writing deadline is 27 November 23:59 UTC. If you can't make this deadline but would like to cover a particular paper in the subsequent issue, leave a note next to the paper's entry below. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
- A Forensic Qualitative Analysis of Contributions to Wikipedia from Anonymity Seeking Users
- All Talk: How Increasing Interpersonal Communication on Wikis May Not Enhance Productivity
- Analysis of Data Persistence in Collaborative Content Creation Systems: The Wikipedia Case
- Analyzing Wikipedia Deletion Debates with a Group Decision-Making Forecast Model
- Collaboration Drives Individual Productivity
- Does Sleep Deprivation Cause Online Incivility? Evidence from a Natural Experiment
- Extracting Literal Assertions for DBpedia from Wikipedia Abstracts
- How Does Editor Interaction Help Build the Spanish Wikipedia?
- Knowledge Graphs and Knowledge Networks: The Story in Brief
- Online Disinformation and the Role of Wikipedia
- Public Archaeology's Mammoth in the Room: Engaging Wikipedia as a Tool for Teaching and Outreach
- Revision Classification for Current Events in Dutch Wikipedia Using a Long Short-Term Memory Network
- The Dynamics of Peer-Produced Political Information During the 2016 U.S. Presidential Campaign
- The Roles Bots Play in Wikipedia
- Transforming Wikipedia into Augmented Data for Query-Focused Summarization
- Weakly Supervised Multilingual Causality Extraction from Wikipedia
- Wiktionary matcher
Masssly and Tilman Bayer
Wikimedia Research Newsletter: http://meta.wikimedia.org/wiki/Research:Newsletter
Twitter: @WikiResearch
As part of an AHRC research network, I am conducting a survey about
In the time of the Cambridge Analytica scandal and fake news, we are
experiencing a crisis of Internet platforms. Many people think we need
Internet and media utopias today. But what could they look like?
People interested in Wikipedia might have good ideas...
I want to invite you to participate:
Answering will take about five minutes. A number of participants with
very visionary ideas will be invited to a workshop in 2020 in London,
where participants will work on co-writing/co-authoring an
Internet/Media Utopias Manifesto.
Kind regards, Christian Fuchs
Prof. Christian Fuchs
University of Westminster,
Director of the Communication and Media Research Institute
The next Research Showcase will be live-streamed on Wednesday, November 20,
2019, at 9:30 AM PST/17:30 UTC. We’ll have a presentation from Martin
Potthast of Leipzig University on text reuse in Wikipedia and another
presentation from the Wikimedia Foundation’s Isaac Johnson on the
demographics and interests of Wikipedia’s readers.
YouTube stream: https://www.youtube.com/watch?v=tIko_V1k09s
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
This month's presentations:
Wikipedia Text Reuse: Within and Without
By Martin Potthast, Leipzig University
We study text reuse related to Wikipedia at scale by compiling the first
corpus of text reuse cases within Wikipedia as well as without (i.e., reuse
of Wikipedia text in a sample of the Common Crawl). To discover reuse
beyond verbatim copy and paste, we employ state-of-the-art text reuse
detection technology, scaling it for the first time to process the entire
Wikipedia as part of a distributed retrieval pipeline. We further report on
a pilot analysis of the 100 million reuse cases inside, and the 1.6 million
reuse cases outside Wikipedia that we discovered. Text reuse inside
Wikipedia gives rise to new tasks such as article template induction,
fixing quality flaws, or complementing Wikipedia’s ontology. Text reuse
outside Wikipedia yields a tangible metric for the emerging field of
quantifying Wikipedia’s influence on the web. To foster future research
into these tasks, and for reproducibility’s sake, the Wikipedia text reuse
corpus and the retrieval pipeline are made freely available. Paper
Characterizing Wikipedia Reader Demographics and Interests
By Isaac Johnson, Wikimedia Foundation
Building on two past surveys on the motivation and needs of Wikipedia
readers ("Why We Read Wikipedia" and "Why the World Reads Wikipedia"),
we examine the relationship between Wikipedia reader demographics and their
interests and needs. Specifically, we run surveys in thirteen different
languages that ask readers three questions about their motivation for
reading Wikipedia (motivation, needs, and familiarity) and five questions
about their demographics (age, gender, education, locale, and native
language). We link these survey results with the respondents' reading
sessions -- i.e., sequences of Wikipedia page views -- to gain a more
fine-grained understanding of how a reader's context relates to their
activity on Wikipedia. We find that readers have a diversity of backgrounds
but that the high-level needs of readers do not correlate strongly with
individual demographics. We also find, however, that there are
relationships between demographics and specific topic interests that are
consistent across many cultures and languages. This work provides insights
into the reach of various Wikipedia language editions and the relationship
between content or contributor gaps and reader gaps. See the meta page
for more details.
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
The Wikimedia Research Showcase is almost six years old and we're
using the birthday opportunity to step back and reflect on the past,
celebrate the contributions by more than 70 speakers and many of you
who participated in the discussions, and plan for its future.
We would like to ask for your input as we're thinking about the future
of the Research Showcases. We want to hear from those of you who
participated in the showcases and/or watched them, as well as those of
you who decided this is not something for you. :) In order to gather
your input, we have put together a survey that we'd appreciate if you
Link to survey (please note that the link will take you to Google
Forms): https://docs.google.com/forms/d/e/1FAIpQLSecgn8cMu5IfTYRgn93bfOiJVEIL09RRf_…
Your contributions to this survey can help us in our thinking as we
move forward. Please submit your responses by 2019-11-22.
Jonathan Morgan and Leila Zia
Research, Wikimedia Foundation
If you want to participate but not through Google Forms, ping me
off-list and I'll send you a PDF file you can fill in and send back to me
(it won't be anonymous, though; sorry). I'm not attaching it to this
email as some lists may put my email in the moderation queue with an
attachment. (And I don't /think/ I can upload it to Commons.)
The October 2019 issue of the Wikimedia Research Newsletter is out:
In this issue:
1 Research presentations at Wikimania 2019
1.1 "All Talk: How Increasing Interpersonal Communication on Wikis May Not Enhance Productivity"
1.2 "Despite the [Tor] ban: doing good work anonymously on Wikipedia"
1.3 Discussion summarization tool to help with Requests for Comments (RfCs) going stale
1.4 "Hidden Gems in the Wikipedia Discussions: The Wikipedians' Rationales"
1.5 "Characterizing Reader Behavior on Wikipedia"
1.6 Wikipedia citations (footnotes) are only clicked on one of every 200 pageviews
1.7 "Dwelling on Wikipedia Investigating time spent by global encyclopedia readers"
1.8 "Wikipedia graph mining dynamic structure of collective memory"
1.9 Harmful content rare on English Wikipedia
1.10 "Sockpuppet detection in the English Wikipedia"
1.11 "Wiki-Atlas: Rendering Wikipedia Content through Cartographic and Augmented Reality Mediums"
1.12 "Evidence of Dark Matter: Assessing the Contribution of Subject-matter Experts to Wikipedia"
1.13 Why Apple's Siri relies on data from Wikipedia infoboxes instead of (just) Wikidata
1.14 "Discovering Implicational Knowledge in Wikidata"
1.15 "Analyzing the evolution of wikis with WikiChron"
1.16 "State of Wikimedia Research 2018-2019"
2 Other events
3 Other recent publications
3.1 "Revealing the Role of User Moods in Struggling Search Tasks"
3.2 Helping students find a research advisor, with Google Scholar and Wikipedia
3.3 "Uncovering the Semantics of Wikipedia Categories"
3.4 "Adapting NMT to caption translation in Wikimedia Commons for low-resource languages"
3.5 "Automatic Detection of Online Abuse and Analysis of Problematic Users in Wikipedia"
3.6 "Self Attentive Edit Quality Prediction in Wikipedia"
3.7 "TableNet: An Approach for Determining Fine-grained Relations for Wikipedia Tables"
3.8 "Training and hackathon on building biodiversity knowledge graphs" with Wikidata
3.9 "Spectral Clustering Wikipedia Keyword-Based Search Results"
3.10 "Indigenous Knowledge for Wikipedia: A Case Study with an OvaHerero Community in Eastern Namibia"
3.11 "On Persuading an OvaHerero Community to Join the Wikipedia Community"
*** 16 recent publications were covered or listed in this issue ***
Masssly and Tilman Bayer
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/
* Follow us on Twitter: @WikiResearch
* Like us on Facebook: Facebook.com/WikiResearch/
* Receive this newsletter by mail: Research-newsletter Mailing List - Wikimedia
Summary: I have made changes to the introductory description of this
mailing list. This includes changes to what this mailing list is used
for (reflecting more closely the reality of how it's been used for the
past couple of years), more expanded information about other lists
that you may want to be aware of, the mailing list norms, and the
introduction of topics and tags. You can review the updated
description at https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
. As a participant of this mailing list, you are responsible for knowing
the description of the mailing list, and I kindly ask you to review it.
==Who am I?==
I'm one of the admins of this list (and head of Research at Wikimedia
==Why the change?==
On a personal level: I want to be able to communicate with the
research community around Wikimedia projects more often, I don't want
to create a new list for my communications, and I saw the note about
the limits on the frequency of posts in the old description of this
list as something that "kept me out". Inspired by this need, I had a
more comprehensive look at the full description.
The previous version of the description had a few areas to improve
based on the current needs and realities of this list. I name a couple
of them below:
* We had a note asking that the frequency of emails to this list be kept
low. With the stack of technologies available to us today, we can
relax this condition and ask for people to use tags/topics in the
subject of their emails to allow for more emails to come to the list.
* The note on who can post to this list could be improved. It read
"only people who are actively involved in research on Wikimedia
projects should post to this list" while my understanding is that on
this list we want to welcome research-related questions from those who
are not actively involved in research, too. For example, community
organizers should feel welcome to ask research-related questions on
==What process did I follow for this change?==
I wrote my proposed changes in an etherpad and sent it to 10 people.
These folks include the two other admins of this list, a couple of
other Research folks from WMF, a few folks from the Wikimedia Research
community, and two editors from the community who are active in the
research space. I heard back from 3 of these folks with thumbs up and
areas for improvement. I incorporated all the suggestions I received.
==What if you have suggestions for improvements?==
Sure. Bring them up on this thread or privately. Please note that I
will be at the Wikidata Conference and after that at our annual
Research Offsite and may not be able to respond to you until
==Where can I see the updated description?==
Please go to https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
to review it.
 Read the previous version of the description below:
The purpose of this mailing list is to discuss scientific research
into the content and the communities of the Wikimedia projects:
Wikipedia, Wiktionary, Wikibooks, Wikisource, Wikiquote, Wikinews,
Wikispecies, and Wikimedia Commons, Meta-Wiki.
Research into the technology of Wikimedia, MediaWiki, should primarily
be discussed on
instead. For content or community research projects with a strong
technological component, cross-posting to both lists may be advisable.
Please note that only people who are actively involved in research on
Wikimedia projects should post to this list. Typical on-topic posts
- announcement of a new research project
- discussions of methodology
- questions and answers about related projects
Mailing list traffic should be kept at a reasonably low level. The
list is softly moderated, and individuals posting off-topic material
repeatedly may be removed.
This list is not directly associated with the
Research Network, though members of the Network are welcome to
post here if they are involved in research projects relating to
content or community. Internal Wikimedia matters, discussions of new
projects and similar threads should be kept off the list.
Principal Research Scientist, Head of Research