Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
--
Bryan Song
GroupLens Research
University of Minnesota
Hi everyone,
We are delighted to announce that Wiki Workshop 2020 will be held in
Taipei on April 20 or 21, 2020 (the date will be finalized soon) as
part of the Web Conference 2020 [1]. In past years, Wiki Workshop
has traveled to Oxford, Montreal, Cologne, Perth, Lyon, and San
Francisco.
You can read more about the call for papers and the workshop at
http://wikiworkshop.org/2020/#call. Please note that the deadline for
submissions to be considered for the proceedings is January 17. All
other submissions should be received by February 21.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing you in Taipei.
Best,
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[1] https://www2020.thewebconf.org/
Hi everybody,
The Analytics team is going to enable Kerberos authentication for Hadoop on
Monday, December 2nd. The procedure will start around 10 AM CET and will
hopefully take three to four hours, but since this is an invasive change it
may take longer. If you have anything important that requires Hadoop on this
date, please let us know in advance.
The most visible change from the user's point of view is the introduction
of a new account/password to be able to use the Hadoop services (like
Hive/HDFS/Spark/Oozie). We created a user guide about what will change with
Kerberos at
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide.
There is also an open task to track any doubts, questions, or special use cases
during the next two weeks: https://phabricator.wikimedia.org/T238560.
Feel free to reach out on IRC in #wikimedia-analytics on Freenode too!
Thanks!
Luca (on behalf of the Analytics team)
FYI
---------- Forwarded message ---------
From: Ariel Glenn WMF <ariel(a)wikimedia.org>
Date: Wed, Nov 27, 2019 at 5:38 AM
Subject: [Wikitech-l] BREAKING CHANGE: schema update, xml dumps
To: Wikipedia Xmldatadumps-l <Xmldatadumps-l(a)lists.wikimedia.org>,
Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
We plan to move to the new schema for xml dumps for the February 1, 2020
run. Update your scripts and apps accordingly!
The new schema contains an entry for each 'slot' of content. This means
that, for example, the commonswiki dump will contain MediaInfo information
as well as the usual wikitext. See
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/docs…
for the schema and
https://www.mediawiki.org/wiki/Requests_for_comment/Schema_update_for_multi…
for further explanation and example outputs.
Phabricator task for the update: https://phabricator.wikimedia.org/T238972
PLEASE FORWARD to other lists as you deem appropriate. Thanks!
Ariel Glenn
Hi everyone,
We’re preparing for the November 2019 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN201911 and add your name next to any paper you are interested in covering. Our writing deadline is 27 November 23:59 UTC. If you can't make this deadline but would like to cover a particular paper in the subsequent issue, leave a note next to the paper's entry below. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
- A Forensic Qualitative Analysis of Contributions to Wikipedia from Anonymity Seeking Users
- All Talk: How Increasing Interpersonal Communication on Wikis May Not Enhance Productivity
- Analysis of Data Persistence in Collaborative Content Creation Systems: The Wikipedia Case
- Analyzing Wikipedia Deletion Debates with a Group Decision-Making Forecast Model
- Collaboration Drives Individual Productivity
- Does Sleep Deprivation Cause Online Incivility? Evidence from a Natural Experiment
- Extracting Literal Assertions for DBpedia from Wikipedia Abstracts
- How Does Editor Interaction Help Build the Spanish Wikipedia?
- Knowledge Graphs and Knowledge Networks: The Story in Brief
- Online Disinformation and the Role of Wikipedia
- Public Archaeology's Mammoth in the Room: Engaging Wikipedia as a Tool for Teaching and Outreach
- Revision Classification for Current Events in Dutch Wikipedia Using a Long Short-Term Memory Network
- The Dynamics of Peer-Produced Political Information During the 2016 U.S. Presidential Campaign
- The Roles Bots Play in Wikipedia
- Transforming Wikipedia into Augmented Data for Query-Focused Summarization
- Weakly Supervised Multilingual Causality Extraction from Wikipedia
- Wiktionary matcher
Masssly and Tilman Bayer
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
[2] Twitter: @WikiResearch
Hello,
As part of an AHRC research network, I am conducting a survey about
Internet/media utopias.
At a time of the Cambridge Analytica scandal and fake news, we are
experiencing a crisis of Internet platforms. Many people think we need
Internet and media utopias today. But what could they look like?
People interested in Wikipedia might have good ideas...
I want to invite you to participate:
https://psmutopias.limequery.net/879161
Answering will take about five minutes. A number of participants with
very visionary ideas will be invited to a workshop in 2020 in London,
where participants will work on co-writing/co-authoring an
Internet/Media Utopias Manifesto.
Kind regards, Christian Fuchs
--
Prof. Christian Fuchs
University of Westminster,
Director of the Communication and Media Research Institute
http://www.camri.ac.uk
@fuchschristian
Hi all,
The next Research Showcase will be live-streamed on Wednesday, November 20,
2019, at 9:30 AM PST/17:30 UTC. We’ll have a presentation from Martin
Potthast of Leipzig University on text reuse in Wikipedia and another
presentation from the Wikimedia Foundation’s Isaac Johnson on the
demographics and interests of Wikipedia’s readers.
YouTube stream: https://www.youtube.com/watch?v=tIko_V1k09s
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Wikipedia Text Reuse: Within and Without
By Martin Potthast, Leipzig University
We study text reuse related to Wikipedia at scale by compiling the first
corpus of text reuse cases within Wikipedia as well as without (i.e., reuse
of Wikipedia text in a sample of the Common Crawl). To discover reuse
beyond verbatim copy and paste, we employ state-of-the-art text reuse
detection technology, scaling it for the first time to process the entire
Wikipedia as part of a distributed retrieval pipeline. We further report on
a pilot analysis of the 100 million reuse cases inside, and the 1.6 million
reuse cases outside Wikipedia that we discovered. Text reuse inside
Wikipedia gives rise to new tasks such as article template induction,
fixing quality flaws, or complementing Wikipedia’s ontology. Text reuse
outside Wikipedia yields a tangible metric for the emerging field of
quantifying Wikipedia’s influence on the web. To foster future research
into these tasks, and for reproducibility’s sake, the Wikipedia text reuse
corpus and the retrieval pipeline are made freely available. Paper
<https://webis.de/publications.html#?q=wikipedia%20ecir%202019>, Demo
<https://demo.webis.de/wikipedia-text-reuse/>
Characterizing Wikipedia Reader Demographics and Interests
By Isaac Johnson, Wikimedia Foundation
Building on two past surveys on the motivation and needs of Wikipedia
readers (Why We Read Wikipedia
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#November_2016>; Why
the World Reads Wikipedia
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#December_2018>),
we examine the relationship between Wikipedia reader demographics and their
interests and needs. Specifically, we run surveys in thirteen different
languages that ask readers three questions about their motivation for
reading Wikipedia (motivation, needs, and familiarity) and five questions
about their demographics (age, gender, education, locale, and native
language). We link these survey results with the respondents' reading
sessions -- i.e. sequence of Wikipedia page views -- to gain a more
fine-grained understanding of how a reader's context relates to their
activity on Wikipedia. We find that readers have a diversity of backgrounds
but that the high-level needs of readers do not correlate strongly with
individual demographics. We also find, however, that there are
relationships between demographics and specific topic interests that are
consistent across many cultures and languages. This work provides insights
into the reach of various Wikipedia language editions and the relationship
between content or contributor gaps and reader gaps. See the meta page
<https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Be…>
for more details.
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
Wikimedia Research Showcase [1] is almost six years old and we're
using the birthday opportunity to step back and reflect on the past,
celebrate the contributions by more than 70 speakers and many of you
who participated in the discussions, and plan for its future.
We would like to ask for your input as we're thinking about the future
of the Research Showcases. We want to hear from those of you who
participated in the showcases and/or watched them, as well as those of
you who decided this is not something for you. :) To gather your
input, we have put together a survey and would appreciate your
participation.
Link to survey (please note that the link will take you to Google
Forms [2]): https://docs.google.com/forms/d/e/1FAIpQLSecgn8cMu5IfTYRgn93bfOiJVEIL09RRf_…
Your contributions to this survey can help us in our thinking as we
move forward. Please submit your responses by 2019-11-22.
Sincerely,
Jonathan Morgan and Leila Zia
Research, Wikimedia Foundation
[1] https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
[2] If you want to participate but not through Google Forms, ping me
off-list and I'll send you a PDF file you can fill in and send back to me
(it won't be anonymous, though, sorry). I'm not attaching it to this
email as some lists may put my email in the moderation queue with an
attachment. (And I don't /think/ I can upload it to Commons.)
The October 2019 issue of the Wikimedia Research Newsletter is out:
https://meta.wikimedia.org/wiki/Research:Newsletter/2019/October
In this issue:
1 Research presentations at Wikimania 2019
  1.1 "All Talk: How Increasing Interpersonal Communication on Wikis May Not Enhance Productivity"
  1.2 "Despite the [Tor] ban: doing good work anonymously on Wikipedia"
  1.3 Discussion summarization tool to help with Requests for Comments (RfCs) going stale
  1.4 "Hidden Gems in the Wikipedia Discussions: The Wikipedians' Rationales"
  1.5 "Characterizing Reader Behavior on Wikipedia"
  1.6 Wikipedia citations (footnotes) are only clicked on one of every 200 pageviews
  1.7 "Dwelling on Wikipedia Investigating time spent by global encyclopedia readers"
  1.8 "Wikipedia graph mining dynamic structure of collective memory"
  1.9 Harmful content rare on English Wikipedia
  1.10 "Sockpuppet detection in the English Wikipedia"
  1.11 "Wiki-Atlas: Rendering Wikipedia Content through Cartographic and Augmented Reality Mediums"
  1.12 "Evidence of Dark Matter: Assessing the Contribution of Subject-matter Experts to Wikipedia"
  1.13 Why Apple's Siri relies on data from Wikipedia infoboxes instead of (just) Wikidata
  1.14 "Discovering Implicational Knowledge in Wikidata"
  1.15 "Analyzing the evolution of wikis with WikiChron"
  1.16 "State of Wikimedia Research 2018-2019"
2 Other events
3 Other recent publications
  3.1 "Revealing the Role of User Moods in Struggling Search Tasks"
  3.2 Helping students find a research advisor, with Google Scholar and Wikipedia
  3.3 "Uncovering the Semantics of Wikipedia Categories"
  3.4 "Adapting NMT to caption translation in Wikimedia Commons for low-resource languages"
  3.5 "Automatic Detection of Online Abuse and Analysis of Problematic Users in Wikipedia"
  3.6 "Self Attentive Edit Quality Prediction in Wikipedia"
  3.7 "TableNet: An Approach for Determining Fine-grained Relations for Wikipedia Tables"
  3.8 "Training and hackathon on building biodiversity knowledge graphs" with Wikidata
  3.9 "Spectral Clustering Wikipedia Keyword-Based Search Results"
  3.10 "Indigenous Knowledge for Wikipedia: A Case Study with an OvaHerero Community in Eastern Namibia"
  3.11 "On Persuading an OvaHerero Community to Join the Wikipedia Community"
*** 16 recent publications were covered or listed in this issue ***
Masssly and Tilman Bayer
---
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/
* Follow us on Twitter: @WikiResearch
* Like us on Facebook: Facebook.com/WikiResearch/
* Receive this newsletter by mail: Research-newsletter Mailing List - Wikimedia
Today we are releasing a new dataset meant to help us understand the impact
of grants and programs on editing. This data was requested several years
ago, and we took a long time to bring in the privacy and security experts
whose help we needed to release it. With that work done, you can download
the data here: https://dumps.wikimedia.org/other/geoeditors/ and read about
it here:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors/Pu…
You can send questions or comments on this thread or on the discussion page.