Hi all,
The next Research Showcase will be live-streamed this Wednesday, June 26,
at 11:30 AM PST/19:30 UTC. We will have three presentations at this showcase,
all relating to Wikipedia blocks.
YouTube stream: https://www.youtube.com/watch?v=WiUfpmeJG7E
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Trajectories of Blocked Community Members: Redemption, Recidivism and
Departure
By Jonathan Chang, Cornell University
Community norm violations can impair constructive communication and
collaboration online. As a defense mechanism, community moderators often
address such transgressions by temporarily blocking the perpetrator. Such
actions, however, come with the cost of potentially alienating community
members. Given this tradeoff, it is essential to understand to what extent,
and in which situations, this common moderation practice is effective in
reinforcing community rules. In this work, we introduce a computational
framework for studying the future behavior of blocked users on Wikipedia.
After their block expires, they can take several distinct paths: they can
reform and adhere to the rules, but they can also recidivate, or
straight-out abandon the community. We reveal that these trajectories are
tied to factors rooted both in the characteristics of the blocked
individual and in whether they perceived the block to be fair and
justified. Based on these insights, we formulate a series of prediction
tasks aiming to determine which of these paths a user is likely to take
after being blocked for their first offense, and demonstrate the
feasibility of these new tasks. Overall, this work builds towards a more
nuanced approach to moderation by highlighting the tradeoffs that are in
play.
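As a rough illustration of the kind of prediction task described above (the features, model, and data here are hypothetical, not the authors' actual method), trajectory prediction could be framed as three-way classification:

# Hypothetical sketch: predict a first-time blocked user's trajectory.
# Feature names and data are invented for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Per-user features: prior edit count, account age (days),
# block length (days), contested the block on their talk page (0/1).
X = np.array([
    [120, 400,  1, 0],
    [  5,  10,  7, 1],
    [800, 900,  3, 0],
    [ 30,  60, 14, 1],
])
y = np.array([0, 2, 0, 1])  # 0 = reform, 1 = recidivate, 2 = depart

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[50, 30, 7, 1]]))  # toy data; real work needs far more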
Automatic Detection of Online Abuse in Wikipedia
By Lane Rasberry, University of Virginia
Researchers analyzed all English Wikipedia blocks prior to 2018 using
machine learning. With the insights gained, they examined all English
Wikipedia users who are not blocked against the identified characteristics
of blocked users. The result was a ranked set of predictions of users who
are not blocked, but who have a history of conduct similar to that of
blocked users. This research models a process for using computing to aid
human moderators in identifying conduct on English Wikipedia that merits a
block.
Project page:
https://meta.wikimedia.org/wiki/University_of_Virginia/Automatic_Detection_…
Video: https://www.youtube.com/watch?v=AIhdb4-hKBo
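As a rough sketch of the kind of ranking step described above (the conduct features, model, and data here are hypothetical, not the study's actual pipeline), one could score never-blocked users by how block-like their history looks:

# Hypothetical sketch: rank unblocked users by similarity to blocked users.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented conduct features: reverts received, talk-page warnings,
# fraction of edits to frequently disputed pages.
X_train = np.array([[40, 9, 0.8], [2, 0, 0.1], [25, 5, 0.6], [1, 1, 0.0]])
y_train = np.array([1, 0, 1, 0])  # 1 = was blocked, 0 = never blocked
model = LogisticRegression().fit(X_train, y_train)

unblocked = {"UserA": [30, 7, 0.7], "UserB": [3, 0, 0.2]}
scores = {u: model.predict_proba([f])[0][1] for u, f in unblocked.items()}
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(user, round(score, 2))  # highest scores get human review first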
First Insights from Partial Blocks in Wikimedia Wikis
By Morten Warncke-Wang, Wikimedia Foundation
The Anti-Harassment Tools team at the Wikimedia Foundation released the
partial block feature in early 2019. Where previously blocks on Wikimedia
wikis were sitewide (users were blocked from editing an entire wiki),
partial blocks make it possible to block users from editing specific pages
and/or namespaces. The Italian Wikipedia was the first wiki to start using
this feature, and it has since been rolled out to other wikis as well. In
this presentation, we will look at how this feature has been used in the
first few months since release.
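As a hedged sketch of how such usage could be measured from public data (this assumes partial-block entries in the block log expose a "restrictions" field in their params, which may vary across MediaWiki versions), one could count partial vs. sitewide blocks via the API:

# Sketch: count recent partial vs. sitewide blocks in a wiki's block log.
import requests

API = "https://it.wikipedia.org/w/api.php"  # itwiki adopted the feature first
params = {
    "action": "query", "list": "logevents", "letype": "block",
    "leaction": "block/block", "lelimit": "500",
    "format": "json", "formatversion": "2",
}
events = requests.get(API, params=params).json()["query"]["logevents"]
# Assumption: partial blocks carry a "restrictions" entry in log params.
partial = sum(1 for e in events if "restrictions" in e.get("params", {}))
print(f"{partial} partial out of {len(events)} recent blocks")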
--
Janna Layton (she, her)
Administrative Assistant - Audiences & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
Is SuggestBot still in use on Wikipedia?
Are there similar task-routing tools that have been deployed on Wikipedia?
Where on Wikipedia is the use of such tools or bots documented?
Thanks,
Haifeng Zhang
Call For Papers
1st International Workshop on Approaches for Making Data Interoperable
(AMAR 2019)
https://events.tib.eu/amar2019/
co-located with SEMANTiCS 2019
September 9–12, 2019, Karlsruhe, Germany
Submission deadline: July 9, 2019
------------------------------------------------------------------------------------------------------------------------
Overview
------------------------------------------------------------------------------------------------------------------------
Recently, there has been rapid growth in the amount of data available on
the Web. Data is produced by different communities working in a wide range
of domains, using a variety of techniques. As a result, a large volume of
data in different formats and languages is generated. The accessibility of
such heterogeneous and multilingual data becomes an obstacle to reuse due
to the incompatibility of data formats and the language gap. This
incompatibility of data formats impedes the accessibility of data sources
to the right community. For instance, most open-domain question answering
systems are designed to be effective when data is represented in RDF. They
cannot operate on data in the very common CSV format or in unstructured
formats. Usually, the data they draw from is in English, rendering them
unable to answer questions in, e.g., Spanish. On the other hand, NLP
applications in Spanish cannot make use of a knowledge graph in English.
Different communities have different requirements in terms of data
representation and modeling. It is crucial to make data interoperable so
that it is accessible to a variety of applications.
------------------------------------------------------------------------------------------------------------------------
Topics of Interest
------------------------------------------------------------------------------------------------------------------------
We invite paper submissions from two communities: (i) data consumers and
(ii) data providers. This includes practitioners, such as data scientists,
who have experience in fitting the available data to their use case;
Semantic Web researchers who have been investigating the reuse of
heterogeneous data in tools; researchers in the field of data linking and
translation; and other researchers working in the general field of data
integration.
We invite submissions from the following communities:
- Data Integration
- Multilingual Data
- Data Linking
- Ontology and Knowledge Engineering
We welcome original contributions about all topics related to data
interoperability, including but not limited to:
- Approaches to convert data between formats, languages, and schemas
- Best practices for processing heterogeneous data
- Translation of data between different languages
- Cross-lingual applications
- Recommendations for language modeling in linked data
- Labeling of data with natural language information
- Datasets for different communities' data needs
- Tools reusing different data formats
- Converting datasets between different formats
- Applications in different domains, e.g., Life Sciences, Scholarly,
Industry 4.0, Humanities
------------------------------------------------------------------------------------------------------------------------
Author Instructions
------------------------------------------------------------------------------------------------------------------------
Paper submission for this workshop will be via EasyChair (
https://easychair.org/conferences/?conf=amar2019). Papers should follow
the Springer LNCS format and be submitted as PDF on or before July 9, 2019
(midnight Hawaii time).
We accept papers of the following formats:
- Full research papers (8–12 pages)
- Short research papers (3–5 pages)
- Position papers (6–8 pages)
- Resource papers (8–12 pages, including the publication of the dataset)
- In-Use papers (6–8 pages)
Accepted papers will be published as CEUR workshop proceedings. We aim to
organize a special issue featuring the best papers of the workshop.
------------------------------------------------------------------------------------------------------------------------
Important Dates
------------------------------------------------------------------------------------------------------------------------
Submission: July 9, 2019
Notification: July 30, 2019
Workshop: September 9, 2019
------------------------------------------------------------------------------------------------------------------------
Workshop Organizers
------------------------------------------------------------------------------------------------------------------------
Lucie-Aimée Kaffee, University of Southampton, UK & TIB Leibniz Information
Centre for Science and Technology, Hannover, Germany
Kemele M. Endris, TIB Leibniz Information Centre for Science and Technology
and L3S Research Centre, University of Hannover, Germany
Maria-Esther Vidal, TIB Leibniz Information Centre for Science and
Technology and L3S Research Centre, University of Hannover, Germany
Please contact us if you have any questions.
------------------------------------------------------------------------------------------------------------------------
Program Committee
------------------------------------------------------------------------------------------------------------------------
Jeremy Debattista, Trinity College Dublin
Irlan Grangel, Bosch Corporate Research
Lydia Pintscher, Wikimedia Deutschland (Wikidata)
Alokkumar Jha, Insight Centre for Data Analytics
Amrapali Zaveri, Maastricht University
Maribel Acosta, Karlsruhe Institute of Technology
Manolis Koubarakis, National and Kapodistrian University of Athens
Elena Montiel Ponsoda, Universidad Politécnica de Madrid
Javier D. Fernández, Vienna University of Economics and Business
Diego Collarana, Enterprise Information System (EIS)
Elena Demidova, L3S Research Center
David Chaves Fraga, Universidad Politécnica de Madrid
Jose M. Gimenez Garcia, Universite Jean-Monnet
Julia Bosque Gil, Universidad Politécnica de Madrid
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
Hi all,
This might be a known fact already.
Does it take less time (on average) for an editor to identify a vandalistic edit when using counter-vandalism tools, e.g., Huggle or STiki? If so, what features of these tools support such decisions?
Thanks for your time,
Haifeng Zhang
Hello all,
I am new to this mailing list and a new researcher, so apologies if this is not the right mailing list for this question, but I hope you might be able to help me.
I am trying to recreate the map linked below, showing Wikipedia edits per 10,000 internet users, but with newer data. I was hoping for 2018 data that I can then average per month, although it doesn't have to be a calendar year, I suppose; it could be February 2018 to February 2019.
The key is that I would like newer data. I have looked at the sources of these maps, and they all seem to end in 2013 or 2014. This source got me close, but it has no edit data, only page view data:
https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountry…
Could anyone help me find the right dataset to recreate this map? (The image is too heavy to send by email.)
Like this image:
https://i0.wp.com/geonet.oii.ox.ac.uk/wp-content/uploads/sites/46/2016/09/W…
From this article:
https://www.oii.ox.ac.uk/blog/the-geography-of-wikipedia-edits/
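For concreteness, here is a minimal sketch of the computation I have in mind, assuming two hypothetical input files: edits.csv (columns country,edits) from whatever per-country edit dataset exists, and internet_users.csv (columns country,users) from, e.g., ITU/World Bank figures:

# Minimal sketch: edits per 10,000 internet users by country.
# Both input files are hypothetical placeholders for real datasets.
import csv

def load(path, value_col):
    # Read a CSV with a "country" column into {country: value}.
    with open(path, newline="") as f:
        return {row["country"]: float(row[value_col]) for row in csv.DictReader(f)}

edits = load("edits.csv", "edits")
users = load("internet_users.csv", "users")

# Join on country and normalize to a per-10,000-users rate.
rates = {c: edits[c] / users[c] * 10_000
         for c in edits if c in users and users[c] > 0}
for country, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {rate:.1f} edits per 10,000 internet users")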
I’m very grateful for any help you can offer me.
Very best,
Adam
Hi everybody,
as part of https://phabricator.wikimedia.org/T225306 I need to reboot the
an-coord1001 host, which runs the Hive server/metastore and Oozie. Tomorrow,
June 26th, I'll reboot the host at around 9 AM CEST; the maintenance window
should last roughly 10–15 minutes. This means that Hive jobs might fail
during that timeframe, so please let me know if this is a problem.
Thanks in advance,
Luca (on behalf of the Analytics team)
Hi,
We’re preparing for the June 2019 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN201906 and add your name next to any paper you are interested in covering. Our target publication date is 29 June at 23:59 UTC. If you can't make this deadline but would like to cover a particular paper in the subsequent issue, leave a note next to the paper's entry. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
- Assessing The Factual Accuracy of Generated Text
- Automatic Detection of Online Abuse and Analysis of Problematic Users in Wikipedia
- Gender and deletion on Wikipedia
- Improving Knowledge Base Construction from Robust Infobox Extraction
- Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics
- Neural Based Statement Classification for Biased Language
- People Who Can Take It: How Women Wikipedians Negotiate and Navigate Safety
- Predicting Economic Development using Geolocated Wikipedia Articles
- SEDTWik: Segmentation-based Event Detection from Tweets Using Wikipedia
- StRE: Self Attentive Edit Quality Prediction in Wikipedia
- TableNet: An Approach for Determining Fine-grained Relations for Wikipedia Tables
- Training and hackathon on building biodiversity knowledge graphs
- Uncertainty During New Disease Outbreaks in Wikipedia
- Using Wiktionary as a resource for WSD : the case of French verbs
- Wikidata and the biodiversity knowledge graph
- Wikidata: Recruiting the Crowd to Power Access to Digital Archives
- WikiDataSets : Standardized sub-graphs from WikiData
Mohammed S. Abdulai and Tilman Bayer
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
[2] https://twitter.com/WikiResearch
Dear folks,
Are there studies that have examined what might affect edit size (e.g., the number of words added/deleted/modified in each revision)? I am especially interested in the impact of an editor's tenure/experience.
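For concreteness, here is a minimal sketch of one way to measure it, using per-revision byte deltas from the MediaWiki API (byte counts are only a proxy; word-level add/delete/modify counts would require diffing revision texts; the page title is just an example):

# Minimal sketch: signed byte change per revision of one page.
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query", "prop": "revisions", "titles": "Coffee",
    "rvprop": "ids|size|user", "rvlimit": "50",
    "format": "json", "formatversion": "2",
}
page = requests.get(API, params=params).json()["query"]["pages"][0]
revs = page["revisions"]  # newest first by default
for newer, older in zip(revs, revs[1:]):
    # Edit size of the newer revision, in bytes (can be negative).
    print(newer["user"], newer["size"] - older["size"])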
Thanks,
Haifeng Zhang
Hello everyone,
We have completed an interview study to learn about the values of Wikipedia
stakeholders around the ORES ecosystem. You can find the full study
description here:
<https://meta.wikimedia.org/wiki/Research:Applying_Value-Sensitive_Algorithm…>
After analyzing our interview data, we were interested to find that all
stakeholders seem to converge on five major values around how algorithms
ought to operate on Wikipedia:
1. Algorithmic systems should reduce the effort of community maintenance work.
2. Algorithmic systems should maintain human judgement as the final authority.
3. Algorithmic systems should support the workflows of individual people with different priorities at different times.
4. Algorithmic systems should encourage positive engagement with diverse editor groups, such as newcomers, females, and minorities.
5. Algorithmic systems should establish the trustworthiness of both people and algorithms within the community.
We are inviting everyone to share feedback on our interpretation of the
data by reviewing the preliminary results of our study. Please leave
comments in this Google Doc
<https://docs.google.com/document/d/17AByGDxS2n9Cfon6vtgDGO3lUu0lQhe3eWNdAVZ…>,
or reply directly to this thread. We also posted about this at the Village Pump
<https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)#Share_your…>.
Thanks,
Estelle (aka FauxNeme on Wikipedia)
--
C. Estelle Smith
Graduate Research Fellow
University of Minnesota, Department of Computer Science
Keller Hall, 200 Union St. SE, Minneapolis, MN 55455
Cell: 612.226.7789 | Twitter: @memyselfandHCI
https://colleenestellesmith.com/
Pronouns: she/her/hers