[Wikimedia Research Showcase] March 16 - Wikimedia-l

10 Mar 2022


Hi all,
The next Research Showcase will be live-streamed Wednesday, March 16 at
6:30AM PT / 13:30 UTC. Find your local time here:
https://zonestamp.toolforge.org/1647437436.
The theme is: Patterns and dynamics of article quality.
YouTube stream: https://www.youtube.com/watch?v=o5e6S7ac4q4
You can join the conversation on IRC at #wikimedia-research. You can also
watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase.
The Showcase will feature the following talks:
Quality monitoring in Wikipedia - A computational perspective
By *Animesh Mukherjee https://cse.iitkgp.ac.in/~animeshm/ (Indian Institute
of Technology, Kharagpur)*
In this talk, I shall summarize the highlights of our five years of
research on Wikipedia. In particular, I shall take a deep dive into two of
our recent works: the first attempts to understand the early indications of
which editors will soon go "missing" (aka missing editors) [1], while the
second investigates how the quality of a Wikipedia article transitions over
time and whether computational models can be built to understand the
characteristics of future transitions [2]. In each case, I will present a
suite of key results and the main insights that we obtained from them.
[1] When expertise gone missing: Uncovering the loss of prolific
contributors in Wikipedia
https://link.springer.com/chapter/10.1007/978-3-030-91669-5_23, ICADL
2021 (pdf https://arxiv.org/pdf/2109.09979)
[2] Quality Change: norm or exception? Measurement, Analysis and Detection
of Quality Change in Wikipedia https://arxiv.org/abs/2111.01496, CSCW 2022
(pdf https://arxiv.org/pdf/2111.01496)
Automatically Labeling Low Quality Content on Wikipedia by Leveraging
Editing Behaviors
By *Sumit Asthana http://sumitasthana.xyz/ (University of Michigan, Ann
Arbor)*
Wikipedia articles aim to be definitive sources of encyclopedic content.
Yet only 0.6% of Wikipedia articles are rated high quality on its quality
scale, owing to an insufficient number of Wikipedia editors and an enormous
number of articles. Supervised Machine Learning (ML) quality improvement
approaches that can automatically identify and fix content issues rely on
manual labels of individual Wikipedia sentence quality. However, current
labeling approaches are tedious and produce noisy labels. In this talk, I
will discuss an automated labeling approach that identifies the semantic
category (e.g., adding citations, clarifications) of historic Wikipedia
edits and uses the modified sentences prior to the edit as examples that
require that semantic improvement. Sentences from the highest-rated
articles serve as examples that no longer need semantic improvements. I
will discuss the performance of models trained with this labeling approach
compared with models trained with existing labeling approaches, and also
the implications of such a large-scale semi-supervised labeling approach
for capturing the editing practices of Wikipedia editors and helping them
improve articles faster.
Related paper: Automatically Labeling Low Quality Content on Wikipedia By
Leveraging Patterns in Editing Behaviors
https://dl.acm.org/doi/10.1145/3479503, CSCW 2021 (pdf
https://arxiv.org/pdf/2108.02252)
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation