[Wikimedia Research Showcase] July 19 at 1630 UTC - Analytics

13 Jul 2023


Hello everyone,
The next Research Showcase, focused on *Improving knowledge integrity in
Wikimedia projects*, will be live-streamed Wednesday, July 19, at 9:30 AM
PDT / 16:30 UTC. Find your local time here
https://zonestamp.toolforge.org/1689784256.
The event is on the WMF Staff Calendar.
YouTube stream: https://youtube.com/live/_8DevIsi44s?feature=share
You can join the conversation on IRC at #wikimedia-research. You can also
watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Assessment of Reference Quality on Wikipedia
By *Aitolkyn Baigutanova, KAIST*
In this talk, I will present our research on the reliability of Wikipedia
through the lens of its references. I will primarily discuss our paper on
the longitudinal assessment of reference quality on English Wikipedia,
where we operationalize the notion of reference quality by defining
reference need (RN), i.e., the percentage of sentences missing a citation,
and reference risk (RR), i.e., the proportion of non-authoritative
references. I will share our research findings on two key aspects: (1) the
evolution of reference quality over a 10-year period and (2) factors that
affect reference quality. We find that the RN score has dropped by 20
percentage points, with more than half of verifiable statements now
accompanied by references. The RR score has remained below 1% over the years
as a result of the efforts of the community to eliminate unreliable
references. As an extension of this work, we explore how community
initiatives, such as the perennial source list, help with maintaining
reference quality across multiple language editions of Wikipedia. We hope
our work encourages more active discussions within Wikipedia communities to
improve the reference quality of the content.
- Paper: Aitolkyn Baigutanova, Jaehyeon Myung, Diego Saez-Trumper,
   Ai-Jou Chou, Miriam Redi, Changwook Jung, and Meeyoung Cha. 2023.
   Longitudinal Assessment of Reference Quality on Wikipedia. In Proceedings
   of the ACM Web Conference 2023 (WWW '23). Association for Computing
   Machinery, New York, NY, USA, 2831–2839.
   https://dl.acm.org/doi/abs/10.1145/3543507.3583218
Multilingual approaches to support knowledge integrity in Wikipedia
By *Diego Saez-Trumper & Pablo Aragón, Wikimedia Foundation*
Knowledge integrity in Wikipedia is key to ensuring the quality and
reliability of information. For that reason, editors devote a substantial
amount of their time to patrolling tasks in order to detect low-quality or
misleading content. In
this talk we will cover recent multilingual approaches to support knowledge
integrity. First, we will present a novel design of a system aimed at
assisting the Wikipedia communities in addressing vandalism. This system
was built by collecting a massive dataset of multiple languages and then
applying advanced filtering and feature engineering techniques, including
multilingual masked language modeling to build the training dataset from
human-generated data. Second, we will showcase the Wikipedia Knowledge
Integrity Risk Observatory, a dashboard that relies on a language-agnostic
version of the former system to monitor high-risk content in hundreds of
Wikipedia language editions. We will conclude with a discussion of
different challenges to be addressed in future work.
- Papers:
Trokhymovych, M., Aslam, M., Chou, A. J., Baeza-Yates, R., & Saez-Trumper,
D. (2023). Fair multilingual vandalism detection system for Wikipedia.
arXiv preprint arXiv:2306.01650. https://arxiv.org/pdf/2306.01650.pdf
Aragón, P., & Sáez-Trumper, D. (2021). A preliminary approach to knowledge
integrity risk assessment in Wikipedia projects. arXiv preprint
arXiv:2106.15940.
Best,
Kinneret
-- 

Kinneret Gordon

Senior Research Community Officer

Wikimedia Foundation https://wikimediafoundation.org/