Dear members of the Analytics Team,
I am currently conducting research on the excludability of the free
knowledge available on the Wikimedia projects as an example of a public
good. To calibrate the model, I need aggregate data on page views and
edits by country and language.
After carefully reading Research:Data
<https://meta.wikimedia.org/wiki/Research:Data>, I was only able to find
data on page views by country and language, which would be enough to
calibrate the demand side of my model. Would it be possible to get
aggregate data on edits by country and language, similar to the page view
data available on Wikistats?
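For context, here is a minimal sketch of how the page view side can be
pulled programmatically, assuming the public Wikimedia AQS REST API's
pageviews/top-by-country endpoint (the exact response fields may vary);
what I am missing is an equivalent endpoint for edits:

# Minimal sketch: monthly page views by country for one project, via the
# Wikimedia AQS REST API. An edits-by-country equivalent is the question.
import requests

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top-by-country"

def views_by_country(project="en.wikipedia", access="all-access",
                     year="2023", month="06"):
    """Fetch ranked per-country page view counts for one project-month."""
    url = f"{API}/{project}/{access}/{year}/{month}"
    # Wikimedia asks API clients to identify themselves via User-Agent.
    resp = requests.get(url, headers={"User-Agent": "research-sketch/0.1"})
    resp.raise_for_status()
    return resp.json()["items"][0]["countries"]

for row in views_by_country()[:10]:
    print(row)  # e.g. {'country': 'US', 'rank': 1, ...}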
Thanks in advance.
Best regards,
Kiril Simeonovski
Hello everyone,
The next Research Showcase, focused on *Improving knowledge integrity in
Wikimedia projects*, will be live-streamed Wednesday, July 19, at 9:30 AM
PDT / 16:30 UTC. Find your local time here
<https://zonestamp.toolforge.org/1689784256>.
The event is on the WMF Staff Calendar.
YouTube stream: https://youtube.com/live/_8DevIsi44s?feature=share
You can join the conversation on IRC at #wikimedia-research. You can also
watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Assessment of Reference Quality on Wikipedia
By *Aitolkyn Baigutanova, KAIST*
In this talk, I will present our research on the reliability of Wikipedia
through the lens of its references. I will primarily discuss our paper on
the longitudinal assessment of reference quality on English Wikipedia,
where we operationalize the notion of reference quality by defining
reference need (RN), i.e., the percentage of sentences missing a citation,
and reference risk (RR), i.e., the proportion of non-authoritative
references. I will share our research findings on two key aspects: (1) the
evolution of reference quality over a 10-year period and (2) factors that
affect reference quality. We find that the RN score has dropped by 20
percentage points, with more than half of verifiable statements now
accompanied by references. The RR score has remained below 1% over the years
as a result of the efforts of the community to eliminate unreliable
references. As an extension of this work, we explore how community
initiatives, such as the perennial source list, help with maintaining
reference quality across multiple language editions of Wikipedia. We hope
our work encourages more active discussions within Wikipedia communities to
improve the reference quality of their content.
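For concreteness, here is a minimal sketch of the two metrics as defined
above; the function signatures and the toy numbers are illustrative
assumptions, not the paper's actual pipeline:

# RN and RR as defined in the abstract; the data structures are assumed.
def reference_need(missing, cited):
    """RN: share of citation-needing sentences that lack a citation."""
    total = missing + cited
    return missing / total if total else 0.0

def reference_risk(references, non_authoritative):
    """RR: proportion of references from non-authoritative sources."""
    flagged = sum(1 for ref in references if ref in non_authoritative)
    return flagged / len(references) if references else 0.0

# Toy example: 12 of 40 citation-needing sentences lack a citation, and
# 1 of 50 references is on a deny-list of unreliable domains.
print(reference_need(12, 28))                                   # 0.3
print(reference_risk(["bbc.com"] * 49 + ["x.example"], {"x.example"}))  # 0.02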
- Paper: Aitolkyn Baigutanova, Jaehyeon Myung, Diego Saez-Trumper,
Ai-Jou Chou, Miriam Redi, Changwook Jung, and Meeyoung Cha. 2023.
Longitudinal Assessment of Reference Quality on Wikipedia. In Proceedings
of the ACM Web Conference 2023 (WWW '23). Association for Computing
Machinery, New York, NY, USA, 2831–2839.
<https://dl.acm.org/doi/abs/10.1145/3543507.3583218>
Multilingual approaches to support knowledge integrity in Wikipedia
By *Diego Saez-Trumper & Pablo Aragón, Wikimedia Foundation*
Knowledge integrity in Wikipedia is key to ensuring the quality and
reliability of information. For that reason, editors devote a substantial
amount of their time to patrolling tasks in order to detect low-quality or
misleading content. In
this talk we will cover recent multilingual approaches to support knowledge
integrity. First, we will present a novel design of a system aimed at
assisting the Wikipedia communities in addressing vandalism. This system
was built by collecting a massive dataset of multiple languages and then
applying advanced filtering and feature engineering techniques, including
multilingual masked language modeling to build the training dataset from
human-generated data. Second, we will showcase the Wikipedia Knowledge
Integrity Risk Observatory, a dashboard that relies on a language-agnostic
version of the former system to monitor high-risk content in hundreds of
Wikipedia language editions. We will conclude with a discussion of
different challenges to be addressed in future work.
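As a rough illustration of what "language-agnostic" can mean here, the
sketch below derives revision features that need no language-specific
tooling; the feature set and the downstream classifier are assumptions
for illustration, not the deployed system:

# Hedged sketch: language-agnostic features for a revision, of the kind
# such a system might feed to a classifier trained on reverted edits.
import re

def revision_features(old_text, new_text):
    """Features computable identically across language editions."""
    upper = sum(ch.isupper() for ch in new_text)
    letters = sum(ch.isalpha() for ch in new_text) or 1
    return {
        "bytes_added": max(len(new_text) - len(old_text), 0),
        "size_ratio": len(new_text) / (len(old_text) or 1),
        "upper_ratio": upper / letters,  # shouting is a common vandalism cue
        "repeat_runs": len(re.findall(r"(.)\1{4,}", new_text)),  # "aaaaa"
        "external_links": new_text.count("http"),
    }

print(revision_features("A short article.",
                        "A SHORT ARTICLE!!! aaaaaaa http://spam"))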
- Papers:
Trokhymovych, M., Aslam, M., Chou, A. J., Baeza-Yates, R., & Saez-Trumper,
D. (2023). Fair multilingual vandalism detection system for Wikipedia.
arXiv preprint arXiv:2306.01650. https://arxiv.org/pdf/2306.01650.pdf
Aragón, P., & Sáez-Trumper, D. (2021). A preliminary approach to knowledge
integrity risk assessment in Wikipedia projects. arXiv preprint
arXiv:2106.15940.
Best,
Kinneret
--
Kinneret Gordon
Senior Research Community Officer
Wikimedia Foundation <https://wikimediafoundation.org/>