Hello Analytics,
The Data Engineering team will begin deploying[1] the changes that
support the Temp Accounts
<https://www.mediawiki.org/wiki/Trust_and_Safety_Product/Temporary_Accounts>
initiative in the Data Lake
<https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake>
today, Wednesday, January 22nd, 2025.
These changes do not activate the Temp Accounts feature on any wiki; they
only enable support for Temp Accounts in the Hadoop Data Lake.
Some MediaWiki-related Data Lake tables[2] may be temporarily unavailable
over the next couple of days.
By the end of this process, the MediaWikiHistory tables and other derivative
tables will fully support the new Temp Accounts semantics and data.
As part of the deployment process we plan to re-run the jobs for the
2024-12 snapshot.
This means the data model for that snapshot will be updated.
The changes are mostly backwards compatible, except for:
- The mediawiki_user_history table's `anonymous` field will be renamed
to `is_anonymous`.
- The geoeditors_edits_monthly table's `editors_are_anonymous` field
will be renamed to `users_are_anonymous`.
- The MediaWikiHistory dumps will have some new fields inserted, and the
order of the existing fields will change.
We haven't found any existing code (within the WMF) that could break due to
these backwards-incompatible changes, but if you find any, please let us
know.
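For anyone auditing their own code against the renames listed above, here is
a minimal Python sketch of one defensive pattern. It is not taken from the
deployment plan. `RENAMED_FIELDS`, `resolve_field`, and `rows_by_name` are
hypothetical names introduced for illustration, and the sample values are
made up. The sketch also assumes the dump files are tab-separated and that
you supply the field list yourself from the dump documentation.

```python
# Hypothetical helpers for illustration; not part of any WMF library.

# Renames announced above: (table, old field name) -> new field name.
RENAMED_FIELDS = {
    ("mediawiki_user_history", "anonymous"): "is_anonymous",
    ("geoeditors_edits_monthly", "editors_are_anonymous"): "users_are_anonymous",
}


def resolve_field(table, field, available_columns):
    """Return whichever of the old or new name exists in available_columns."""
    if field in available_columns:
        return field
    new_name = RENAMED_FIELDS.get((table, field))
    if new_name is not None and new_name in available_columns:
        return new_name
    raise KeyError(f"{table}.{field} not found under its old or new name")


def rows_by_name(lines, field_names):
    """Yield each tab-separated line as a dict keyed by field name.

    Looking values up by name keeps callers working when fields are
    inserted or reordered, as long as field_names matches the file.
    """
    for line in lines:
        values = line.rstrip("\n").split("\t")
        if len(values) != len(field_names):
            raise ValueError(
                f"expected {len(field_names)} fields, got {len(values)}"
            )
        yield dict(zip(field_names, values))


if __name__ == "__main__":
    # Illustrative column set and sample line (assumptions, not real data).
    columns_after_deploy = {"user_id", "is_anonymous"}
    print(resolve_field("mediawiki_user_history", "anonymous", columns_after_deploy))
    sample = ["enwiki\t42\n"]
    print(list(rows_by_name(sample, ["wiki_db", "user_id"])))
```

Code that indexes dump rows by position is the most likely to break silently
when fields are inserted or reordered, so that is probably the first thing
worth checking.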
[1] Deployment plan
<https://docs.google.com/document/d/1-GhyLepEL7rqJlY1a2RKQ_1YI2QYgVpFSmzpq9n…>
[2] List of affected tables
- wmf.mediawiki_history
- wmf.mediawiki_user_history
- wmf.mediawiki_page_history
- wmf.mediawiki_history_reduced
- wmf.edit_hourly
- wmf.editors_daily
- wmf.unique_editors_by_country
- wmf.geoeditors_edits_monthly
- wmf.geoeditors_monthly
- wmf.geoeditors_public_monthly
--
*Marcel Ruiz Forns* (he/him)
Senior Software Engineer
Hi everyone,
It's a new year and we have some fascinating research showcases lined up!
The first one will be live-streamed next Wednesday, January 22, at 9:30 AM
PT / 17:30 UTC. Find your local time here
<https://zonestamp.toolforge.org/1737567000>. The theme for this showcase is
*Reader Attention and Curiosity*.
You are welcome to watch via the YouTube stream:
https://www.youtube.com/live/gvF8p4r91NE. As always, you can join the
conversation in the YouTube chat as soon as the showcase goes live.
This month's presentations:
Collective Attention Across Wikipedia and the Web
By *Patrick Gildersleve, University of Exeter*
Wikipedia, as one of the most popular websites
globally, serves as an important indicator of collective attention online.
Readers of news and social media often turn to Wikipedia as a secondary
resource for supporting or clarifying information, and this is reflected in
the patterns of page views and edits on the online encyclopaedia. Wikipedia
is also not just a vast repository of information; it is a network of
interconnected articles that exists within the broader ecosystem of the
World Wide Web. To fully comprehend the dynamics of online popularity, we
must study how individuals navigate between articles and how external
platforms drive traffic to Wikipedia, not just Wikipedia articles (or
alternative online records) in isolation. In this talk, I will review
research on how major news events spark networked surges of collective
attention to Wikipedia articles, how Twitter users both navigate and
contribute to Wikipedia in response to viral social media content, and how
we can combine data from Reddit and Wikipedia to study patterns of
attention towards current events, influxes of traffic from social media
towards Wikipedia, and the use of Wikipedia in discussions on social
media.
Architectural styles of curiosity in global Wikipedia mobile app readership
By *Dale Zhou, University of California, Irvine*
A historico-philosophical examination of
texts over two millennia previously revealed three styles of curiosity: the
wandering “busybody”, the targeted “hunter,” and the creative “dancer.” In
this talk, I will review network signatures of these three styles from an
analysis of 482,760 readers using Wikipedia’s mobile app in 14 languages
from 50 countries or territories. By measuring the structure of knowledge
networks constructed by readers weaving a thread through articles in
Wikipedia, we expand upon prior work in the laboratory that found evidence
for distinct knowledge network architectures constructed by each curiosity
style. Moreover, we found associations, globally, between the structure of
knowledge networks and population-level indicators of spatial navigation,
education, mood, well-being, and inequality. This presentation will
describe how these findings advance our understanding of Wikipedia’s global
readership and demonstrate how cultural and geographical properties of the
digital environment relate to different styles of curiosity.
--
Kinneret Gordon
Lead Research Community Officer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi everyone,
Wiki Workshop, the largest Wikimedia research event of the year, now in its
12th edition, will take place as a standalone virtual event on May 21-22,
2025 [1]. <https://wikiworkshop.org/2024/>
The call for papers for the workshop is now open [2]. We invite you to
submit 2-page extended abstracts. All submissions are
non-archival, which means you can submit ongoing, completed, and already
published works. See below for the full call.
Deadline for submission is March 9, 2025 (23:59 AoE).
If you have questions about the workshop, please don't hesitate to reach
out to us at wikiworkshop(a)googlegroups.com
Best,
Martin, on behalf of the organizers
[1] https://meta.wikimedia.org/wiki/Wiki_Workshop_2025
[2] https://meta.wikimedia.org/wiki/Wiki_Workshop_2025/Call_for_Papers
---
Call for papers
Workshop PC Chairs:
Martin Gerlach (Wikimedia Foundation)
Matthew Vetter (Indiana University of Pennsylvania)
We invite contributions to the Research Track of the 12th edition of Wiki
Workshop, which will take place virtually on May 21-22, 2025 as a 2-day
standalone event.
The Wiki Workshop is the largest Wikimedia research event of the year,
aimed at bringing together researchers who study all aspects of Wikimedia
projects (including, but not limited to, Wikipedia, Wikidata, Wikimedia
Commons, Wikisource, and Wiktionary) as well as Wikimedia developers,
affiliate organizations, and volunteer editors. Co-organized by the
Wikimedia Foundation’s Research team and members of the Wikimedia research
community, the workshop provides a direct pathway for exchanging ideas
between the organizations that serve Wikimedia projects and the researchers
actively studying them.
Building on the successful experiences of organizing Wiki Workshop in 2015
<https://wikiworkshop.org/2015>, 2016 <https://wikiworkshop.org/2016>, 2017
<https://wikiworkshop.org/2017>, 2018 <https://wikiworkshop.org/2018>, 2019
<https://wikiworkshop.org/2019>, 2020 <https://wikiworkshop.org/2020>, 2021
<https://wikiworkshop.org/2021>, 2022 <https://wikiworkshop.org/2022>, 2023
<https://wikiworkshop.org/2023>, 2024 <https://wikiworkshop.org/2024/>, and
based on feedback from authors and participants over the years, this year’s
Research Track is organized as follows:
- Submissions are non-archival, meaning we welcome ongoing, completed, and
already published work.
- We accept submissions in the form of 2-page extended abstracts.
- Authors of accepted abstracts will be invited to present their research
in a pre-recorded oral presentation, with dedicated time for live Q&A on
the days of the event.
- Accepted abstracts will be shared on the website prior to the event.
Important Dates
- Submission deadline: March 9, 2025 (23:59 AoE
<https://en.wikipedia.org/wiki/Anywhere_on_Earth>)
- Author notification: April 14, 2025
- Final version due: April 30, 2025 (23:59 AoE
<https://en.wikipedia.org/wiki/Anywhere_on_Earth>)
- Workshop date: May 21-22, 2025
Submission Instructions
Similar to previous editions, this year’s Wiki Workshop solicits extended
abstracts (PDF format, maximum 2 pages). Submissions that exceed the 2-page
limit will be automatically rejected. Authors may include 1 additional page
containing references, figures, and/or tables (including captions) only.
Initial submissions require names and affiliations of authors, 5 keywords,
a title, an abstract, and a main text outlining the contribution, methods,
findings, and impact of the work, as relevant. Submissions will
be non-archival and, as a result, may cover work that is already published,
under review, or ongoing. All submissions will be reviewed by multiple
members of the Wiki Workshop Program Committee. The names of the authors
will be revealed to the reviewers, whereas reviewers will remain anonymous
to the authors.
Please review our Privacy Statement
<https://foundation.wikimedia.org/wiki/Legal:Wiki_Workshop_Privacy_Statement>
before submitting your abstract to OpenReview.
- Template for submissions
<https://gitlab.wikimedia.org/repos/research/wikiworkshop-templates>
- Submission site
<https://openreview.net/group?id=wikimedia.org/Wiki_Workshop/2025/Research_T…>
on OpenReview
Topics
Wiki Workshop aims to have a broad technical program inclusive of many
academic fields and disciplines. Topics include, but are not limited to:
- Use of bots, algorithms, and crowdsourcing methods for content curation,
sourcing, or verification of content and structured data;
- Innovative uses of Wikipedia and other Wikimedia projects for AI and NLP
applications;
- Approaches to develop AI-assisted workflows to support editors in
content moderation, patrolling, and maintenance;
- Community health questions including sentiment analysis, harassment
detection, and tools that enhance community harmony;
- Dynamics of participation, including activation, retention, and
attrition of various Wikimedia users and audiences;
- Strategies and models to engage new editors through improvements to
onboarding experience;
- Understanding the motivations, engagement models, incentives, and needs
of Wikimedia editors, readers, and developers of Wikimedia projects;
- Approaches to discussions, consensus-building, and conflict resolution
in editorial decision-making;
- Investigation of content bias and knowledge gaps, and strategies for
addressing them on Wikimedia projects;
- Examination of content reuse dynamics within and beyond Wikimedia
projects;
- New technologies and initiatives to grow content, quality, equity,
diversity, and participation across Wikimedia projects;
- Innovative use of AI models to support editors in identifying and
automating repetitive tasks that could be easily automated (such as
copyediting);
- Techniques for detecting low-quality, promotional, or fake content
(misinformation or disinformation), and identifying fake accounts or bad
actors (e.g., sock puppets);
- Exploration of diverse source incorporation into Wikimedia projects,
such as oral histories, video, and others;
- Understanding and improving the representation of “local content”
(geography, cultural context, or history) relevant to different communities;
- Multilingual and multimodal analysis of Wikimedia projects;
- Strategies for leveraging Wikimedia projects in media literacy
interventions;
- Impact assessments of Wikimedia-based educational initiatives;
- Policies, guidelines, and norms influencing the governance of Wikimedia
projects;
- Privacy, security, and trust related to content creation, maintenance,
and consumption;
- Understanding peer production mechanisms of Wikimedia projects;
- The interplay between Wikimedia projects and the broader (open)
knowledge ecosystem including interactions with other online platforms;
- Innovative uses of Wikimedia projects as indicators for real-world
events, cultural trends, technological or scientific advancements, and
beyond;
- Open-source research code, datasets, and tools supporting
Wikimedia-related research.
--
Martin Gerlach (he/him) | Senior Research Scientist | Wikimedia Foundation