Hi all,
The next Research Showcase, focused on *Data Privacy*, will be
live-streamed on Wednesday, October 18, at 9:30 AM PDT / 16:30 UTC. Find
your local time here <https://zonestamp.toolforge.org/1697646641>.
YouTube stream: https://www.youtube.com/watch?v=ntgRsMaDlsw. As usual, you
can join the conversation in the YouTube chat as soon as the showcase goes
live.
This month's presentations:
Wikipedia Reader Navigation: When Synthetic Data Is Enough
By *Akhil Arora, EPFL*
Every day, millions of people read Wikipedia. When navigating the vast
space of available topics using hyperlinks, readers describe trajectories
on the article network. Understanding these navigation patterns is crucial
to better serve readers’ needs and address structural biases and knowledge
gaps. However, systematic studies of navigation on Wikipedia are hindered
by a lack of publicly available data due to the commitment to protect
readers' privacy by not storing or sharing potentially sensitive data. In
this paper, we ask: How well can Wikipedia readers' navigation be
approximated by using publicly available resources, most notably the
Wikipedia clickstream data <https://wikinav.toolforge.org/>? We
systematically quantify the differences between real navigation sequences
and synthetic sequences generated from the clickstream data, in 6 analyses
across 8 Wikipedia language versions. Overall, we find that the differences
between real and synthetic sequences are statistically significant, but
with small effect sizes, often well below 10%. This constitutes
quantitative evidence for the utility of the Wikipedia clickstream data as
a public resource: clickstream data can closely capture reader navigation
on Wikipedia and provides a sufficient approximation for most practical
downstream applications relying on reader data. More broadly, this study
provides an example for how clickstream-like data can generally enable
research on user navigation on online platforms while protecting users’
privacy.
How to tell the world about data you cannot show them: Differential privacy
at the Wikimedia Foundation
By *Hal Triedman, Wikimedia Foundation*
The
Wikimedia Foundation (WMF), by virtue of its centrality on the internet,
collects lots of data about platform activities. Some of that data is made
public (e.g. global daily pageviews); other data types are not shared (or
are pseudonymized prior to sharing), largely due to privacy concerns.
Differential privacy is a statistical definition of privacy that has gained
prominence in academia, but is still an emerging technology in industry. In
this talk, I share the story of how we put differential privacy into
production at the WMF, through looking at the case study of geolocated
daily pageview counts.
You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
Best,
Kinneret
--
Kinneret Gordon
Lead Research Community Officer
Wikimedia Foundation <https://wikimediafoundation.org/>
Dear all,
As there was a recent press mention of Osama and Ziyad[1] (see "In the
Media" in the current Signpost issue) – does the WMF's Human Rights Team
(cc'ed) have any update on their situation?
Has anyone else heard any news? If I recall correctly, Osama had married
not long before being jailed in 2020 – has anyone been in touch with his
wife?
Is there anything the community can do?
Andreas
[1]
https://en.wikipedia.org/wiki/List_of_people_imprisoned_for_editing_Wikiped…
Hello everyone,
I am thrilled to share with you all that Deoband Community Wikimedia has
started its YouTube channel <https://www.youtube.com/@dcwwiki>. We will
be uploading our monthly conversation hour recordings on this channel for
the wider community benefit.
Apart from the conversation hour recordings, we plan to use this
channel to upload educational content in the near future. We hope that
you all benefit from the conversations that we host every month, and ask
you to subscribe to our YouTube channel.
Best regards
Aafi
This message is being translated into other languages on Meta-wiki. You can
help with more languages.
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Community_Affairs_Comm…>
العربية • bahasa Indonesia • 中文 • Deutsch • español • français • Kiswahili
• polski • português do Brasil • українська
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Community_Affairs_Comm…>
Hi everyone,
Since joining the Wikimedia Foundation, I have tried to regularly send you
updates
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>
here and elsewhere. I am mindful that this one arrives during a period of
compounded challenges across the world, with escalating wars, conflicts, and
climate crises reminding us each week that global volatility and uncertainty are
on the rise. I hope you’ll read this message to the end to join me and
Foundation leadership in conversations with each other at a time when I
feel we need to pull closer together.
How it started…
Two years ago this month, I began a listening and learning process
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>to
prepare for my official start at the Foundation. In individual
conversations with nearly 300 people from 55 countries, as well as numerous
community events, I asked questions about Wikimedia’s vision and mission,
what we believed the world needed from us now, and what challenges we faced
in achieving our goals. This led to five ‘puzzles’
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>
that I believe continue to vex us.
I observed at the time that the only topic with unanimous consensus
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>
was the urgent need for our work – which is true now more than ever before.
As mis/disinformation grows, with polarization and conflict intensifying
across societies globally, the Wikimedia projects remain committed to
principles of open knowledge and neutrality. There is no doubt about the
necessity and urgency of our contributions.
The world needs us to succeed, and this resonates for me even more in the
world we occupy today. While some areas of our community are focused on
what is happening around us, I believe we still aren’t united enough
against these and other common threats.
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…>
My conversations in 2021 were shaped by what volunteers thought about how
to make all contributions count
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>,
how to make our multilingualism
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>
more of a superpower, and how to break the circular puzzle of managing
centralised institutions to support decentralised projects.
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>
I especially valued reflections about how to close the gap between where we
are and where we need to be in building infrastructure that is human-led,
and strongly tech-enabled.
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>
How it’s going…
Since then, I have been primarily focused on the Wikimedia Foundation’s own
performance and accountability – certainly to all of you, but also to our
readers, donors, regulators, and partners. My self-assessment at the
beginning of 2023 was that we were heading more in the right direction
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>:
the Foundation’s annual planning
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>
is being guided by movement strategy and attempting to be more responsive
to volunteer needs, we have more than tripled the number of languages
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…>
we communicate in with regions around the world, and we have re-centered
product and technology priorities
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…>
to better support a rapidly changing knowledge ecosystem.
Annual goals should be defined clearly, and then delivered well. But our
work requires longer time horizons than yearly planning – certainly to
2030, which I believe requires asking much harder questions about
priorities, constraints, and trade-offs to pragmatically agree on what can
be achieved in the next seven years with a slowing growth of resources and
our current collaboration models.
The question of time horizons is also leading me to ask whether Wikipedia
will be a single-generation wonder or whether we know how to sustain
Wikimedia for the generations still to come. Our mission
<https://meta.wikimedia.org/wiki/Mission> calls for this work to continue
in perpetuity, and some aspects of our projects have created digital
imprints on the world that feel impossible to erase. But what does a
multi-generational view of Wikimedia require of us, from now? I believe
this is less about lofty statements than it is about working with
deliberate intent on issues that will help bring a multi-generational view
of our projects into clearer focus, now and into the future.
Some (not all) of the big questions…
I am not sure how we should do this, and I hope to hear what you think
(more on this below). For now, I have asked our Board of Trustees and
Foundation leadership to set multi-year planning goals
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Chief_Executive_Office…>
with three topics that feel like a useful place to start:
(1) The first relates to how the financial model of Wikimedia advances our
mission. Future projections indicate that, for a range of reasons,
fundraising online and through banners may not continue to grow at the same
rate as in past years. We have several long-term initiatives underway to
help mitigate this risk and also diversify our revenue streams.
-
An update was recently posted about the essential role of the Wikimedia
Endowment
<https://diff.wikimedia.org/2023/09/29/the-next-chapter-for-the-wikimedia-en…>
in growing long-term support for the projects.
-
In parallel, we continue to assess Wikimedia Enterprise
<https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Enterprise>’s
ability to improve the user experience of readers beyond our own websites
while simultaneously having very high-volume reuse companies financially
support our movement.
-
On the expense side
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…>,
we have responded by slowing the rate of growth for the Foundation itself,
while increasing financial resources that support other movement entities.
We need a long-term financial model that matches our aspirations to our
resources in order to implement plans effectively.
(2) The second topic is our product and technology priorities, which this
year focus on the technology needs of Wikimedia contributors (ranging from
those with extended rights, to newcomers, to institutional partners like
GLAM organizations). This ranges from:
-
Overarching objectives
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…>
outlined in the current annual plan, which were decided following a period
of community review
<https://meta.wikimedia.org/wiki/Talk:Wikimedia_Foundation_Annual_Plan/2023-…>,
as well as input provided by volunteers
<https://diff.wikimedia.org/2023/04/14/selenas-listening-tour/> directly
to the Foundation’s new Chief Product & Technology Officer, Selena
Deckelmann.
-
Progressing on important priorities raised by editors with extended
rights. This has been supported by software improvements for New Pages
Patrol
<https://en.wikipedia.org/wiki/Wikipedia:Page_Curation/2023_Moderator_Tools_…>,
by developing a workflow in the Android app for patrolling edits
<https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/Android/Anti_Vandalism>,
by building a system that guides newcomers to make well-referenced edits
<https://www.mediawiki.org/wiki/Edit_check>, and by developing the
capability for each community to configure features
<https://www.mediawiki.org/wiki/Community_configuration_2.0> to fit
their own needs.
-
Increased support for Wikimedia Commons that included upgrades to Thumbor
<https://wikitech.wikimedia.org/wiki/Thumbor>, support for OpenRefine
<https://meta.wikimedia.org/wiki/OpenRefine#OpenRefine_for_Wikimedia_Commons>,
and other work
<https://commons.wikimedia.org/wiki/Commons:WMF_support_for_Commons/updates>.
Also migrating to the latest version of the Creative Commons license
<https://diff.wikimedia.org/2023/06/29/stepping-into-the-future-wikimedia-pr…>
on Wikipedia, in response to requests for a more human-readable and
internationally-friendly version of the free culture license used on
Wikipedia to make knowledge freely available.
-
Committing to review and improve the community wishlist process
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Future_Of_The_Wis…>
to better handle the needs of diverse users, growing technical
complexities, and deeper collaboration between the Foundation’s Product &
Technology teams and technical volunteers. The wishlist survey results of
2022
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Results>
and 2023
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2023/Results>
show how we are bringing together skills and expertise most relevant for
the requests. For instance, the Community Tech team delivered Better diff
handling for paragraph splits
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Better_diff_…>
– a set of iterative improvements to a core editing experience – while one
of the most sought-after features – dark mode for reading
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2023/Reading/Dark…>
– is now being taken up by the Web team. We will continue this
collaboration to build a flexible and sustainable system for responding to
technical wish requests.
-
Complying with growing regulatory and legal obligations in our role as
technical host of the Wikimedia projects, which includes additional
requirements of the Digital Services Act categorizing Wikipedia as a
Very Large Online Platform (VLOP)
<https://diff.wikimedia.org/2023/05/04/wikipedia-is-now-a-very-large-online-…>
and responding to other forthcoming regulations to
<https://medium.com/wikimedia-policy/the-uk-online-safety-bill-is-harmful-to…>
advocate for Wikipedia's unique model of community self-governance.
-
Advancing community conversations about generative artificial
intelligence, while also modernizing our machine learning infrastructure to
support mission-aligned ML tool use on our projects. This has included
experimenting with whether and how we can serve reliable, verifiable
knowledge via off-platform AI assistants like ChatGPT
<https://meta.wikimedia.org/wiki/Future_Audiences/Experiments:_conversationa…>.
We have also experimented with how machine learning might be used to help
smaller wiki communities automatically moderate incoming edits
<https://www.mediawiki.org/wiki/Moderator_Tools/Automoderator>.
-
Better supporting underserved languages with open machine translations
through a new translation service – MinT
<https://diff.wikimedia.org/2023/06/13/mint-supporting-underserved-languages…>
– that supports over 200 languages, including 44 that have machine
translation for the first time.
-
Dedicating more resources to maintenance and support of MediaWiki
software while we begin thinking together about the future roadmap.
-
As a part of our commitment to knowledge equity, adding a new caching
center in South America for increased site responsiveness in the region.
-
Expanding the capabilities of campaign and event organizers by improving
the Event Registration capability
<https://meta.wikimedia.org/wiki/Campaigns/Foundation_Product_Team/Registrat…>
and starting on ways for organizers to spread the word about their
campaigns
<https://meta.wikimedia.org/wiki/Campaigns/Foundation_Product_Team/Event_Dis…>
.
-
Introducing improvements for readers like the ability to customize their
own reading experience through dark mode and control over font size
<https://www.mediawiki.org/wiki/Reading/Web/Accessibility_for_reading>.
-
Experimenting with how we can share free knowledge with global youth and
invite them into our projects on rich-media apps
<https://diff.wikimedia.org/2023/07/13/exploring-paths-for-the-future-of-fre…>
where they like to spend time (e.g., TikTok, Instagram Reels).
-
Prioritizing safety features that range from the first versions of
an Incident
Reporting System
<https://meta.wikimedia.org/wiki/Incident_Reporting_System>, so that
editors can intuitively report harassment, to establishing a
community-driven enforcement system for the Universal Code of Conduct to
increase safety and inclusion for participants across all the Wikimedia
projects.
(3) Finally, we are evaluating principles for defining the Foundation's
core roles and responsibilities. As a movement that is built on the
strength of crowdsourcing, what is the best division of labor to achieve
our goals? This is intended to support movement charter deliberations, and
also to directly identify challenges in our decision-making and governance
structures:
-
What would be the purpose of creating additional entities now, and can
we repurpose or close down other entities to achieve these goals?
-
Are there roles the Foundation should stop playing or let others lead?
-
How can we speed up technical decision-making?
-
How do we ensure that global contributors, whether organized into
affiliate structures or not, can voice their perspectives and needs
efficiently and effectively?
-
Where and how should we continue to evaluate, iterate, and adapt to the
changing needs of our movement and the world around us?
Of course this is not a comprehensive list of all the issues that need to
be addressed. It is a starting point for key topics that I believe need a
longer view. You will see in the next section that I am asking for your
help in figuring out how we progress from here.
Invite to Talking: 2024…
Alongside learning about Wikimedia’s work in these last two years, I have
also focused a lot of my energy on learning about our ways of working
together. Fulfilling our mission calls for more human interactions, on- and
off-wiki, that are designed to create more shared understanding, and
hopefully grow trust. The return of in-person gatherings has been essential
for a subset of our volunteers, providing spaces for reconnecting,
recharging and working through difficult issues together in the same room.
Foundation leadership has also been working harder to share organizational
news and have individualized conversations on-wiki and in other digital
forums.
The goal is to put more effort and intentionality into communicating the
right information, at the right time, and in the right way, even knowing
that we can never meet everyone's expectations.
It is also important for us to talk to each other throughout the year –
formally and informally. To support this, over the next few months I am
asking our Trustees and my colleagues at the Wikimedia Foundation to join
me in a different kind of listening tour: more of a two-way dialogue that
is designed to listen intently to what is on your minds now, and to also
share progress and ideas about our multi-year planning.
We can spend time learning from each other in the context of:
-
consequential events taking place in 2024 (e.g., critical elections
<https://en.wikipedia.org/wiki/List_of_elections_in_2024>, a movement
charter, enforcement of the Universal Code of Conduct, compliance
requirements for the Digital Services Act, and how to respond to lawmakers
around the world who are concerned about the impact of digital technologies
on society);
-
our annual plan supporting 2030 movement strategy objectives of
knowledge equity and knowledge as a service;
-
longer-range questions to secure our projects for generations yet to
come;
-
and anything else on your mind!
I hope you will decide to participate. You can sign up
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Community_Affairs_Comm…>
for on- and off-wiki options between October and February, including
individual conversations with Trustees, me, and other Foundation leaders. These
discussions are intended to improve deliberations at the Board’s strategic
planning retreat next March, and a summary of what we heard will be shared
with everyone in advance.
The Wikimedia Foundation's value of listening and sharing with more
curiosity
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Values#We_listen_and_s…>
will shape how we show up, and I hope everyone who is interested in
participating will bring the same approach.
As always, I welcome your feedback either on my talk page
<https://meta.wikimedia.org/wiki/Special:MyLanguage/User_talk:MIskander-WMF>
or emailing me directly at miskander(a)wikimedia.org.
Maryana
Maryana Iskander
Wikimedia Foundation CEO
Dear Wikimedians,
The 2023 edition of the Wikipedia Pages Wanting Photos campaign [1] ended on the 31st of August 2023 and we are pleased to announce the results.
A total of 35,623 Wikipedia articles in 179 languages were improved with photo, audio and video files. About 345 users participated in the campaign from more than 25 countries and the following users emerged as top contributors in the campaign's three prize categories.
Category A (Top image users)
* 1st Prize: User:Nikolina Šepić (Serbia)
* 2nd Prize: User:Aderiqueza (Nigeria)
* 3rd Prize: User:Timzy D'Great (Nigeria)
Category B (Top audio user)
* Audio Prize: User:Tupungato (Poland)
Category C (Top video user)
* Video Prize: User:Mashkawat.ahsan (Bangladesh)
We congratulate the winners and thank them for their contributions to Wikimedia and for promoting the use of media files uploaded to Commons on Wikipedia articles. We also thank all participants, local organizers, members of the campaign’s international team and the jury for their contributions towards the successful completion of this project.
[1] https://meta.wikimedia.org/wiki/Wikipedia_Pages_Wanting_Photos_2023
Kind regards,
Ammarpad
On behalf of the Wikipedia Pages Wanting Photos Campaign Team.
Good evening.
I am pleased to share with you that WikiDonne's Board, elected
<https://meta.wikimedia.org/wiki/WikiDonne/Consiglio_Direttivo/Elezioni/2023>
in the 28 September 2023 vote, took office yesterday.
The following positions have been confirmed:
- Camelia Boban (User:Camelia.boban), President
- Maria Antonietta Cima (User:Beatrice), Vice President and Secretary
- Marina Patriarca (User:Patmari), Treasurer
- Lorenza Colicigno (User:Lorenza Colicigno), Councilor
Good wikiwork to all,
Camelia Boban, President
--
*Camelia Boban (she/her)*
*| Java EE Developer |*
Wikimedia,
Hello. After receiving and listening to the feedback from our previous discussion, I have revised the Wikianswers proposal: https://meta.wikimedia.org/wiki/Wikianswers . I would also like to call your attention to its technical discussion section: https://meta.wikimedia.org/wiki/Wikianswers#Technical_discussion . A current version of this section is available below.
Per the feedback, the revised proposal includes, in addition to an option for a sister project at a new domain, e.g., https://en.wikianswers.org , an option for integration into the search systems of Wikipedia, Wikidata, and Commons. With respect to this latter option, AI systems' (LLMs') responses to end-users' questions would still be URL-addressed, human-editable content, e.g.: https://en.wikipedia.org/qa/2b106ea8-4d1b-441f-9dc8-4555a9999ae9 .
Thank you for checking out the revised proposal and for any feedback.
Technical discussion
Overview
Relevant artificial intelligence topics include retrieval-augmented generation, retrieval-augmented generation with guardrails, and agent-based approaches.
As presently considered, those parts of the question-and-answer data which could be human-editable include: (1) the template of the prompts, (2) the task, (3) the retrieved context data, (4) the questions, and (5) the answers.
The template is the overall structure of the prompts to the LLM. It includes some natural language and slots where the other parts will be placed. This should be locked so as to be editable only by administrators. Editing this would invalidate every cached and unlocked answer, meaning that every unlocked answer would be updated, refreshed, or regenerated.
The task is an instruction, e.g., "You are a helpful system which will answer the user's question using the following information". This should be locked so as to be editable only by administrators. Editing this would invalidate every dependent cached and unlocked answer, meaning that every unlocked answer would be updated, refreshed, or regenerated.
The retrieved context data are chunks or excerpts, e.g., of Wikipedia articles, which enhance the answering of a particular question. Users could edit them, resulting in the cascading invalidations of dependent cached and unlocked answers. With respect to user experiences, editors might click on these displayed chunks or excerpts of content to navigate to them as they occurred in source pages and edit them there, these updates to the underlying pages resulting in updates to the chunks and dependent unlocked answers.
The questions would rarely need editing, except to correct typographical errors.
The answers, abstractly, result from processing the other ingredients. These could be edited by humans but, as shown above, they could be subsequently revised by the system per cascading updates, refreshes, or regenerations. In some cases, editors might want to edit an answer and then to lock it from subsequent revisions by the system.
In conclusion, as presently considered, users would ordinarily tend to want to edit the retrieved chunks of content drawn from Wikipedia pages, these chunks augmenting the prompts to the LLMs, the cascading of these page revisions updating dependent unlocked answers automatically.
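The cascading-invalidation behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea; the names (Answer, invalidate_dependents) and the dependency-graph shape are hypothetical, not part of any existing Wikimedia codebase:

```python
# Hypothetical sketch of cascading invalidation: editing an ingredient
# (template, task, or retrieved chunk) marks dependent unlocked answers as
# stale so the system can later refresh or regenerate them. Locked answers
# keep their human edits and are skipped.

class Answer:
    def __init__(self, answer_id, locked=False):
        self.answer_id = answer_id
        self.locked = locked
        self.stale = False

def invalidate_dependents(dependency_graph, edited_node, answers):
    """Mark every unlocked answer depending on the edited node as stale."""
    invalidated = []
    for answer_id in dependency_graph.get(edited_node, []):
        answer = answers[answer_id]
        if not answer.locked:  # locked answers are protected from the system
            answer.stale = True
            invalidated.append(answer_id)
    return invalidated

# Example: editing one retrieved chunk invalidates its unlocked dependents.
answers = {"a1": Answer("a1"), "a2": Answer("a2", locked=True)}
graph = {"chunk:Solar_System#intro": ["a1", "a2"]}
stale = invalidate_dependents(graph, "chunk:Solar_System#intro", answers)
```

A full implementation would also propagate transitively (a template edit invalidates answers via their prompts), but the per-node step is the same.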
Database schemas
Wikianswers database schemas would include one or more tables with vector columns for embedding vectors. A project goal, then, would be to efficiently combine into a database schema the existing concepts of revision tables, page tables, and text tables with the newer concepts of embedding vectors and vector databases. Relevant tools include pgvector, a database extension which provides open-source vector-similarity search to PostgreSQL.
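As an illustration of the vector-similarity lookup such a schema would serve, here is a brute-force version in plain Python of what pgvector performs with indexes inside PostgreSQL; the chunk identifiers and embedding vectors are invented for the example:

```python
# Brute-force cosine-similarity search over embedding vectors, standing in
# for the indexed vector search a pgvector column would provide. The chunk
# table maps chunk ids to (toy, 3-dimensional) embedding vectors.

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest_chunks(query_vec, chunk_table, k=2):
    """Return the k chunk ids most similar to the query embedding."""
    ranked = sorted(chunk_table.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

chunks = {
    "Sun#intro":  [0.9, 0.1, 0.0],
    "Moon#orbit": [0.1, 0.9, 0.0],
    "Mars#intro": [0.8, 0.2, 0.1],
}
top = nearest_chunks([1.0, 0.0, 0.0], chunks, k=2)
```

In the proposed schema, this retrieval step would run per question to select the context chunks that augment the LLM prompt.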
URL-addressability
Instead of requiring a new domain, e.g., https://en.wikianswers.org/ , Wikianswers features could be integrated into the search systems of Wikipedia, Wikidata, and Commons. In this case, human-editable responses could still be URL-addressable, e.g.: https://en.wikipedia.org/qa/2b106ea8-4d1b-441f-9dc8-4555a9999ae9 .
Datetime encoding
Some questions have impermanent answers and others are volatile, meaning that their answers could vary each time that the question was asked. In these regards, date and time data could be encoded into URLs in a human-readable manner, e.g., https://en.wikipedia.org/qa/2023/09/21/21/29/00/2b106ea8-4d1b-441f-9dc8-455… . Some questions and answers might involve different granularities of time. For example, a natural-language question "Which teams are in the Super Bowl?" might have a number of URLs, one for each year, e.g., https://en.wikipedia.org/qa/2022/40a7338d-fe75-4897-aee6-ec87141020a6 and https://en.wikipedia.org/qa/2021/40a7338d-fe75-4897-aee6-ec87141020a6 .
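A small helper can make the proposed URL scheme concrete. The /qa/ path layout follows the examples above; the function name and base-URL parameter are illustrative only:

```python
# Hypothetical builder for human-readable question-answer URLs: a plain
# /qa/<uuid> form for stable answers, and /qa/YYYY/MM/DD/HH/MM/SS/<uuid>
# when the time the question was asked should be encoded.

from datetime import datetime, timezone

def qa_url(base, question_id, asked_at=None):
    """Build a question-answer URL, optionally encoding when it was asked."""
    if asked_at is None:
        return f"{base}/qa/{question_id}"
    stamp = asked_at.strftime("%Y/%m/%d/%H/%M/%S")
    return f"{base}/qa/{stamp}/{question_id}"

url = qa_url("https://en.wikipedia.org",
             "2b106ea8-4d1b-441f-9dc8-4555a9999ae9",
             datetime(2023, 9, 21, 21, 29, 0, tzinfo=timezone.utc))
```

Coarser granularities (e.g., a year-only path for the Super Bowl example) would simply truncate the timestamp segment.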
User experience
In the approach where Wikianswers features are integrated into Wikipedia, Wikidata, and Commons search, user experiences could utilize the existing text search boxes atop pages. Perhaps the "magnifying glass" icon in those search boxes could be accompanied by a "question mark" icon. One of these two icons would be selected, or activated, by end-users. Which such icon was activated would toggle between using the existing keyword-based content search and the described Wikianswers human-editable question-answering subsystem. Still under consideration is whether and how end-users could specify whether they desire for their question to have their current page, or selections thereof, as focal when responding to their question.
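The icon toggle described above amounts to a simple dispatcher between the two subsystems. All names here are hypothetical placeholders, not existing MediaWiki interfaces:

```python
# Sketch of routing a search-box query by which icon the user activated:
# the "magnifying glass" keeps today's keyword-based content search, while
# the "question mark" routes to the Wikianswers question-answering path.

def keyword_search(query):
    # Placeholder for the existing keyword-based content search.
    return f"keyword results for: {query}"

def wikianswers_qa(query):
    # Placeholder for the proposed human-editable question-answering path.
    return f"generated, human-editable answer for: {query}"

def handle_search(query, active_icon):
    """Dispatch the query based on the activated search-box icon."""
    if active_icon == "question_mark":
        return wikianswers_qa(query)
    return keyword_search(query)

result = handle_search("Which teams are in the Super Bowl?", "question_mark")
```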
Best regards,
Adam Sobieski