Hi all,
Just a friendly reminder that we'll be starting in approximately 30
minutes.
On Mon, Oct 16, 2023 at 3:29 PM Kinneret Gordon <kgordon(a)wikimedia.org>
wrote:
Hi all,
The next Research Showcase, focused on *Data Privacy*, will be
live-streamed on Wednesday, October 18, at 9:30 AM PDT / 16:30 UTC. Find
your local time here <https://zonestamp.toolforge.org/1697646641>.
YouTube stream:
https://www.youtube.com/watch?v=ntgRsMaDlsw. As usual,
you can join the conversation in the YouTube chat as soon as the showcase goes
live.
This month's presentations:
Wikipedia Reader Navigation: When Synthetic Data Is Enough
By *Akhil Arora, EPFL*
Every day millions of people read Wikipedia. When navigating
the vast space of available topics using hyperlinks, readers describe
trajectories on the article network. Understanding these navigation
patterns is crucial to better serve readers’ needs and address structural
biases and knowledge gaps. However, systematic studies of navigation on
Wikipedia are hindered by a lack of publicly available data due to the
commitment to protect readers' privacy by not storing or sharing
potentially sensitive data. In this paper, we ask: How well can Wikipedia
readers' navigation be approximated by using publicly available resources,
most notably the Wikipedia clickstream data
<https://wikinav.toolforge.org/>? We systematically quantify the
differences between real navigation sequences and synthetic sequences
generated from the clickstream data, in 6 analyses across 8 Wikipedia
language versions. Overall, we find that the differences between real and
synthetic sequences are statistically significant, but with small effect
sizes, often well below 10%. This constitutes quantitative evidence for the
utility of the Wikipedia clickstream data as a public resource: clickstream
data can closely capture reader navigation on Wikipedia and provides a
sufficient approximation for most practical downstream applications relying
on reader data. More broadly, this study provides an example for how
clickstream-like data can generally enable research on user navigation on
online platforms while protecting users’ privacy.
How to tell the world about data you cannot show them: Differential
privacy at the Wikimedia Foundation
By *Hal Triedman, Wikimedia Foundation*
The Wikimedia Foundation (WMF), by virtue of its centrality on the internet,
collects lots of data about platform activities. Some of that data is made
public (e.g. global daily pageviews); other data types are not shared (or
are pseudonymized prior to sharing), largely due to privacy concerns.
Differential privacy is a statistical definition of privacy that has gained
prominence in academia, but is still an emerging technology in industry. In
this talk, I share the story of how we put differential privacy into
production at the WMF, through looking at the case study of geolocated
daily pageview counts.
You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
Best,
Kinneret
--
Kinneret Gordon
Lead Research Community Officer
Wikimedia Foundation <https://wikimediafoundation.org/>