Hello research community!
My name is Hal Triedman, and I’m a senior privacy engineer at WMF. I work
to make data that WMF releases about reading, editing, and other on-wiki
behavior safer, more granular, and more accessible to the world using
differential
privacy <https://en.wikipedia.org/wiki/Differential_privacy>.
I’m writing today to share that WMF has started to release more granular,
differentially private statistics about editing activity by country. This
data will be published monthly and offers some distinct
improvements over the existing geoeditors public monthly
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors/Public>
dataset:
- It provides exact counts of both editors and edits for each
  country-project-activity level group, instead of bucketed counts of only
  editors
- It includes the “beginner editor” category (1-4 edits), instead of only
  releasing the intermediate (5-99 edits) and advanced (100+ edits) editors
  categories
- It contains two distinct releases, one split by month (for continuity
  with current metrics) and one split by week (for increased granularity),
  instead of solely by month. My hope is that the increased granularity can
  show the week-to-week effects of editathons, events, etc. 🙂
There are two distinct datasets, each released monthly as data becomes
available:
- Monthly data: README
  <https://analytics.wikimedia.org/published/datasets/geoeditors_monthly/00_README.html>
  / raw data
  <https://analytics.wikimedia.org/published/datasets/geoeditors_monthly/>
- Weekly data: README
  <https://analytics.wikimedia.org/published/datasets/geoeditors_weekly/00_README.html>
  / raw data
  <https://analytics.wikimedia.org/published/datasets/geoeditors_weekly/>
I am actively working on enabling API access to this data (and to DP pageview
data). For now, I’ve built an example Python notebook
<https://public-paws.wmcloud.org/User:HTriedman%20(WMF)/private_geoeditors_data_access.ipynb>
illustrating how to access the data in its current TSV format, along with
several kinds of simple analyses that can be done with it.
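As a rough sketch of one such simple analysis (this is not taken from the notebook), the snippet below parses tab-separated text and sums edits per country. The column names (`country`, `project`, `activity_level`, `editors`, `edits`) and the sample rows are assumptions made up for illustration; please check the README for the real schema before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real schema is described in the dataset README.
COUNTRY_COL = "country"
EDITS_COL = "edits"


def total_edits_by_country(tsv_text):
    """Sum the edit counts in a tab-separated table, grouped by country."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    totals = defaultdict(int)
    for row in reader:
        totals[row[COUNTRY_COL]] += int(row[EDITS_COL])
    return dict(totals)


# Made-up rows for illustration only; these are not real published values.
SAMPLE_TSV = (
    "country\tproject\tactivity_level\teditors\tedits\n"
    "Exampleland\texample.wikipedia\t1 to 4\t10\t20\n"
    "Exampleland\texample.wikipedia\t5 to 99\t3\t45\n"
    "Otherland\texample.wikipedia\t1 to 4\t7\t12\n"
)

if __name__ == "__main__":
    print(total_edits_by_country(SAMPLE_TSV))
```

To run this against a downloaded file instead of the inline sample, read the file's contents into a string and pass that to `total_edits_by_country`, adjusting the two column-name constants to match the actual header.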
I also want to invite the research community to join me for a deep dive
into DP at WMF at the October Research Showcase on data privacy. In the
meantime, please feel free to reach out with any questions on the project talk
page <https://meta.wikimedia.org/wiki/Talk:Differential_privacy>.
For more information about WMF’s work on differential privacy in
general, see the differential privacy homepage on meta
<https://meta.wikimedia.org/wiki/Differential_privacy>. And be on the
lookout for announcements of other DP datasets soon!
Best,
Hal Triedman (he/him)
Senior Privacy Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
metawiki <https://meta.wikimedia.org/wiki/User:HTriedman_(WMF)> • linkedin
<https://www.linkedin.com/in/hal-triedman/>