Hello research community!
My name is Hal Triedman, and I’m a senior privacy engineer at WMF. I work to make data that WMF releases about reading, editing, and other on-wiki behavior safer, more granular, and more accessible to the world using differential privacy https://en.wikipedia.org/wiki/Differential_privacy.
I’m writing today to share that WMF has started to release more granular, differentially-private statistics about editing activity by country. This data will be published on a monthly basis, and offers some distinct improvements over the existing geoeditors public monthly https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors/Public dataset:
- It provides numeric counts of both editors and edits for each country-project-activity-level group, instead of bucketed counts of editors only
- It includes the “beginner editor” category (1-4 edits), instead of only releasing the intermediate (5-99 edits) and advanced (100+ edits) editor categories
- It contains two distinct releases, one split by month (for continuity with current metrics) and one split by week (for increased granularity), instead of solely by month. My hope is that the increased granularity can show the week-to-week effects of editathons, events, etc. 🙂
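To make the country-project-activity-level grouping concrete, here is a minimal Python sketch of how per-editor edit totals for one month or week could be bucketed into the three activity levels and counted. It is not WMF's pipeline: the function names, the level labels, and the input shape (one `(country, project, edit_count)` tuple per editor) are hypothetical, and it leaves out the differential-privacy step that produces the published numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical group key: (country, project, activity level).
GroupKey = Tuple[str, str, str]


def activity_level(edit_count: int) -> Optional[str]:
    """Map an editor's edit count for the period to an activity level.

    Thresholds follow the announcement: 1-4 beginner, 5-99 intermediate,
    100+ advanced. Editors with no edits belong to no group.
    """
    if edit_count <= 0:
        return None
    if edit_count <= 4:
        return "beginner"
    if edit_count <= 99:
        return "intermediate"
    return "advanced"


def group_counts(
    editor_totals: Iterable[Tuple[str, str, int]],
) -> Dict[GroupKey, Tuple[int, int]]:
    """Count editors and edits per (country, project, level) group.

    Input: one (country, project, edit_count) tuple per editor (an assumed
    shape). Output: {group: (number_of_editors, number_of_edits)}.
    No privacy noise is added here.
    """
    editors: Dict[GroupKey, int] = defaultdict(int)
    edits: Dict[GroupKey, int] = defaultdict(int)
    for country, project, edit_count in editor_totals:
        level = activity_level(edit_count)
        if level is None:
            continue
        key = (country, project, level)
        editors[key] += 1
        edits[key] += edit_count
    return {key: (editors[key], edits[key]) for key in editors}
```

Reporting both columns is what lets a group of a few very active editors be told apart from a larger group of occasional ones.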
There are two distinct datasets, each released monthly as data becomes available:
- Monthly data: README https://analytics.wikimedia.org/published/datasets/geoeditors_monthly/00_README.html / raw data https://analytics.wikimedia.org/published/datasets/geoeditors_monthly/
- Weekly data: README https://analytics.wikimedia.org/published/datasets/geoeditors_weekly/00_README.html / raw data https://analytics.wikimedia.org/published/datasets/geoeditors_weekly/
I am actively working on enabling API access to this data (and to the DP pageview data). For now, I’ve built an example Python notebook https://public-paws.wmcloud.org/User:HTriedman%20(WMF)/private_geoeditors_data_access.ipynb illustrating how one might access the data in its current TSV format, along with several simple analyses that can be done with it.
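As a rough, standard-library-only stand-in for the first step of that notebook, here is a minimal Python sketch of loading one of the TSV files and totaling a column per country. The column names, and whether the files start with a header row, are assumptions (the README is the authority); `fetch_text`, `parse_tsv`, and `total_by` are hypothetical helpers, not part of any WMF API.

```python
import csv
import io
import urllib.request
from typing import Dict, List, Optional, Sequence


def fetch_text(url: str) -> str:
    """Download a published file and decode it as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_tsv(text: str, fieldnames: Optional[Sequence[str]] = None) -> List[Dict[str, str]]:
    """Parse tab-separated text into one dict per row.

    If fieldnames is None, the first row is treated as a header (an
    assumption about the file layout); otherwise the given names are used.
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames, delimiter="\t")
    return list(reader)


def total_by(rows: List[Dict[str, str]], key_col: str, value_col: str) -> Dict[str, float]:
    """Sum a numeric column for each value of key_col (e.g. per country).

    Note: adding up noisy per-group counts also adds up their noise, so
    such totals are less precise than any single published value.
    """
    totals: Dict[str, float] = {}
    for row in rows:
        key = row[key_col]
        totals[key] = totals.get(key, 0.0) + float(row[value_col])
    return totals
```

For example, `total_by(parse_tsv(fetch_text(url)), "country", "editors")`, where `url` points at one of the raw-data files listed above and the column names match those in its README.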
I also want to invite the research community to join me for a deep dive into DP at WMF at the October Research Showcase on data privacy. In the meantime, please feel free to reach out with any questions on the project talk page https://meta.wikimedia.org/wiki/Talk:Differential_privacy.
For more about WMF’s work on differential privacy in general, see the differential privacy homepage on Meta https://meta.wikimedia.org/wiki/Differential_privacy. And be on the lookout for announcements of other DP datasets soon!
Best,
Hal Triedman (he/him)
Senior Privacy Engineer
Wikimedia Foundation https://wikimediafoundation.org/
metawiki https://meta.wikimedia.org/wiki/User:HTriedman_(WMF) • linkedin https://www.linkedin.com/in/hal-triedman/