🎉🎉🎉 Congrats on this release! Looking forward to using it in some projects 😀
--
Nate
Hal Triedman <htriedman@wikimedia.org> writes:
Hello world!
My name is Hal Triedman, and I’m a senior privacy engineer at WMF. I work to make data that WMF releases about reading, editing, and other on-wiki behavior safer, more granular, and more accessible to the world using differential privacy <https://urldefense.com/v3/__https://en.wikipedia.org/wiki/Differential_priva... >.
Today I’m reaching out to share that WMF has released almost 8 years (from 1 July 2015 to present) of privatized pageview data <https://urldefense.com/v3/__https://diff.wikimedia.org/2023/06/21/new-datase... >, partitioned by country, project, and page. This data is significantly more granular than other datasets we release, and should help researchers to disambiguate both long- and short-term trends within languages on a country-by-country basis — several <https://urldefense.com/v3/__https://phabricator.wikimedia.org/T207171__;!!K-... > long-standing requests <https://urldefense.com/v3/__https://phabricator.wikimedia.org/T267283__;!!K-... > from Wikimedia communities.
Due to various technical factors, there are three distinct datasets:
1 July 2015 – 8 Feb 2017 <https://urldefense.com/v3/__https://analytics.wikimedia.org/published/datase... > / README <https://urldefense.com/v3/__https://analytics.wikimedia.org/published/datase... > (publishing threshold [1]: 3,500 pageviews)
9 Feb 2017 – 5 Feb 2023 <https://urldefense.com/v3/__https://analytics.wikimedia.org/published/datase... > / README <https://urldefense.com/v3/__https://analytics.wikimedia.org/published/datase... > (publishing threshold: 450 pageviews)
6 Feb 2023 – present <https://urldefense.com/v3/__https://analytics.wikimedia.org/published/datase... > / README <https://urldefense.com/v3/__https://analytics.wikimedia.org/published/datase... > (publishing threshold: 90 pageviews)
API access to this data should be coming in the next few months. In the interim, I’ve built an example Python notebook <https://urldefense.com/v3/__https://public-paws.wmcloud.org/67457802/private... > illustrating how to access the data in its current CSV format and how to run several kinds of simple analyses on it.
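As a rough illustration of working with the CSV format mentioned above, here is a minimal sketch using only the Python standard library. It is not taken from the notebook. The column names (`country`, `project`, `page`, `views`) and the sample rows are made-up placeholders, so check the dataset README for the real schema before adapting it.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows with hypothetical column names; these are NOT real
# pageview figures and the real files may use a different schema.
SAMPLE_CSV = """country,project,page,views
FR,fr.wikipedia,Paris,1200
FR,fr.wikipedia,Lyon,450
US,en.wikipedia,Paris,980
"""


def total_views_by_country(csv_file):
    """Sum the (assumed) views column for each country in an open CSV file."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["country"]] += int(row["views"])
    return dict(totals)


totals = total_views_by_country(io.StringIO(SAMPLE_CSV))
print(totals)
```

For a downloaded file, the same function could be called with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.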
I also want to invite the research community to join me for a brief demo of this project at the July Research Showcase <https://urldefense.com/v3/__https://www.mediawiki.org/wiki/Wikimedia_Researc... >. In the meantime, please feel free to reach out with any questions on the project talk page <https://urldefense.com/v3/__https://meta.wikimedia.org/wiki/Talk:Differentia... >.
For more information about WMF’s differential privacy work in general, see the differential privacy homepage on meta <https://urldefense.com/v3/__https://meta.wikimedia.org/wiki/Differential_pri... >. And in the future, look for more announcements of privatized datasets on editor behavior, on-wiki search, CentralNotice impressions and clicks, and more.
Best,
Hal
[1] “Publishing threshold” is the minimum value of a row in the dataset in order to be published.
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-leave@lists.wikimedia.org