Hi all,
Thanks to those of you who responded to the data release survey we sent out
in October. The WMF Security team has developed a prioritization plan for
releasing data in the coming year:
.
We invite you to leave questions or comments on the talk page.
Warm regards,
Emily Lescak, WMF Research team
Hal Triedman, WMF Security team
On Wed, Oct 26, 2022 at 2:50 PM Emily Lescak <elescak(a)wikimedia.org> wrote:
Hi all,
As part of our efforts to better serve the Wikimedia research community,
we are happy to share that we are collaborating with the Security team at
WMF to help prioritize the release of data that can be useful for your
research. The Security team is working to privatize more datasets and
release them publicly, avoiding the need for non-disclosure agreements. You
can learn more here:
https://meta.wikimedia.org/wiki/Differential_privacy.
Over the next 12 months, the Security team plans to release 5 datasets:
- country-language-pageview ongoing (end of 2022)
- country-language-pageview historical (March 2023)
- geo-aggregated grants data back to 2009 (Feb 2023)
- geoeditors monthly (June 2023)
- dataset informed by research community priorities identified in this
  survey (second half of 2023)
The released datasets need to meet certain privacy requirements:
- They cannot include any natural language (e.g. specific search
  queries or deletion logs) so as to avoid the release of personally
  identifiable information;
- They need to be sufficiently large (at least thousands of entries,
  preferably more) so as to limit the relative effect of added noise;
- The data cannot be so sensitive that an individual user will be
  harmed by disclosure of the data (e.g. IP addresses, content containing
  personally identifying information).
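To illustrate the size requirement, here is a rough sketch of why larger
counts tolerate privacy noise better. It assumes a plain Laplace mechanism
with a hypothetical privacy budget (epsilon) of 1.0; the mechanism and
parameters actually used for these releases are not stated in this message,
and all function names below are made up for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Assumed Laplace mechanism: noise scale = sensitivity / epsilon.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def expected_relative_error(true_count, epsilon, sensitivity=1.0):
    # The mean absolute value of Laplace(0, b) noise is b, so the
    # expected relative error shrinks as the true count grows.
    return (sensitivity / epsilon) / true_count


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    epsilon = 1.0           # hypothetical privacy budget
    for n in (10, 1_000, 100_000):
        released = noisy_count(n, epsilon, rng)
        rel = expected_relative_error(n, epsilon)
        print(f"true={n:>7}  released={released:>10.1f}  expected rel. error={rel:.4%}")
```

Under these assumptions the noise has the same absolute size for every
count, so it distorts a count of 10 far more than a count of 100,000.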
We invite you to complete a brief survey
<https://docs.google.com/forms/d/e/1FAIpQLSe_LAt6V2Q1GUf3Z8lnt8uAOZnHTO5rNgFfufx_gDKk1znrlw/viewform?usp=sf_link>
to help us identify and prioritize the types of datasets that you would
find useful for your work. Results of this survey will inform the fifth
dataset, scheduled to be released in late 2023. This survey is conducted
via a third-party service, which may subject it to additional terms. For
more information on privacy and data-handling, see the survey privacy
statement:
https://foundation.wikimedia.org/wiki/Legal:Data_Release_Priorities_Survey_…
The survey will remain open until November 3, 2022. After that time,
members of the Research and Security teams will review the data and report
on the suggestions that were received and how the work will proceed.
If you prefer not to respond via the Google form, you can email your
feedback to us or set up a time to discuss. You can also leave questions
and comments on the Talk page:
https://meta.wikimedia.org/wiki/Differential_privacy
Thanks for your help!
Emily Lescak, WMF Research team
Hal Triedman, WMF Security team
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation