Hello!
I don't think I have the authority to grant this approval as stated. We can't make this data public, but if you find a research sponsor at the Wikimedia Foundation, we can grant you access to private data internally if you sign an NDA.
Please see https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations and https://meta.wikimedia.org/wiki/Research:FAQ#collaborations for more information on how to contact the WMF Research team and propose a formal collaboration.
I am also CCing the wiki-research-l mailing list.
Good luck! -Andrew Otto
On Mon, Sep 26, 2022 at 3:27 AM Thiago Freitas thiago@iiia.csic.es wrote:
Dear Andrew Otto, How are you?
I'm Thiago Freitas, a PhD student from the Artificial Intelligence Research Institute (IIIA-CSIC) in Spain. We are working on detecting hate speech in online communities and we are interested in the Wikipedia use case, specifically the Revision Deletion data, which is not publicly available.
I am writing this email to "coordinate obtaining a comment of approval on this task from the approving party" as described in the required task in Phabricator. We have already published work on detecting norm violations on Wikipedia (https://www.ifaamas.org/Proceedings/aamas2022/pdfs/p427.pdf and bit.ly/3t08QCg) using data available online. We are now looking to further improve the quality of our research on hate speech detection with this additional dataset. We will use this data to build several machine learning models and experiment with different approaches. I would be glad to provide further information about our work. Thank you so much for your attention.
Best regards, Thiago Freitas
Hi Andrew, Thanks for adding wiki-research-l.
Hi Thiago,
Sorry for the delay in getting back to you. I have to admit that there may be ways to access this data that I'm not aware of. One way I know of is for WMF to give you access to https://en.wikipedia.org/wiki/Special:GlobalGroupPermissions/wmf-researcher. For that, a Formal Collaboration needs to be set up, as Andrew shared earlier. Unfortunately, the Research team's [1] capacity is currently limited, with multiple existing priorities that we need to remain focused on. As a result, we cannot add a Formal Collaboration to support you.
While unfortunately we won't be able to support you now, I hope that through efforts such as investments in differential privacy [2] we can support you and other researchers with access to some of the data that is currently private.
I'm sorry that I can't support you at this time and I hope you understand.
Best, Leila
[1] https://research.wikimedia.org/team.html
[2] https://meta.wikimedia.org/wiki/Differential_privacy
--
Leila Zia
Head of Research
Wikimedia Foundation
On Mon, Oct 3, 2022 at 8:48 AM Andrew Otto otto@wikimedia.org wrote:
Hello!
I don't think I have the authority to grant this approval as stated. We can't make this data public, but if you find a research sponsor at the Wikimedia Foundation, we can grant you access to private data internally if you sign an NDA.
Please see https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations and https://meta.wikimedia.org/wiki/Research:FAQ#collaborations for more information on how to contact the WMF Research team and propose a formal collaboration.
I am also CCing the wiki-research-l mailing list.
Good luck! -Andrew Otto
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org To unsubscribe send an email to wiki-research-l-leave@lists.wikimedia.org