I’m glad to announce the release of an open-licensed corpus with 1.5M records from the Article Feedback v5 pilot.
http://dx.doi.org/10.6084/m9.figshare.1277784
Thanks to everyone who helped make this happen, Fabrice in particular for shepherding this through.
Dario
This dataset contains the entire corpus of feedback submitted on the English, French and German Wikipedia during the Article Feedback v5 (AFT) pilot [1]. The Wikimedia Foundation ran the Article Feedback pilot for a year between March 2013 and March 2014. During the pilot, 1,549,842 feedback messages were collected across the three languages.
All feedback messages and their metadata (as described in this schema [2]) are available in this dataset, with the exception of messages that had been oversighted and/or deleted by the end of the pilot. The corpus is released [3] under the following licenses:
• CC BY-SA 3.0 for feedback messages
• CC0 for the associated metadata
Results from the pilot are discussed in: Halfaker, A., Keyes, O. and Taraborelli, D. (2013). Making peripheral participation legitimate: Reader engagement experiments in Wikipedia. CSCW ’13: Proceedings of the 2013 Conference on Computer Supported Cooperative Work. [4][5]
[1] https://www.mediawiki.org/wiki/Article_feedback/Version_5
[2] https://www.mediawiki.org/wiki/Article_feedback/Version_5/Technical_Design_S...
[3] https://wikimediafoundation.org/wiki/Feedback_data#Article_Feedback
[4] http://dx.doi.org/10.1145/2441776.2441872
[5] http://nitens.org/docs/cscw13.pdf