I’m glad to announce the release of an open-licensed corpus with 1.5M records from the
Article Feedback v5 pilot.
http://dx.doi.org/10.6084/m9.figshare.1277784
Thanks to everyone who helped make this happen, and to Fabrice in particular for
shepherding it through.
Dario
—
This dataset contains the entire corpus of feedback submitted on the English, French and
German Wikipedias during the Article Feedback v5 (AFT) pilot. [1] The Wikimedia Foundation
ran the Article Feedback pilot for a year between March 2013 and March 2014. During the
pilot, 1,549,842 feedback messages were collected across the three languages.
All feedback messages and their metadata (as described in this schema [2]) are available
in this dataset, with the exception of messages that had been oversighted or deleted
by the end of the pilot.
The corpus is released [3] under the following licenses:
• CC BY-SA 3.0 for feedback messages
• CC0 for the associated metadata
Results from the pilot are discussed in: Halfaker, A., Keyes, O. and Taraborelli, D.
(2013). Making peripheral participation legitimate: Reader engagement experiments in
Wikipedia. In CSCW ’13: Proceedings of the 2013 Conference on Computer Supported
Cooperative Work. [4][5]
[1] https://www.mediawiki.org/wiki/Article_feedback/Version_5
[2] https://www.mediawiki.org/wiki/Article_feedback/Version_5/Technical_Design_…
[3] https://wikimediafoundation.org/wiki/Feedback_data#Article_Feedback
[4] http://dx.doi.org/10.1145/2441776.2441872
[5] http://nitens.org/docs/cscw13.pdf