Looks like the link to Morten's paper was broken in the last email. Here's
the full cite:
Morten Warncke-Wang, Dan Cosley, and John Riedl. 2013. Tell me more: an
actionable quality model for Wikipedia. In Proceedings of the 9th
International Symposium on Open Collaboration (WikiSym '13). ACM, New York,
NY, USA, Article 8, 10 pages. DOI=
http://dx.doi.org/10.1145/2491055.2491063, PDF=
http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf
On Wed, Oct 12, 2016 at 3:13 AM, Aaron Halfaker <aaron.halfaker(a)gmail.com>
wrote:
Hey folks,
I just finished working with Amir[1,2] and building off of some of
Morten's work[3] to put together something that I think you're going to
like.
Halfaker, Aaron (2016): Monthly Wikipedia article quality predictions.
figshare.
https://dx.doi.org/10.6084/m9.figshare.3859800
Retrieved: 00:56, Oct 12, 2016 (GMT)
This dataset contains a row for every article-month since 2001-01-01. Each
row has an article quality prediction from a text-only machine classifier
(based on [3], with slight improvements) hosted by ORES[4]. We've managed to
build models for English, French, and Russian Wikipedia, so I've generated
datasets for each of those wikis. It's current as of 2016-08-01 and I plan
to run updates periodically.
Here are the columns:
- page_id -- The page identifier
- page_title -- The title of the article (UTF-8_with_underscores)
- rev_id -- The most recent revision ID at the time of assessment
- timestamp -- The timestamp when the assessment was taken
(YYYYMMDDHHMMSS)
- prediction -- The predicted quality class ("Stub", "Start", "C",
"B", "GA", "FA", ...)
- weighted_sum -- The sum of prediction weights assuming indexed class
ordering ("Stub" = 0, "Start" = 1, ...)
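To make weighted_sum and timestamp a bit more concrete, here is a minimal
Python sketch. It assumes the English Wikipedia class ordering continues
"C" = 2, "B" = 3, "GA" = 4, "FA" = 5 (the other wikis use different
classes), and that the classifier gives one probability per class. The names
CLASS_ORDER, weighted_sum() and parse_timestamp() and the example
probabilities are made up for illustration; they are not part of the dataset
or of ORES.

```python
from datetime import datetime

# Assumed class ordering for English Wikipedia ("Stub" = 0, "Start" = 1, ...).
# Other wikis use different quality classes, so adjust this list as needed.
CLASS_ORDER = ["Stub", "Start", "C", "B", "GA", "FA"]


def weighted_sum(probabilities):
    """Sum each class probability multiplied by the index of its class."""
    return sum(CLASS_ORDER.index(label) * p for label, p in probabilities.items())


def parse_timestamp(ts):
    """Parse a YYYYMMDDHHMMSS string from the timestamp column."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


# Hypothetical per-class probabilities for one article-month (not real data).
example = {"Stub": 0.5, "Start": 0.25, "C": 0.25, "B": 0.0, "GA": 0.0, "FA": 0.0}

print(weighted_sum(example))              # 0*0.5 + 1*0.25 + 2*0.25 = 0.75
print(parse_timestamp("20160801000000"))  # 2016-08-01 00:00:00
```

Read this way, weighted_sum is a continuous score between 0 (certainly a
Stub) and 5 (certainly an FA), which is handier for plotting trends over
time than the discrete prediction column.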
I'll update the docs based on your questions :)
1. https://phabricator.wikimedia.org/p/Ladsgroup/
2. https://github.com/Ladsgroup
3. http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf
4. https://ores.wikimedia.org/
-Aaron