Hey folks,

I just finished working with Amir[1,2] and building on some of Morten's work[3] to put together something that I think you're going to like.

Halfaker, Aaron (2016): Monthly Wikipedia article quality predictions. figshare.
https://dx.doi.org/10.6084/m9.figshare.3859800
Retrieved: 00:56, Oct 12, 2016 (GMT)

This dataset contains a row for every article-month since 2001-01-01. Each row has an article quality prediction from a text-only machine classifier (the one from [3], slightly improved) hosted by ORES[4]. We've managed to build models for English, French, and Russian Wikipedia, so I've generated a dataset for each of those wikis. It's current as of 2016-08-01, and I plan to run updates periodically.

Here are the columns:
- page_id -- The page identifier
- page_title -- The title of the article (UTF-8_with_underscores)
- rev_id -- The most recent revision ID at the time of assessment
- timestamp -- The timestamp when the assessment was taken (YYYYMMDDHHMMSS)
- prediction -- The predicted quality class ("Stub", "Start", "C", "B", "GA", "FA", ...)
- weighted_sum -- The sum of each class's predicted probability multiplied by that class's index in the quality ordering ("Stub" = 0, "Start" = 1, ...)
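If it helps to see how `weighted_sum` and `timestamp` fit together, here's a minimal Python sketch. It assumes the English Wikipedia ordering Stub, Start, C, B, GA, FA. The French and Russian wikis use their own classes. It also assumes a hypothetical dictionary of per-class probabilities like the ones an ORES model returns. The helper names and example numbers are made up for illustration. This is not the code that generated the dataset.

```python
from datetime import datetime

# Assumed class ordering (English Wikipedia); each class's index is its weight.
QUALITY_CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]


def weighted_sum(probabilities, classes=QUALITY_CLASSES):
    """Sum of each class's probability times its index in `classes`.

    Classes missing from `probabilities` contribute 0.
    """
    return sum(index * probabilities.get(label, 0.0)
               for index, label in enumerate(classes))


def parse_timestamp(ts):
    """Parse a YYYYMMDDHHMMSS timestamp string into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


# Hypothetical probabilities for a single article-month.
probs = {"Stub": 0.1, "Start": 0.2, "C": 0.4, "B": 0.2, "GA": 0.05, "FA": 0.05}
print(round(weighted_sum(probs), 2))
print(parse_timestamp("20160801000000"))
```

The weighted sum gives you a single continuous score (here somewhere between "C" and "B"), which is handy for tracking gradual quality changes that a single predicted class would hide.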
I'll update the docs based on your questions :)

-Aaron