Hey folks,
I just finished working with Amir[1,2] and building off of some of Morten's work[3] to put together something that I think you're going to like.
Halfaker, Aaron (2016): Monthly Wikipedia article quality predictions.
figshare. https://dx.doi.org/10.6084/m9.figshare.3859800 Retrieved: 00:56, Oct 12, 2016 (GMT)
This dataset contains a row for every article-month since 2001-01-01. Each row holds an article quality prediction from a text-only machine classifier (based on [3], with slight improvements) that is hosted by ORES[4]. We've managed to build models for English, French, and Russian Wikipedia, so I've generated a dataset for each of those wikis. It's current as of 2016-08-01, and I plan to run updates periodically.
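To make the "row for every article-month" structure concrete, here is a minimal Python sketch of how such rows could be derived from one page's revision history. The email does not say exactly when each month's assessment is taken. The sketch therefore assumes the assessment happens at the first instant of each month and scores the latest revision saved at or before that instant. All function names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import datetime
from typing import Iterator


def month_starts(first: datetime, last: datetime) -> list[datetime]:
    """Return midnight on the 1st of every month from `first`'s month through `last`'s."""
    starts = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        starts.append(datetime(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return starts


def revision_at(rev_times: list[datetime], rev_ids: list[int], when: datetime) -> int | None:
    """Return the latest revision saved at or before `when`, or None if none exists yet.

    `rev_times` must be sorted ascending and aligned index-by-index with `rev_ids`.
    """
    i = bisect_right(rev_times, when)
    return rev_ids[i - 1] if i else None


def article_months(
    rev_times: list[datetime], rev_ids: list[int], last: datetime
) -> Iterator[tuple[datetime, int]]:
    """Yield (assessment_time, rev_id) for each month boundary at which the page existed.

    ASSUMPTION: assessments are taken at the start of each month, beginning 2001-01-01.
    """
    for start in month_starts(datetime(2001, 1, 1), last):
        rev_id = revision_at(rev_times, rev_ids, start)
        if rev_id is not None:
            yield start, rev_id
```

Consider a page whose revisions were saved on 2001-01-15 (rev 10) and 2001-03-02 (rev 20). Under this assumption, `list(article_months(...))` with `last=datetime(2001, 4, 1)` yields three rows: Feb 1 and Mar 1 both score rev 10, and Apr 1 scores rev 20. No row appears for Jan 1 because the page did not exist yet.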
Here are the columns:
- page_id -- The page identifier
- page_title -- The title of the article (UTF-8_with_underscores)
- rev_id -- The most recent revision ID at the time of assessment
- timestamp -- The timestamp when the assessment was taken (YYYYMMDDHHMMSS)
- prediction -- The predicted quality class ("Stub", "Start", "C", "B", "GA", "FA", ...)
- weighted_sum -- The sum of prediction weights assuming indexed class ordering ("Stub" = 0, "Start" = 1, ...)
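Here is one plausible reading of the `weighted_sum` column, sketched in Python. Each class's predicted probability is multiplied by that class's index, and the products are added up. A page the model is certain is a Stub would therefore score 0, and one it is certain is an FA would score 5. This rests on several assumptions:

- The "prediction weights" are the classifier's per-class probabilities.
- The class list is the English Wikipedia one.
- The `prediction` column is the most probable class.

The helper names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime

# Class order for English Wikipedia, indexed as in the column description
# ("Stub" = 0, "Start" = 1, ...). ASSUMPTION: French and Russian Wikipedia
# would need their own label lists.
ENWIKI_CLASSES = ("Stub", "Start", "C", "B", "GA", "FA")


def weighted_sum(probabilities: dict[str, float], classes: tuple[str, ...] = ENWIKI_CLASSES) -> float:
    """Sum of class index times predicted probability (ASSUMPTION: weights = probabilities)."""
    return sum(index * probabilities.get(label, 0.0) for index, label in enumerate(classes))


def predicted_class(probabilities: dict[str, float]) -> str:
    """Most probable class (ASSUMPTION: this is how the `prediction` column is chosen)."""
    return max(probabilities, key=probabilities.__getitem__)


def parse_timestamp(ts: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS timestamp, e.g. "20160801000000"."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")
```

Take these probabilities: Stub 0.1, Start 0.2, C 0.4, B 0.2, GA 0.05, FA 0.05. The predicted class is C, while the weighted sum is about 2.05, just above C. Read this way, `weighted_sum` gives a continuous score that can distinguish articles sharing the same discrete label.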
I'll update the docs based on your questions :)
1. https://phabricator.wikimedia.org/p/Ladsgroup/
2. https://github.com/Ladsgroup
3. http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf
4. https://ores.wikimedia.org/
-Aaron
Looks like the link to Morten's paper was broken in the last email. Here's the full cite:
Morten Warncke-Wang, Dan Cosley, and John Riedl. 2013. Tell me more: an actionable quality model for Wikipedia. In Proceedings of the 9th International Symposium on Open Collaboration (WikiSym '13). ACM, New York, NY, USA, Article 8, 10 pages. DOI=http://dx.doi.org/10.1145/2491055.2491063, PDF=http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf
On Wed, Oct 12, 2016 at 3:13 AM, Aaron Halfaker aaron.halfaker@gmail.com wrote:
wiki-research-l@lists.wikimedia.org