Hi Shiyue,

The issues around assessments that have been brought up are valid and useful to keep in mind when building machine learners that predict quality. That being said, the ORES quality classifier [1] is (AFAIK) trained on a dataset [2] that I gathered using the same method I used to build the training set for the classifier in our CSCW 2015 paper [3]. The revisions in that dataset were gathered by taking a snapshot of the quality assessment classes, walking backwards through each talk page's revision history to find the time when the assessment changed, and then grabbing the revision of the article at that timestamp. If you want Python code instead of the dataset, let me know.
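
The backwards walk described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to build the dataset: the function names, the `class=` regular expression, and the input layout (lists of `(timestamp, value)` tuples sorted oldest to newest, with ISO 8601 UTC timestamps) are all hypothetical. Real talk pages can carry several WikiProject banners with different classes; this sketch simply takes the first one it finds.

```python
import re
from bisect import bisect_right

# Hypothetical pattern: matches "|class=B" or "| class = FA" in a banner template.
CLASS_RE = re.compile(r"\|\s*class\s*=\s*([A-Za-z]+)", re.IGNORECASE)


def extract_class(talk_wikitext):
    """Return the first assessment class in the wikitext, lower-cased, or None."""
    match = CLASS_RE.search(talk_wikitext)
    return match.group(1).lower() if match else None


def find_assessment_time(talk_revisions):
    """Walk backwards from the newest talk page revision.

    talk_revisions: list of (timestamp, wikitext), sorted oldest to newest.
    Returns (current_class, timestamp of the oldest revision in the unbroken
    run of revisions that carry that class), or None if no class is found.
    """
    if not talk_revisions:
        return None
    current = extract_class(talk_revisions[-1][1])
    if current is None:
        return None
    changed_at = talk_revisions[-1][0]
    for timestamp, wikitext in reversed(talk_revisions):
        if extract_class(wikitext) != current:
            break  # reached a revision with a different (or no) rating
        changed_at = timestamp
    return current, changed_at


def article_revision_at(article_revisions, timestamp):
    """Return the id of the latest article revision saved at or before timestamp.

    article_revisions: list of (timestamp, rev_id), sorted oldest to newest.
    ISO 8601 UTC strings in one format compare correctly as plain strings.
    """
    timestamps = [ts for ts, _ in article_revisions]
    idx = bisect_right(timestamps, timestamp)
    return article_revisions[idx - 1][1] if idx else None


# Made-up example data, for illustration only.
talk = [
    ("2014-01-05T10:00:00Z", "{{WikiProject Birds|class=Stub}}"),
    ("2014-06-01T09:30:00Z", "{{WikiProject Birds|class=Start}}"),
    ("2015-02-10T18:45:00Z", "{{WikiProject Birds|class=B}}"),
    ("2015-03-01T08:00:00Z", "{{WikiProject Birds|class=B|importance=low}}"),
]
article = [
    ("2014-01-01T00:00:00Z", 100),
    ("2015-02-09T12:00:00Z", 101),
    ("2015-02-20T12:00:00Z", 102),
]

rating, when = find_assessment_time(talk)
rated_revision = article_revision_at(article, when)
```

Pairing the rating with the article revision that was current at the time of the talk page edit, rather than with the latest revision, is what keeps a stale rating from being attached to text the assessor never saw.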

The team behind ORES has also been working on scripts that do assessment extraction (see for instance [4]), in case you want to process a dump and get all of the assessments. So far, our experience is that training on data extracted that way leads to slightly lower performance. Although we're uncertain as to why, my guess is that the dataset is noisier, perhaps due to changing quality criteria, as Andrew points out.

Please do get in touch if you have any questions!

References:
1: https://meta.wikimedia.org/wiki/ORES/wp10
2: https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Dataset/1375406
3: http://www-users.cs.umn.edu/~morten/publications/cscw2015-improvementprojects.pdf, see Appendix A for info on the classifier
4: https://github.com/wiki-ai/wikiclass/blob/master/wikiclass/extractors/enwiki.py

Cheers,
Morten


On 10 June 2016 at 00:59, Andrew Gray <andrew.gray@dunelm.org.uk> wrote:

Hi Shiyue,

I agree with Kerry - these ratings probably won't do what you need, in that case. Sorry!

We simply don't have the people (or the enthusiasm) required to do regular updates and I'd guess many are well over five years 'stale' since last rating - and most will only ever have been rated once.

There's a second complicating factor for old ratings - not only are they stale, but the general standards for that rating might have changed. (See e.g. http://www.generalist.org.uk/blog/2010/quality-versus-age-of-wikipedias-featured-articles/ for a demonstration of that last point - it would be interesting to use ORES to do a bigger sample.)

Andrew.

On 10 Jun 2016 07:13, "Shiyue Zhang" <byryuer@gmail.com> wrote:
Hi Kerry,

Thanks a lot for your reply! Honestly, I was not aware of the problem you mentioned, that many wikiprojects don't do regular quality assessment. This really matters to me, because I want to get a reasonably accurate quality rating for a given revision of an article. I know of Aaron's automated quality assessment tool, but it is itself based on a machine learning classifier, and automatically predicting quality (especially quality change) is exactly my goal. So I can't take the results of this tool as my ground truth.

2016-06-10 12:16 GMT+08:00 Kerry Raymond <kerry.raymond@gmail.com>:

If you are not aware of it, many wikiprojects don’t do any kind of regular quality assessment. Often an article is project-tagged and assessed when it’s new (which generally means the quality is assessed stub/start/C) and then it’s never re-assessed unless someone working on it is trying to get it to GA or similar and hence actively requests assessment.

So it’s easy for an article to be much better quality (or even much worse quality, although that’s probably less likely) than its current assessment.

I think you might do better to use Aaron’s automated quality assessment tool and apply it to different versions of a set of articles and see how that changes over time. Whatever the deficiencies of an automated tool, I suspect it’s still more reliable than the human processes that we actually have. But I guess it depends on whether the focus of your study is the quality of articles or the process of assessing the quality of articles. My sense is that you are interested in the former rather than the latter.

Kerry

From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Shiyue Zhang
Sent: Friday, 10 June 2016 12:42 PM
To: Research into Wikimedia content and communities <wiki-research-l@lists.wikimedia.org>
Subject: Re: [Wiki-research-l] How to get the exact date when an article get a quality promotion?

Hi Pine,

Thanks for your reply. Yes, it is English Wikipedia. What I want is exactly the timestamp of an article's quality rating change. I understand that particular diffs shouldn't be treated as the reason why the quality rating changed. I'm trying to predict quality change over a certain time period, so I need the quality at the start and at the end of that period.

I hope anyone who has experience with this problem can give me some advice. Thanks a lot!!!

2016-06-10 9:47 GMT+08:00 Pine W <wiki.pine@gmail.com>:

Hi Zhang,

Is this for English Wikipedia?

You can probably use automation to find the timestamp of an article's quality rating change on English Wikipedia. Other people on this list probably know how to do this, and they may comment here.

However, that does not imply that any particular diffs should be considered to have a quality that is equivalent to the quality of the article. Measuring the quality of diffs is an inexact science, but you might want to take a look at Revision Scoring. Aaron Halfaker can tell you more about how useful, or not, Revision Scoring is for measuring the quality of diffs. Hopefully he will respond to this email.

Pine

On Jun 9, 2016 18:29, "Shiyue Zhang" <byryuer@gmail.com> wrote:

Hi,

I'm doing research on Wikipedia article quality, and I take advantage of WikiProject Assessments. But I can only get the latest quality level of an article. I wonder how to get the quality of each revision, or how to get the exact date when an article gets a quality promotion, for example, from A-class to FA-class.

I really need your help! Thanks!

Zhang Shiyue

--
Zhang Shiyue
Tel: +86 18801167900
E-mail: byryuer@gmail.com, yuer3677@163.com
State Key Laboratory of Networking and Switching Technology
No.10 Xitucheng Road, Haidian District
Beijing University of Posts and Telecommunications
Beijing, China.

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

