Hi Shiyue,
The issues with assessments that have been brought up are valid and
useful to keep in mind when building machine learners that make
quality predictions. That being said, the ORES quality classifier[1] is
(AFAIK) trained on a dataset[2] that I gathered using the same method I
used to build the training dataset for the classifier in our CSCW 2015
paper[3]. The revisions in that dataset were gathered by taking a snapshot
of the current quality assessment classes, then walking backwards through
the talk page revision history to find when each assessment changed, and
then grabbing the revision of the article at that timestamp. If you want
Python code instead of the dataset, let me know.
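A minimal sketch of the walk-backwards step, assuming the talk page history has already been fetched and each talk revision reduced to a (timestamp, assessment class) pair, oldest first, and that article revisions are available as (timestamp, rev_id) pairs. The type aliases and function names are hypothetical and not taken from the actual dataset scripts; fetching revisions from the API or a dump is left out.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence, Tuple

# Hypothetical inputs, both sorted oldest-first:
#   talk revision    -> (timestamp, assessment class, or None if unassessed)
#   article revision -> (timestamp, revision id)
TalkRev = Tuple[datetime, Optional[str]]
ArticleRev = Tuple[datetime, int]


def find_assessment_time(talk_revs: Sequence[TalkRev]) -> Optional[datetime]:
    """Walk backwards from the latest talk revision and return the timestamp
    at which the current (snapshot) class was most recently assigned."""
    if not talk_revs:
        return None
    current_class = talk_revs[-1][1]
    if current_class is None:
        return None  # the page is currently unassessed
    change_time = talk_revs[-1][0]
    for timestamp, cls in reversed(talk_revs):
        if cls != current_class:
            break  # reached the revision before the assessment changed
        change_time = timestamp
    return change_time


def article_rev_at(article_revs: Sequence[ArticleRev],
                   when: datetime) -> Optional[int]:
    """Return the id of the latest article revision saved at or before `when`."""
    timestamps = [ts for ts, _ in article_revs]
    idx = bisect_right(timestamps, when)
    return article_revs[idx - 1][1] if idx else None
```

For example, if the talk page history reads Start, B, B, find_assessment_time returns the timestamp of the first B revision, and article_rev_at then picks the last article revision saved at or before that moment.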
The team behind ORES has also been writing scripts that do
assessment extraction (see for instance [4]), in case you want to process
a dump and get all of the assessments. So far, our experience is that this
leads to slightly lower performance. Although we're uncertain why, my
guess is that the resulting dataset is noisier, perhaps due to the changing
quality criteria Andrew points to.
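A simplified sketch of extracting an assessment from a single talk page revision, assuming banners are written as {{WikiProject ...|class=...}}. This is not the extractor referenced in [4]: it ignores nested templates (e.g. banner shells), differently named banner templates, and non-article classes, and taking the highest class when projects disagree is only one possible policy. QUALITY_SCALE is an assumed, simplified version of the English Wikipedia scale.

```python
import re
from typing import List, Optional

# Assumed, simplified assessment scale (lowest to highest). Other values
# such as "list" or "disambig" are ignored by this sketch.
QUALITY_SCALE = ["stub", "start", "c", "b", "ga", "a", "fa"]

# A {{WikiProject ...}} banner without nested templates inside it.
BANNER_RE = re.compile(r"\{\{\s*WikiProject[^{}]*\}\}", re.IGNORECASE)
# The value of a "|class=..." parameter within a banner.
CLASS_RE = re.compile(r"\|\s*class\s*=\s*([^|}\s]+)", re.IGNORECASE)


def extract_classes(talk_wikitext: str) -> List[str]:
    """Return every recognised quality class found in WikiProject banners."""
    found = []
    for banner in BANNER_RE.findall(talk_wikitext):
        for value in CLASS_RE.findall(banner):
            value = value.lower()
            if value in QUALITY_SCALE:
                found.append(value)
    return found


def highest_class(talk_wikitext: str) -> Optional[str]:
    """Pick the highest class any project assigned (one possible policy)."""
    classes = extract_classes(talk_wikitext)
    if not classes:
        return None
    return max(classes, key=QUALITY_SCALE.index)
```

Running highest_class over every talk page revision produces the (timestamp, class) history that the walk-backwards step needs; when several projects disagree, as Andrew and Kerry note they often do with stale ratings, the choice of policy will affect how noisy the labels are.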
Please do get in touch if you have any questions!
References:
1:
Cheers,
Morten
On 10 June 2016 at 00:59, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
Hi Shiyue,
I agree with Kerry: these ratings probably won't do what you need in
that case. Sorry!
We simply don't have the people (or the enthusiasm) required to do
regular updates, and I'd guess many ratings are well over five years
'stale' since they were last assessed; most will only ever have been rated
once.
There's a second complicating factor for old ratings: not only are
they stale, but the general standards for that rating might have changed.
(See e.g.
http://www.generalist.org.uk/blog/2010/quality-versus-age-of-wikipedias-fea…
for a demonstration of that last point; it would be interesting to use
ORES to repeat this on a bigger sample.)
Andrew.
On 10 Jun 2016 07:13, "Shiyue Zhang" <byryuer(a)gmail.com> wrote:
> Hi Kerry,
>
> Thanks a lot for your reply! Honestly, I was not aware of the problem
> you mentioned, that many WikiProjects don't do regular quality assessment.
> This problem really matters to me, because I want to get the relatively
> true quality of each revision of an article. I know about Aaron's automated
> quality assessment tool, but it is also based on a machine learning
> classifier, and automatically predicting quality, especially quality
> change, is exactly my goal. So I can't take the results of this tool as my
> ground truth.
>
> 2016-06-10 12:16 GMT+08:00 Kerry Raymond <kerry.raymond(a)gmail.com>:
>
>> If you are not aware of it, many WikiProjects don’t do any kind of
>> regular quality assessment. Often an article is project-tagged and assessed
>> when it’s new (which generally means the quality is assessed as
>> stub/start/C) and then it’s never re-assessed unless someone working on it
>> is trying to get it to GA or similar and hence actively requests
>> assessment.
>>
>> So it’s easy for an article to be of much better quality (or even much
>> worse quality, although that’s probably less likely) than its current
>> assessment.
>>
>> I think you might do better to use Aaron’s automated quality
>> assessment tool, apply it to different versions of a set of articles, and
>> see how the predicted quality changes over time. Whatever the deficiencies
>> of an automated tool, I suspect it’s still more reliable than the human
>> processes that we actually have. But I guess it depends on whether the
>> focus of your study is the quality of articles or the process of assessing
>> the quality of articles. My sense is that you are interested in the former
>> rather than the latter.
>>
>> Kerry
>>
>> *From:* Wiki-research-l [mailto:wiki-research-l-bounces(a)lists.wikimedia.org]
>> *On Behalf Of* Shiyue Zhang
>> *Sent:* Friday, 10 June 2016 12:42 PM
>> *To:* Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org>
>> *Subject:* Re: [Wiki-research-l] How to get the exact date when an
>> article get a quality promotion?
>>
>> Hi Pine,
>>
>> Thanks for your reply. Yes, it is English Wikipedia. Specifically, I want
>> to get the timestamp of each change in an article's quality rating. I know
>> that particular diffs shouldn't be considered the reason why a quality
>> rating changes. I'm trying to predict quality change over a certain time
>> period, so I need the quality at the start and at the end of that period.
>>
>> I hope anyone who has experience with this problem can give me some
>> advice. Thanks a lot!!!
>>
>> 2016-06-10 9:47 GMT+08:00 Pine W <wiki.pine(a)gmail.com>:
>>
>> Hi Zhang,
>>
>> Is this for English Wikipedia?
>>
>> You can probably use automation to find the timestamp of an article's
>> quality rating change on English Wikipedia. Other people on this list
>> probably know how to do this, and they may comment here.
>>
>> However, that does not imply that any particular diffs should be
>> considered to have a quality that is equivalent to the quality of the
>> article. Measuring the quality of diffs is an inexact science, but you
>> might want to take a look at Revision Scoring. Aaron Halfaker can tell you
>> more about how useful, or not, Revision Scoring is for measuring the
>> quality of diffs. Hopefully he will respond to this email.
>>
>> Pine
>>
>> On Jun 9, 2016 18:29, "Shiyue Zhang" <byryuer(a)gmail.com> wrote:
>>
>> Hi,
>>
>> I'm doing research on Wikipedia article quality, and I take advantage
>> of WikiProject Assessments. But I can only get the latest quality level of
>> an article. I wonder how to get the quality of each revision, or how to
>> get the exact date when an article gets a quality promotion, for example,
>> from A-class to FA-class.
>>
>> I really need your help! Thanks!
>>
>> Zhang Shiyue
>>
>> --
>>
>> Zhang Shiyue
>>
>> *Tel*: +86 18801167900
>>
>> *E-mail*: byryuer(a)gmail.com, yuer3677(a)163.com
>>
>> State Key Laboratory of Networking and Switching Technology
>>
>> No.10 Xitucheng Road, Haidian District
>>
>> Beijing University of Posts and Telecommunications
>>
>> Beijing, China.
>>
>>
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l