On Fri, Jun 4, 2010 at 8:16 AM, Brian <Brian.Mingus@colorado.edu> wrote:


On Thu, Jun 3, 2010 at 4:14 PM, Reid Priedhorsky <reid@umn.edu> wrote:
Brian J Mingus wrote:
> ---------- Forwarded message ----------
> From: Brian <Brian.Mingus@colorado.edu>
> Date: Wed, Jun 2, 2010 at 10:46 PM
> Subject: Re: [Wiki-research-l] Quality and pageviews
> To: Liam Wyatt <liamwyatt@gmail.com>
>
>
> Interestingly, the result is negative. The correlation coefficient between
> 2500 featured articles and 2500 random articles is .18 which is very low. I
> also trained a linear classifier to predict the quality of an article based
> on the number of page views and it was no better than chance.

That reminds me of an incidental finding from our 2007 work: we wanted
to use article edit rate to predict view rate, but there was no
correlation between the two.

Reid

That is an interesting negative finding as well. Just so this thread doesn't go without some positive results, here is a table from one of my technical reports on some features that do correlate with quality. A number greater than zero means the feature correlates positively with quality, zero means it does not correlate, and a number less than zero means it is negatively correlated with quality. The absolute scale of the numbers is not interpretable, but their relative magnitudes are meaningful. They reflect the relative performance of each feature for each class, as extracted from the weights of a random forest classifier.

http://grey.colorado.edu/mediawiki/sites/mingus/images/1/1e/DeHoustMangalathMingus08_feature_table.png

Summary (features in order of predictive ability):

 
Single best predictor overall: Automated Readability Index http://en.wikipedia.org/wiki/Automated_Readability_Index
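
For anyone who wants to try this feature on their own text, here is a minimal sketch of the Automated Readability Index using the standard published formula, 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43. The function name and the tokenization rules (whitespace-separated words, sentences counted by runs of terminal punctuation) are my own assumptions for illustration and are not necessarily how the feature was computed for the table above.

```python
import re


def automated_readability_index(text: str) -> float:
    """Approximate the Automated Readability Index (ARI) of plain text.

    Assumptions (illustrative, not taken from the report):
    - characters are letters and digits only
    - words are whitespace-separated tokens
    - sentences are counted as runs of '.', '!' or '?'
    """
    characters = sum(ch.isalnum() for ch in text)
    words = len(text.split())
    if words == 0:
        # No words means the ratios are undefined; return 0.0 by convention.
        return 0.0
    # Treat text with no terminal punctuation as a single sentence.
    sentences = len(re.findall(r"[.!?]+", text)) or 1
    return 4.71 * characters / words + 0.5 * words / sentences - 21.43


if __name__ == "__main__":
    sample = "The cat sat. The dog ran."
    print(round(automated_readability_index(sample), 2))
```

Wiki markup would need to be stripped before scoring, since templates and link syntax would otherwise inflate the character and word counts.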