On Fri, May 28, 2010 at 6:51 PM, Brian <Brian.Mingus(a)colorado.edu> wrote:
On Fri, May 28, 2010 at 5:31 PM, Liam Wyatt <liamwyatt(a)gmail.com> wrote:
Does anyone know of any research which demonstrates a correlation between
the quality of an article in Wikipedia and the number of pageviews it
receives?
I'm trying to argue, as part of my work with museums, that if they want
to get people to come to their own website from Wikipedia, then instead of
focusing on adding top-level links to the "external links" section, they
should instead focus on adding deep-linked footnotes and encouraging the
improvement of the quality of the article in general. This is on the basis
that "increased quality=increased pageviews=increased clickthroughs". A
win-win situation.
I have some very nice statistics of pageviews (from stats.grok.se) that
match the clickthrough stats from a museum (from their analytics) that
clearly demonstrate a correlation between increased pageviews and increased
clickthroughs. However, I'm still looking for some scientific data to prove
the link between improved article quality and an increase in the number of
views of that article.
Can anyone help?
-Liam
An easy way to pull this off looks like this:
- Create a list of all the featured articles
- Use http://stats.grok.se to count their pageviews
- Generate a random list of articles
- Check their pageviews
- Compute the correlation between pageviews and quality
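The last step of the list above could be sketched roughly as follows. This is only an illustration under some assumptions: featured articles are coded as quality 1 and random articles as quality 0, the pageview counts have already been collected by hand or by script, and the function names (pearson, quality_pageview_correlation) are made up for this example. It uses a plain Pearson correlation from the standard library only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def quality_pageview_correlation(featured_views, random_views):
    """Correlate a 0/1 quality label with pageview counts.

    Assumption: featured articles get label 1, random articles get label 0.
    """
    quality = [1] * len(featured_views) + [0] * len(random_views)
    views = list(featured_views) + list(random_views)
    return pearson(quality, views)


if __name__ == "__main__":
    # Hypothetical placeholder counts, NOT real pageview data.
    featured = [900, 1200, 700]
    sampled = [100, 300, 50]
    print(quality_pageview_correlation(featured, sampled))
```

Note that a random sample may itself contain a few featured articles; with a large sample this should matter little, but it is worth checking.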
You can also do a slightly more detailed analysis by using the Wikipedia
Editorial Team's ratings, but caveat emptor: their rating system is not
satisfactory, as I've shown in a couple of papers. Nonetheless, if you
group together Featured/A/B articles and Start/Stub articles into two
groups, the result should be solid.
- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in
Wikipedia through quality and concept discovery*. Technical Report.
PDF<http://grey.colorado.edu/mediawiki/sites/mingus/images/6/68/DeHoustM…
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the
feasibility of automatically rating online article quality*. Technical
Report.
PDF<http://grey.colorado.edu/mediawiki/sites/mingus/images/d/d3/Rassbach…
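The two-group split suggested above (Featured/A/B versus Start/Stub) might look something like this. It is a sketch, not a definitive recipe: the class labels "FA", "A", "B", "Start" and "Stub" and the shape of the input (a dict from article title to label) are assumptions made for illustration, and any other class is simply left out because the suggestion above does not place it in either group.

```python
# Assumed label strings for the two groups; adjust to match your data source.
HIGH_QUALITY = {"FA", "A", "B"}
LOW_QUALITY = {"Start", "Stub"}


def split_by_quality(ratings):
    """Split article titles into (high, low) quality groups.

    ratings: dict mapping article title -> assessment class label.
    Articles with any other label are dropped from both groups.
    """
    high = sorted(t for t, cls in ratings.items() if cls in HIGH_QUALITY)
    low = sorted(t for t, cls in ratings.items() if cls in LOW_QUALITY)
    return high, low


if __name__ == "__main__":
    # Hypothetical titles and ratings, for illustration only.
    example = {"Article one": "FA", "Article two": "Stub", "Article three": "B"}
    print(split_by_quality(example))
```

The pageviews of the two groups can then be compared in the same way as the featured and random samples.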
Liam,
I've written you a quick bot that computes the correlation coefficient
between quality and pageviews for random samples of featured and random
articles in April. It requires Python and SciPy. You can find it here:
I suggest that you cross-validate the result. In other words, run the
script several times and then average the results. It's rather slow, owing
to the slowness of Wikimedia properties. I personally didn't run it for more
than 10 featured and 10 random articles, but I set those numbers to 2000 for
you. It will take hours to run.
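The "run it several times and average" idea can be sketched like this. This is not the bot itself, just an illustration: averaged_correlation and its parameters are hypothetical names, and the estimate argument stands for whatever function you use to turn one featured sample and one random sample into a correlation coefficient.

```python
import random
import statistics


def averaged_correlation(estimate, featured_pool, random_pool,
                         sample_size, runs, seed=None):
    """Repeat a sampled estimate several times and average the results.

    estimate: callable taking (featured_sample, random_sample) and
              returning a single number, e.g. a correlation coefficient.
    Returns (mean, standard deviation) over the runs.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    results = []
    for _ in range(runs):
        featured_sample = rng.sample(featured_pool, sample_size)
        random_sample = rng.sample(random_pool, sample_size)
        results.append(estimate(featured_sample, random_sample))
    spread = statistics.stdev(results) if runs > 1 else 0.0
    return statistics.mean(results), spread
```

The spread across runs gives a rough sense of how much a single run can be trusted; if it is large, a bigger sample size or more runs is probably needed.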
Cheers,
Brian