On 29 May 2010 01:48, Brian J Mingus <Brian.Mingus@colorado.edu> wrote:
On Fri, May 28, 2010 at 6:51 PM, Brian <Brian.Mingus@colorado.edu> wrote:
On Fri, May 28, 2010 at 5:31 PM, Liam Wyatt <liamwyatt@gmail.com> wrote:
Does anyone know of any research which demonstrates a correlation between the quality of an article in Wikipedia and the number of pageviews it receives?
I'm trying to argue, as part of my work with museums, that if they want to get people to come to their own website from Wikipedia, then instead of focusing on adding top-level links to the "external links" section, they should focus on adding deep-linked footnotes and encouraging the improvement of the quality of the article in general. This is on the basis that "increased quality = increased pageviews = increased clickthroughs". A win-win situation.
I have some very nice statistics of pageviews (from stats.grok.se) that match the clickthrough stats from a museum (from their analytics) that clearly demonstrate a correlation between increased pageviews and increased clickthroughs. However, I'm still looking for some scientific data to prove the link between improved article quality and an increase in the number of views of that article.
Can anyone help?
-Liam

An easy way to pull this off looks like this:
- Create a list of all the featured articles
- Use http://stats.grok.se to count their pageviews
- Generate a random list of articles
- Check their pageviews
- Compute the correlation between pageviews and quality

You can also do a slightly more detailed analysis by using the Wikipedia Editorial Team's ratings, but caveat emptor, their rating system is not satisfactory, as I've shown in a couple of papers. Nonetheless, if you group together Featured/A/B articles and Start/Stub articles into two groups, the result should be solid.

Liam,

I've written you a quick bot that computes the correlation coefficient between quality and pageviews for random samples of featured and random articles in April. It requires Python and SciPy. You can find it here:

I suggest that you cross validate the result. In other words, run the script several times and then average the results. It's rather slow, owing to the slowness of Wikimedia properties. I personally didn't run it for more than 10 featured and 10 random articles, but I set those numbers to 2000 for you. It will take hours to run.

Cheers,
Brian
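For anyone following along without the bot, here is a minimal sketch of the analysis described above. It is not Brian's script. It assumes you have already collected pageview counts for the two groups (for example by hand from stats.grok.se), and the function names and the sample numbers are made up for illustration. Quality is coded as 1 for featured and 0 for random articles, so the Pearson coefficient computed here is the point-biserial correlation; `scipy.stats.pearsonr` should give the same value if you have SciPy installed.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def quality_pageview_correlation(featured_views, random_views):
    """Code quality as 1 (featured) or 0 (random) and correlate it with pageviews."""
    quality = [1] * len(featured_views) + [0] * len(random_views)
    views = list(featured_views) + list(random_views)
    return pearson_r(quality, views)


def averaged_correlation(featured_views, random_views, sample_size, runs, seed=0):
    """Repeat the estimate on fresh random subsamples and average the results."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        featured_sample = rng.sample(featured_views, sample_size)
        random_sample = rng.sample(random_views, sample_size)
        estimates.append(quality_pageview_correlation(featured_sample, random_sample))
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical monthly pageview counts, NOT real data.
    featured = [5200, 8100, 3900, 12000, 6400, 7300]
    randoms = [300, 1500, 90, 2200, 640, 4100]
    print(quality_pageview_correlation(featured, randoms))
    print(averaged_correlation(featured, randoms, sample_size=4, runs=20))
```

The averaging function mirrors the suggestion to run the script several times and average the results: each run draws a new subsample from both groups. With real data, the subsamples would come from fresh random lists of articles rather than from a fixed pool.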