As another note, quality is typically associated with high value. The problem is that value is a vague and subjective concept. Determining the value of a particular object/page to a particular individual is very difficult, if not impossible. However, there are numerous proxy features for value. One such feature is popularity, and that's what the PageRank algorithm gets at. Another is the number of readers who navigate to a page. This is similar to popularity, except that it also encompasses the "many eyes" principle: the more people see an article, the better (i.e., higher quality) it becomes.
Of course this may be disputed, but if value is what you're really trying to get at, one potential direction is to use page views (as Priedhorsky et al. do in "Creating, Destroying, and Restoring Value in Wikipedia": http://tinyurl.com/269lpq).
ivan.
On Nov 9, 2007, at 2:08 AM, John Erling Blad wrote:
Interesting reading!
I believe that a correct evaluation of article quality must be combined with the writer's reputation and most likely also with how the writer interacts with other users. The article itself also doesn't exist in a vacuum, as you suggest in the final notes about the PageRank algorithm. Incoming links are very useful for evaluating article quality, but it takes time for them to emerge. It is therefore highly likely that different approaches will be needed to assess article quality, depending not only on the article's category but also on its age.
A lot of those measures will interact. For example, person A writes an article but has previously written articles that rated poorly because of factual errors. He does, however, write good English (most likely that's not me... ;) Person B, on the other hand, writes rotten English (oh, that's me!) but writes factually correct articles. Because of his very bad English, other contributors revert or rewrite his edits. Both of them will rank very badly, and their articles even worse. Still, when they team up they can produce excellent articles.
When I first started looking into estimating writer reputation and article quality, I expected to find some fairly obvious features to use. What I found instead was several interconnected systems, all of which (at least the most prominent ones) should be taken into account. Even so, there will be a fairly large number of misclassifications.
John E
Brian wrote:
Several collaborators and I are preparing to expand on previous work to automatically ascertain the quality of Wikipedia articles on the English Wikipedia (presented at Wikimania '07 [0]). PageRank is Google's hallmark quality metric, and the foundation actually has access to these numbers through the Google Webmaster Tools website. If a foundation representative were to create a Google account and verify that they were a "webmaster," they could download the PageRank for every article on the English Wikipedia in a convenient tabular format. This data would likely serve as a fantastic predictor. I would also like to compare the Google-computed PageRank to the PageRank computed via Wikipedia's internal link structure. I don't see any privacy implications in releasing this data. It also doesn't seem to help spammers much, as they already know the pages that have a very high PageRank, and we include rel="nofollow" on outbound links. Nonetheless, I would of course be willing to keep the data private.
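Brian proposes comparing Google's PageRank with a PageRank computed from Wikipedia's internal link structure. The following is a minimal sketch of how that comparison might look, assuming the internal links have already been extracted into a dict mapping each article to the articles it links to. It computes PageRank by power iteration (with the conventional damping factor 0.85) and compares the two orderings with Spearman's rank correlation, which is one reasonable choice of comparison, not necessarily what the authors used. The function names, toy graph, and "Google" scores are hypothetical illustrations, not real data or any Wikimedia or Google API.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a dict: page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


def average_ranks(values):
    """1-based ranks; exactly equal values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy link graph standing in for extracted internal wiki links.
links = {
    "Physics": ["Energy", "Albert Einstein"],
    "Energy": ["Physics"],
    "Albert Einstein": ["Physics", "Energy"],
    "Stub article": ["Physics"],
}
internal = pagerank(links)
# Hypothetical 0-10 scores standing in for Google's PageRank; not real data.
external = {"Physics": 7, "Energy": 6, "Albert Einstein": 6, "Stub article": 2}
pages = sorted(internal)
rho = spearman([internal[p] for p in pages], [external[p] for p in pages])
print(f"Spearman rho between internal and external PageRank: {rho:.3f}")
```

A rank correlation is used rather than comparing raw values because Google's published PageRank is on a coarse scale, so only the ordering of articles is meaningfully comparable with the internally computed scores.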
This would only take a few minutes if it were approved. Is there anyone out there who has the power to make it happen?
Cheers :) Brian
[0] http://upload.wikimedia.org/wikipedia/wikimania2007/d/d3/RassbachPincockMingus07.pdf
Wikiquality-l mailing list Wikiquality-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikiquality-l