Erik should be able to help you. I read your paper and your conclusions, and you might consider rewriting them. In particular, correctness is not and cannot be evaluated by your method; it therefore cannot point readers to articles that are most likely correct, only to articles that are well written. Your measure of the accuracy of your method is also somewhat dubious, since the tags are not uniform (take two featured articles of different ages and they will be of very different quality), so recovering them 100% is not a reasonable goal. However, I believe your method is reasonable for finding articles that are badly written.
Bye,
Philipp
2007/11/8, Brian Brian.Mingus@colorado.edu:
Several collaborators and I are preparing to expand on previous work to automatically ascertain the quality of Wikipedia articles on the English Wikipedia (presented at Wikimania '07 [0]). PageRank is Google's hallmark quality metric, and the foundation actually has access to these numbers through the Google Webmaster Tools website. If a foundation representative were to create a Google account and verify that they were a "webmaster," they could download the PageRank for every article on the English Wikipedia in a convenient tabular format. This data would likely serve as a fantastic predictor. I would also like to compare the Google-computed PageRank to the PageRank computed via Wikipedia's internal link structure. I don't see any privacy implications in releasing this data. It also doesn't seem to help spammers much, as they already know the pages that have a very high PageRank, and we include rel="nofollow" on outbound links. Nonetheless, I would of course be willing to keep the data private.
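For readers unfamiliar with how PageRank could be computed from Wikipedia's internal link structure, here is a minimal power-iteration sketch. The toy link graph, damping factor, and iteration count are illustrative assumptions; Google's actual computation is proprietary and operates at a very different scale.

```python
# Illustrative power-iteration PageRank over a small internal link graph.
# Maps each page to the list of pages it links to.

def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores from a dict of page -> outgoing links."""
    pages = set(links)
    for outs in links.values():
        pages.update(outs)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a baseline (1 - damping) / n "teleport" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                share = damping * ranks[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for p in pages:
                    new[p] += damping * ranks[page] / n
        ranks = new
    return ranks

# Hypothetical mini link graph between three articles.
graph = {
    "Physics": ["Mathematics", "Chemistry"],
    "Mathematics": ["Physics"],
    "Chemistry": ["Physics"],
}
ranks = pagerank(graph)
```

In this toy graph, "Physics" ends up with the highest rank because both other articles link to it, and the scores always sum to 1.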
This would only take a few minutes if it were approved. Is there anyone out there with the power to make it happen?
Cheers :) Brian
[0] http://upload.wikimedia.org/wikipedia/wikimania2007/d/d3/RassbachPincockMing...
Wikiquality-l mailing list Wikiquality-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikiquality-l