Erik should be able to help you. I read your paper, and you might
think about rewriting your conclusions. In particular, correctness is
not and cannot be evaluated by your method, which therefore cannot
point readers to the articles that are most likely correct, only to
articles that are well written. Your measure of your method's accuracy
is also a bit dubious: the tags are not uniform (take two featured
articles of different ages and they will be of very different
quality), so recovering them 100% is not a reasonable goal. However, I
believe that your method is a reasonable way to find articles that are
badly written.
2007/11/8, Brian <Brian.Mingus(a)colorado.edu>:
Several collaborators and I are preparing to expand on previous work to
automatically ascertain the quality of articles on the English
Wikipedia (presented at Wikimania '07). PageRank is Google's hallmark
quality metric, and the foundation actually has access to these numbers
through the Google Webmaster Tools website. If a foundation representative
were to create a Google account and verify that they were a "webmaster,"
they could download the PageRank for every article on the English Wikipedia
in a convenient tabular format. This data would likely serve as a fantastic
predictor. I would also like to compare the Google-computed PageRank to the
PageRank computed via Wikipedia's internal link structure. I don't see any
privacy implications in releasing this data. It also doesn't seem to help
spammers much, as they already know the pages that have a very high
PageRank, and we include rel="nofollow" on outbound links. Nonetheless, I
would of course be willing to keep the data private.
This would only take a few minutes if it were approved. Is there anyone out
there who has the power to make it happen?
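
For anyone curious what the internal-link PageRank mentioned above would
involve, here is a minimal sketch using plain power iteration. The function
name `pagerank`, the damping factor of 0.85, the iteration count, and the toy
article graph are all assumptions made for illustration; this is not our
actual pipeline, and a real run would read the links from a database dump.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `links` maps each page title to the list of pages it links to.
    Returns a dict mapping page title -> rank; the ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without outbound links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph standing in for the internal link structure.
toy_links = {
    "Physics": ["Mathematics", "Chemistry"],
    "Chemistry": ["Physics"],
    "Mathematics": ["Physics"],
    "Stub": ["Physics"],
}
ranks = pagerank(toy_links)
```

The resulting per-article scores could then be joined against the
Google-computed values by article title to see how closely the two rankings
agree.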
Wikiquality-l mailing list