Interesting reading!
I believe that a correct evaluation of article quality must be combined
with the writer's reputation, and most likely also with how the writer
interacts with other users. The article itself also doesn't exist in a
vacuum, as you suggest in the final notes about the PageRank algorithm.
Incoming links are very useful for evaluating article quality, but it
takes time for them to emerge. It is therefore highly likely that it
will be necessary to use different approaches to assess article quality,
depending not only on the category of the article but also on its age.
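As a rough illustration of why link-based measures lag for young
articles, here is a minimal sketch of PageRank computed by power
iteration over a made-up internal link graph. The page names, the
damping factor of 0.85 and the iteration count are all assumptions for
illustration; this is not Wikipedia's data, nor the way Google actually
computes its numbers.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page name to the list of pages it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "NewArticle" links out but nobody links to it yet.
toy_links = {
    "Established": ["Hub"],
    "Hub": ["Established"],
    "Other": ["Established", "Hub"],
    "NewArticle": ["Established", "Hub"],
}

ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

In this toy graph the new article only ever receives the baseline
random-jump share, regardless of how good its content is, which is why
a purely link-based score says little about an article until other
pages have had time to link to it.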
A lot of those measures will interact. For example, person A writes an
article but has previously written articles that don't rate as very good
due to factual errors. He does, however, write good English (most likely,
that's not me.. ;) Now person B writes rotten English (oh, that's me!) but
writes factually correct articles. Because of his very bad English, other
contributors revert his edits or rewrite them. Both of these persons
will rank very badly, and their articles even worse. Still, when they
team up they can produce excellent articles.
When I first started to look into estimating writers' reputation and
article quality, I expected to find some fairly obvious features to use.
What I did find was that there were several connected systems, and that
all of them (at least the most prominent ones) should be taken into
account. Still, there will be a fairly large number of erroneous
classifications.
John E
Brian wrote:
Several collaborators and I are preparing to expand on previous work
to automatically ascertain the quality of Wikipedia articles on the
English Wikipedia (presented at Wikimania '07 [0]). PageRank is
Google's hallmark quality metric, and the foundation actually has
access to these numbers through the Google Webmaster Tools website. If
a foundation representative were to create a Google account and verify
that they were a "webmaster," they could download the PageRank for
every article on the English Wikipedia in a convenient tabular format.
This data would likely serve as a fantastic predictor. I would also
like to compare the Google-computed PageRank to the PageRank computed
via Wikipedia's internal link structure. I don't see any privacy
implications in releasing this data. It also doesn't seem to help
spammers much, as they already know the pages that have a very high
PageRank, and we include rel="nofollow" on outbound links.
Nonetheless, I would of course be willing to keep the data private.
This would only take a few minutes if it were approved. Is there anyone
out there who has the power to make it happen?
Cheers :)
Brian
[0]
http://upload.wikimedia.org/wikipedia/wikimania2007/d/d3/RassbachPincockMin…
------------------------------------------------------------------------
_______________________________________________
Wikiquality-l mailing list
Wikiquality-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l