I am sure this has already been discussed, but just in case, here goes
my two cents:
The post at http://breasy.com/blog/2007/07/01/implicit-kicks-explicits-ass/
explains why implicit metadata (like Google's PageRank) is better
than explicit metadata (like Digg votes).
Making a comparison to Wikimedia, I'd say that Prof. Luca's trust
algorithm is a more reliable way to determine the quality of an
article's text than the Flagged Revisions extension.
However, the point of the latter is to provide a stable version to users
who choose it, while the former shows to what degree the information can
be trusted while still displaying the untrusted text.
What I'd like to suggest is the implementation of a filter based on
the trust calculations of Prof. Luca's algorithm, which would use the
editors' calculated reliability to automatically choose which revision
of an article to display. It could be implemented in three ways:
1. Show the last revision of an article made by an editor with a trust
score higher than the value the reader provided. The trusted editor
implicitly sets a minimum quality flag on the article by saving a
revision without changing other parts of the text. This is the simplest
approach (a rough sketch follows below), but it doesn't prevent
untrusted text from showing up if the trusted editor leaves untrusted
parts of the text unchanged.
2. Filter the full history. Basically, the idea is to show the parts
of the article written by users with a trust score higher than the
value the reader provided. This would work like Slashdot's comment
filtering system, for example. Evidently, this is the most complicated
approach, since it would require an automated conflict resolution
system, which might not be possible.
3. A mixed option could be to hide revisions by editors with a trust
value lower than the threshold set, going as far back in the article
history as possible until a content conflict is found.
Instead of trust values, this could also work by setting the threshold
above unregistered users or newbies (I think this is approximately
equivalent to accounts younger than 4 days).
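To make option 1 concrete, here is a minimal sketch in Python. Everything
in it (the revision record format, the field names, the fallback) is an
illustrative assumption on my part, not an existing MediaWiki or WikiTrust
interface:

def pick_trusted_revision(revisions, reader_threshold):
    """Return the id of the newest revision whose author meets the reader's
    trust threshold, or fall back to the current revision if none does.

    revisions: list of dicts like {"rev_id": 123, "author_trust": 0.87},
               ordered from newest to oldest.
    reader_threshold: minimum author trust the reader will accept.
    """
    for rev in revisions:
        if rev["author_trust"] >= reader_threshold:
            return rev["rev_id"]
    # No author is trusted enough; show the current revision rather than nothing.
    return revisions[0]["rev_id"] if revisions else None

# Example: a reader asking for trust >= 0.8 is shown revision 101, skipping
# the newer revision 104 whose author has a lower score. Swapping the trust
# score for account age (or a registered/unregistered flag) gives the newbie
# variant mentioned above.
history = [
    {"rev_id": 104, "author_trust": 0.35},
    {"rev_id": 101, "author_trust": 0.92},
    {"rev_id": 97,  "author_trust": 0.60},
]
print(pick_trusted_revision(history, 0.8))  # -> 101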
Anyway, these are just rough ideas, on which I'd like to hear your thoughts.
Several collaborators and I are preparing to expand on previous work to
automatically ascertain the quality of articles on the English
Wikipedia (presented at Wikimania '07 [0]). PageRank is Google's hallmark
quality metric, and the foundation actually has access to these numbers
through the Google Webmaster Tools website. If a foundation representative
were to create a Google account and verify that they were a "webmaster,"
they could download the PageRank for every article on the English Wikipedia
in a convenient tabular format. This data would likely serve as a fantastic
predictor. I would also like to compare the Google-computed PageRank to the
PageRank computed via Wikipedia's internal link structure. I don't see any
privacy implications in releasing this data. It also doesn't seem to help
spammers much, as they already know the pages that have a very high
PageRank, and we include rel="nofollow" on outbound links. Nonetheless, I
would of course be willing to keep the data private.
This would only take a few minutes if it were approved. Is there anyone
out there who has the power to make it happen?
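For the comparison with PageRank computed from Wikipedia's internal link
structure, a minimal sketch of the standard power-iteration computation in
Python could look like the following; the input format (a dict mapping each
article title to the titles it links to, with every target also present as
a key, e.g. built from the pagelinks table) is an assumption for
illustration:

def pagerank(links, damping=0.85, iterations=50):
    """links: {article: [linked_article, ...]}; returns {article: score}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the random-jump share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Tiny example graph; the real computation would feed in the full link dump,
# but the structure stays the same.
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))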
Cheers :)
Brian
[0]
http://upload.wikimedia.org/wikipedia/wikimania2007/d/d3/RassbachPincockMin…
Dear All,
we have finally finished a technical report describing in detail the
algorithms we use for the Wikipedia trust coloring (see
http://trust.cse.ucsc.edu/). The technical paper not only gives the
algorithms, but also describes several ways to quantify notions of text
trust for the Wikipedia, and provides detailed quantitative results on
the quality of our coloring.
The technical report is available here:
http://www.soe.ucsc.edu/~luca/papers/07/trust-techrep.html
We would of course appreciate comments and feedback. I wish a happy weekend
to you all,
Luca