[Wikiquality-l] Implicit vs Explicit metadata

John Erling Blad john.erling.blad at jeb.no
Sun Dec 2 18:40:20 UTC 2007


An edit can be made by a very highly rated editor and still be completely
in error. Very often this happens because the editor is biased on the
matter, or because he or she is too self-confident due to excellence in
related fields of expertise. If a rating system can draw on multiple
reviewers, it will probably rate contributions more correctly.

Contributions _should_ be rated by several persons, and those persons
_should_ be identified as distinct. This can be done by using the IP
address together with a unique cookie identifier.
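
A rough sketch of such a scheme (purely illustrative; the class name,
the rating scale, and the minimum of three reviewers are all
placeholders):

    from collections import defaultdict
    from statistics import mean

    class ContributionRatings:
        # Collects ratings per contribution, counting each
        # (IP address, cookie id) pair as one distinct reviewer.

        def __init__(self):
            # contribution id -> {reviewer key: rating}
            self._ratings = defaultdict(dict)

        def rate(self, contribution_id, ip, cookie_id, rating):
            # A repeat rating from the same IP/cookie pair overwrites
            # the earlier one instead of counting as a new reviewer.
            self._ratings[contribution_id][(ip, cookie_id)] = rating

        def score(self, contribution_id, min_reviewers=3):
            # Average rating, or None until enough distinct reviewers
            # have weighed in.
            ratings = self._ratings[contribution_id]
            if len(ratings) < min_reviewers:
                return None
            return mean(ratings.values())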

I take "top trust authors essentially vote for it by leaving it
there" to mean that authors are assigned trust metrics and don't have
to intervene manually beyond reading the article to accept it. In
such a situation it is necessary to verify that they do in fact read
the article.

It is also possible to adjust the number of necessary known voters by
tracking all users and counting each reader as a positive reader-vote.
If the number of readers is small, the number of required known voters
must go up to compensate. If the number of readers is very high, they
can be counted as unknown voters with a very small correction factor.

If a known voter goes against the contribution, that vote should have
a rather high impact compared to pro-votes, and it should lock the
reader-votes. This follows from the fact that contributors aren't very
likely to air public criticism. I'm not sure how this plays out if the
voting isn't public; perhaps more users would vote against
contributions, but I doubt it.
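
To make the idea concrete, here is a toy scoring function (all the
weights are guesses that would need tuning):

    def contribution_score(known_pro, known_contra, reader_views,
                           contra_weight=5.0, reader_factor=0.01):
        # Known contra-votes count several times as much as
        # pro-votes, and any contra-vote locks the reader-votes out
        # of the tally.
        score = float(known_pro) - contra_weight * known_contra
        if known_contra == 0:
            # Each reader counts as a weak, anonymous pro-vote.
            score += reader_factor * reader_views
        return score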

John E

Luca de Alfaro wrote:
> In the trust algorithm I am implementing with my collaborators (mainly
> Ian Pye and Bo Adler), text can gain top trust only when top trust
> authors essentially vote for it by leaving it there, so there is not
> so much difference in the "voting" part wrt flagged revisions.  The
> main difference is that in our algorithm, the text that is unchanged
> inherits the trust from the text in the previous revision --- no need
> for re-voting.  The algorithm has been described in a techrep
> http://www.soe.ucsc.edu/~luca/papers/07/trust-techrep.html
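>
> As a rough sketch of that inheritance rule (illustrative only; the
> real algorithm tracks text across revisions with a proper diff
> rather than by word identity, and the scaling factor is made up):
>
>     def color_revision(prev_fragments, new_text, author_reputation):
>         # prev_fragments: (word, trust) pairs from the previous
>         # revision.  Unchanged words inherit their old trust;
>         # words the author inserted start at a trust derived from
>         # the author's reputation.
>         prev_trust = dict(prev_fragments)
>         inserted_trust = 0.1 * author_reputation  # made-up scaling
>         return [(word, prev_trust.get(word, inserted_trust))
>                 for word in new_text.split()]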
>
> This week, we will open access to a demo of the whole Wikipedia, as of
> its February 6, 2007, snapshot, colored for trust.  We are now working
> towards making a "live" real-time version, so that revisions can be
> colored as soon as they are created.
> Currently, people can "vote" for the accuracy of text only by leaving
> it unchanged during an edit.  We plan, for the live implementation, to
> give an "I agree with it" button, which enables people to vote for the
> accuracy of text without the need for editing it.
>
> We have also considered selecting as "stable" those revisions that
> are both recent and of high trust.  We agree with the original
> poster that this may have some advantages wrt flagged revisions:
>
>     * No need to explicitly spend time flagging (there are millions of
>       pages); editor time can be used for editing or contributing. 
>     * Flagging pages is time-consuming, and re-flagging pages after
>       they change is even more so.  In our algorithm, there is no
>       need for re-flagging.  If a high-trust page is modified, and
>       the modifications are approved or left in place by
>       high-reputation authors (assuming all the editors involved
>       are high-reputation), the newer version is automatically
>       selected, without need for explicit re-flagging.
>     * As the trust algorithm is automatic, there won't be the
>       problem of flagged revisions becoming outdated with respect
>       to the most recent revision: if many authors (including some
>       high-reputation ones) agree with a recent revision, the
>       recent revision will automatically become high trust, and
>       thus be selected.
>     * Our algorithm actually requires the consensus, accumulated
>       through revisions, of more than one high-reputation author, to
>       label text as high trust.  A rogue high-reputation editor cannot
>       single-handedly create high-trust text.
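>
> As a toy illustration of that last point (both thresholds are made
> up):
>
>     def can_reach_top_trust(approvers, min_approvers=2, high_rep=0.8):
>         # approvers: (author_id, reputation) pairs for editors who
>         # left the text in place.  Text may be labeled top trust
>         # only once enough *distinct* high-reputation authors have
>         # implicitly approved it, so a single rogue editor cannot
>         # create high-trust text.
>         high_rep_authors = {a for a, rep in approvers if rep >= high_rep}
>         return len(high_rep_authors) >= min_approvers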
>
> When the demo becomes available later this week, you will be able to
> judge whether you would like to select recent revisions that are of
> high trust.   The algorithm we have now is not perfect, and we are
> considering improvements (we will update the demo when we have such
> improvements), but in general we share the opinion of the original
> poster: an automatic method for flagging revisions can be more
> accurate (preventing flagged revisions from becoming out of date), and
> less painful (no need to spend time flagging).
>
> Luca
>
> On Dec 2, 2007 7:32 AM, Waldir Pimenta <waldir at email.com> wrote:
>
>     I agree with you, Aaron. Flagged Revisions by *trusted users* is
>     indeed better than automatic trustworthiness evaluation. Ratings
>     by everyone probably wouldn't be. But I'd say, following your own
>     words ("It has the advantage of leading to a burst of pages with
>     "trusted" versions without adding any real workload whatsoever"),
>     why not make this the default option when no flagged version is
>     available (yet)? Perhaps with a note, shown to people who choose
>     to view stable versions (or to all logged-out readers, if stable
>     versions are to be the default), similar to the one we see when
>     viewing an old revision. It seems to me that this is better than
>     showing the current version when the flagged one doesn't exist or
>     is too far back in the revision history (with the "stable view"
>     enabled, that is). Also, picking a revision with no "highly
>     dubious" parts sounds like a good approach to me :)
>
>     Waldir
>
>
>     On Nov 28, 2007 12:40 AM, Aaron Schulz <jschulz_4587 at msn.com> wrote:
>
>
>         Flagged Revisions and Article Trust are really apples and
>         oranges. I have contacted them and let them know I'd be
>         interested in getting this up into a stable extension; they
>         are not in competition.
>
>         Anyway, my problem with that article about implicit vs.
>         explicit metadata is that a) it assumes any random user can
>         rate, b) you are measuring simple things like
>         interesting/cool/worth reading, and c) you don't care too
>         much if bad content shows sometimes. The problem is that none
>         of these hold true here. Flagged Revisions uses
>         Editors/Reviewers, it mainly checks accuracy, and we don't
>         want high-profile pages, articles on living people, or highly
>         vandalized pages (or, eventually, anything) to show up with
>         vandalism. Going to "George Bush" and seeing a vulva in the
>         infobox is never acceptable (I don't even know if Article
>         Trust rates images), even if the vandalism is darker orange
>         or whatever.
>
>         The Article Trust code looks at the page authors. To a large
>         extent, this is quite good at highlighting the more dubious
>         stuff. On the other hand, things become less orange with new
>         edits (since the text is then less likely to be crap). The
>         downside is that cruft and garbage can get less orange and
>         appear more valid. This can easily happen with large articles
>         and section editing. That makes it very hard to use for
>         quality versions. Flagged Revisions would be better at that.
>
>         Vandalism can take days to clean up. If AT is to be selecting
>         the best revision, it should try to check both the global
>         average trust of each revision and its worst parts. This way
>         it could try to pick a revision with no "highly dubious"
>         parts. Having looked at the Article Trust site, I'd have a
>         very hard time demarcating the maximum untrustworthiness a
>         section could have without being under- or over-inclusive.
>         I'd go with underinclusive. It does seem reasonably doable at
>         least. It has the advantage of being fully automatic, so
>         there would be a huge number of articles with a "most
>         trusted" (for lack of a better name) version. It won't
>         necessarily be stable, though, and could be quite outdated.
>         In fact, even people who would otherwise have Editor (basic
>         review) rights would not have their changes become the
>         trusted version on edit. This would eat away too much at the
>         incentive to edit if the "most trusted" version were the
>         default, since even experienced users could not directly
>         control it.
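>
>         Something along these lines might work (just a sketch; the
>         cutoff is a placeholder that would need testing):
>
>             def pick_revision(revisions, min_part_trust=0.3):
>                 # revisions: newest-first list of (rev_id,
>                 # fragment_trusts).  Return the newest revision
>                 # whose least trusted fragment clears the cutoff;
>                 # if none qualifies, fall back to the revision with
>                 # the best average trust.
>                 for rev_id, trusts in revisions:
>                     if min(trusts) >= min_part_trust:
>                         return rev_id
>                 return max(revisions,
>                            key=lambda r: sum(r[1]) / len(r[1]))[0]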
>
>         So, to sum up: having a link to the "automatically selected
>         most trustworthy" version seems plausible, as long as it is
>         not the default. It has the advantage of leading to a burst
>         of pages with "trusted" versions without adding any real
>         workload whatsoever. The AT team would have to whip up and
>         test some algorithms, though.
>
>         -Aaron Schulz
>
>         ----------------------------------------
>         > Date: Tue, 27 Nov 2007 20:29:51 +0000
>         > From: waldir at email.com
>         > To: wikiquality-l at lists.wikimedia.org
>         > Subject: [Wikiquality-l] Implicit vs Explicit metadata
>         >
>         > I am sure this has already been discussed, but just in
>         > case, here are my two cents:
>         >
>         > The post at
>         > http://breasy.com/blog/2007/07/01/implicit-kicks-explicits-ass/
>         > explains why implicit metadata (like Google's PageRank) is
>         > better than explicit metadata (like Digg votes).
>         > Making a comparison to Wikimedia, I'd say that Prof. Luca's
>         > trust algorithm is a more reliable way to determine the
>         > quality of an article's text than the Flagged Revisions
>         > extension. However, the point of the latter is to provide a
>         > stable version to the user who chooses that, while the
>         > former displays the degree to which the info can be
>         > trusted, but still shows the untrusted text.
>         >
>         > What I'd like to suggest is the implementation of a filter
>         > based on the trust calculations of Prof. Luca's algorithm,
>         > which would use the editors' calculated reliability to
>         > automatically choose which revision of an article to
>         > display. It could be implemented in 3 ways:
>         >
>         > 1. Show the last revision of an article made by an editor
>         > with a trust score higher than the value the reader
>         > provided. The trusted editor implicitly sets a minimum
>         > quality flag on the article by saving a revision without
>         > changing other parts of the text. This is the simplest
>         > approach, but it doesn't prevent untrusted text from
>         > showing up, in case the trusted editor leaves untrusted
>         > parts of the text unchanged (a sketch of this option
>         > follows the list).
>         >
>         > 2. Filter the full history. Basically, the idea is to show
>         > the parts of the article written by users with a trust
>         > score higher than the value the reader provided. This would
>         > work like Slashdot's comment filtering system, for example.
>         > Evidently, this is the most complicated approach, since it
>         > would require an automated conflict resolution system,
>         > which might not be possible.
>         >
>         > 3. A mixed option could be to try to hide revisions by
>         > editors with a trust value lower than the set threshold.
>         > This could be done as far back in the article history as
>         > possible, until a content conflict is found.
>         >
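>         > A sketch of how option 1 might look (a toy illustration;
>         > the names are invented):
>         >
>         >     # history: newest-first list of (revision, editor_trust)
>         >     # pairs; reader_threshold is the value the reader set.
>         >     def last_trusted_revision(history, reader_threshold):
>         >         for revision, editor_trust in history:
>         >             if editor_trust >= reader_threshold:
>         >                 # Saving a revision counts as implicitly
>         >                 # vouching for the text left in place.
>         >                 return revision
>         >         return None  # no sufficiently trusted revision yet
>         >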
>         > Instead of trust values, this could also work by setting
>         > the threshold to exclude unregistered users or newbies (I
>         > think this is approximately equivalent to accounts younger
>         > than 4 days).
>         >
>         > Anyway, these are just rough ideas, on which I'd like to
>         > hear your thoughts.