An edit can be made by a very highly rated editor and still be completely
in error. Very often this happens because the editor is biased on the
matter, or because he or she is too self-confident due to excellence in
related fields of expertise. A rating system that used multiple
reviewers would probably rate contributions more accurately.
Contributions _should_ be rated by several persons, and those persons
_should_ be verified as different. This can be done by using the IP
address and a unique cookie identifier.
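A tiny sketch in Python of how the two signals could be combined into
one reviewer identity (the hashing choice and names are mine, purely
illustrative):

import hashlib

def reviewer_id(ip_address: str, cookie_id: str) -> str:
    """Derive a stable pseudonymous reviewer identity from IP + cookie.

    Neither signal alone is reliable (shared IPs, cleared cookies), so
    combine both and hash, to avoid storing raw addresses.
    """
    raw = f"{ip_address}|{cookie_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]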
I guess the "top trust authors essentially vote for it by leaving it
there" means that authors are given trust metrics and that they don't
have to do any manual intervention beyond reading the article to accept
it. In such a situation it is necessary to verify that they do in fact
read the article.
It is also possible to adjust the number of necessary known voters by
tracking all users and counting each reader as a positive reader-vote.
If the number of readers is small, the number of required known voters
must go up to compensate. If the number of readers is very high, they
can be counted as unknown voters with a very small correcting factor.
If a known voter goes against the contribution, that vote should have a
rather high impact compared to pro-votes, and it should lock the
reader-votes. This follows from the fact that contributors are not very
likely to air public criticism. I'm not sure how this would play out if
the voting weren't public; perhaps more users would vote against
contributions, but I doubt it.
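For concreteness, a rough Python sketch of such a weighted tally; the
weights, the reader threshold, and the lock rule are all placeholder
assumptions, not settled numbers:

from dataclasses import dataclass

READER_VOTE_WEIGHT = 0.01  # assumed: each passive reader counts very little
CON_VOTE_WEIGHT = 3.0      # assumed: a con-vote outweighs several pro-votes

@dataclass
class Tally:
    pro_votes: int = 0   # known voters in favour
    con_votes: int = 0   # known voters against
    readers: int = 0     # passive readers, counted as weak pro-votes

    def score(self) -> float:
        # A con-vote from a known voter locks (discards) the reader-votes,
        # since contributors are unlikely to air public criticism lightly.
        reader_bonus = 0.0 if self.con_votes else self.readers * READER_VOTE_WEIGHT
        return self.pro_votes + reader_bonus - self.con_votes * CON_VOTE_WEIGHT

def required_known_voters(readers: int, base: int = 3) -> int:
    # Assumed rule: with few readers, demand more known voters; with many,
    # the readers weakly stand in for unknown voters.
    return base + 2 if readers < 100 else base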
John E
Luca de Alfaro wrote:
In the trust algorithm I am implementing with my
collaborators (mainly
Ian Pye and Bo Adler), text can gain top trust only when top trust
authors essentially vote for it by leaving it there, so there is not
so much difference in the "voting" part wrt flagged revisions. The
main difference is that in our algorithm, the text that is unchanged
inherits the trust from the text in the previous revision --- no need
for re-voting. The algorithm has been described in a techrep:
http://www.soe.ucsc.edu/~luca/papers/07/trust-techrep.html
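As a rough illustration of the inheritance idea (the real update rule
is in the techrep; the constants and the word matching here are
simplifications, not what we actually use):

def propagate_trust(prev_text, prev_trust, new_text, author_reputation):
    """Per-word trust across one revision (sketch).

    Words kept from the previous revision inherit their trust, nudged
    toward the editing author's reputation; newly inserted words start no
    more trusted than their author. The real algorithm aligns the two
    texts; a dict lookup is a crude stand-in for that alignment.
    """
    STEP = 0.1  # placeholder: how fast surviving text gains trust
    prev = dict(zip(prev_text, prev_trust))  # word -> trust (simplified)
    trust = []
    for word in new_text:
        if word in prev:
            t = prev[word] + STEP * max(0.0, author_reputation - prev[word])
        else:
            t = author_reputation
        trust.append(t)
    return trust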
This week, we will open access to a demo of the whole Wikipedia, as of
its February 6, 2007, snapshot, colored for trust. We are now working
towards making a "live" real-time version, so that revisions can be
colored as soon as they are created.
Currently, people can "vote" for the accuracy of text only by leaving
it unchanged during an edit. We plan, for the live implementation, to
give an "I agree with it" button, which enables people to vote for the
accuracy of text without the need for editing it.
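As a sketch, such a vote could act on per-word trust roughly like this
(the cap and step size are placeholder assumptions, not our actual
update rule):

def apply_agree_vote(word_trust, voter_reputation):
    """Nudge each word's trust toward the voter's reputation, capped per
    vote so a single voter cannot raise text to top trust alone.
    Placeholder constants; see the techrep for the real rule."""
    CAP = 0.2   # assumed maximum per-vote increase
    STEP = 0.1  # assumed fraction of the gap closed per vote
    return [t + min(CAP, max(0.0, voter_reputation - t) * STEP)
            for t in word_trust]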
We have also considered selecting as "stable" those revisions that are
both recent and of high trust (a rough selection sketch follows the
list below). We agree with the original poster that this may have some
advantages wrt flagged revisions:
* No need to explicitly spend time flagging (there are millions of
pages); editor time can be used for editing or contributing.
* Flagging pages is time consuming, and re-flagging pages after
they change is even more so. In our algorithm, there is no need
for re-flagging. If a high-trust page is subject to
modifications, and these modifications are approved or left in
place by high-reputation authors (and all editors are
high-reputation), the newer version is automatically selected,
without need for explicit re-flagging.
* As the trust algorithm is automatic, there won't be the problem
of flagged revisions becoming outdated with respect to the most
recent revision: if many authors (including some high-reputation
ones) agree with a recent revision, the recent revision will
automatically become high trust, and thus selected.
* Our algorithm actually requires the consensus, accumulated
through revisions, of more than one high-reputation author, to
label text as high trust. A rogue high-reputation editor cannot
single-handedly create high-trust text.
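A minimal sketch of such a selection rule (the trust threshold and age
window are illustrative only, and the data shape is assumed):

from datetime import datetime, timedelta

def select_stable(revisions, min_trust=0.8, max_age_days=90):
    """Pick the most recent revision that is also high trust.

    `revisions` is assumed to be a list of (timestamp, avg_trust, rev_id)
    tuples, newest first; both thresholds are placeholder values.
    """
    cutoff = datetime.utcnow() - timedelta(days=max_age_days)
    for ts, avg_trust, rev_id in revisions:
        if ts < cutoff:
            break  # older revisions no longer count as "recent"
        if avg_trust >= min_trust:
            return rev_id
    # nothing recent clears the bar: fall back to the newest revision
    return revisions[0][2] if revisions else None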
When the demo becomes available later this week, you will be able to
judge whether you would like to select recent revisions that are of
high trust. The algorithm we have now is not perfect, and we are
considering improvements (we will update the demo when we have such
improvements), but in general we share the opinion of the original
poster: an automatic method for flagging revisions can be more
accurate (preventing flagged revisions from becoming out of date), and
less painful (no need to spend time flagging).
Luca
On Dec 2, 2007 7:32 AM, Waldir Pimenta <waldir(a)email.com> wrote:
I agree with you, Aaron. Flagged Revisions by *trusted users* is
indeed better than automatic trustworthiness evaluation; ratings by
everyone probably wouldn't be. But I'd say, following your own words
("It has the advantage of leading to a burst of pages with "trusted"
versions without adding any real workload whatsoever"), why not make
this the default option when there is no flagged version available
(yet)? Perhaps with a note, shown to people who choose to view stable
versions (or to all logged-out readers, if the stable versions are to
be the default), similar to the one we see when consulting an old
revision. It seems to me that this is better than showing the current
version if the flagged one doesn't exist or is too far back in the
revision history (with the "stable view" enabled, that is). Also,
picking a revision with no "highly dubious" parts sounds like a good
approach to me :)
Waldir
On Nov 28, 2007 12:40 AM, Aaron Schulz <jschulz_4587(a)msn.com> wrote:
Flagged Revisions and Article Trust are really apples and
oranges. I have contacted them and let them know I'd be
interested in getting this into a stable extension; they
are not in competition.
Anyway, my problem with that article about implicit vs.
explicit metadata is that a) it assumes any random user can
rate, b) it measures simple things like
interesting/cool/worth reading, and c) it doesn't care too much
if bad content shows sometimes. The problem is that none of
these hold true here. Flagged Revisions uses
Editors/Reviewers, it mainly checks accuracy, and we don't
want high-profile pages, articles on living people, or highly
vandalized pages (and eventually anything) to show up
with vandalism. Going to "George Bush" and seeing a vulva in
the infobox is never acceptable (I don't even know if
Article Trust rates images), even if the vandalism is darker
orange or whatever.
The Article Trust code looks at the page authors. To a large
extent, this is quite good at highlighting the more dubious
stuff. On the other hand, things become less orange with new
edits (since surviving text is less likely to be crap). The
downside is that cruft and garbage can get less orange and
appear more valid. This can easily happen with large articles
and section editing. That makes it very hard to use for quality
versions. Flagged Revisions would be better at that.
Vandalism can take days to clean up. If AT is to be selecting
the best revision, it should try to check both the global
average trust of each revision and its worst parts.
This way it could try to pick a revision with no "highly
dubious" parts. Having looked at the Article Trust site, I'd
have a very hard time demarcating the maximum
untrustworthiness a section could have without being
under- or over-inclusive. I'd go with under-inclusive. It does
seem reasonably doable at least. It has the advantage of
being fully automatic, so there would be a huge number of
articles with a "most trusted" (for lack of a better name)
version. It won't necessarily be stable, and could be quite
outdated, though. In fact, even people who would otherwise have
Editor (basic review) rights would not have their changes go
straight to the trusted version on edit. This would eat away
too much at editing incentive if the "most trusted" version
were the default and even experienced users could not directly
control it.
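Something roughly like this, as a Python sketch (the floor value is an
arbitrary stand-in for that maximum, and the data shape is assumed):

def pick_best_revision(revisions, section_floor=0.4):
    """Score each revision by average trust, but reject any revision
    whose worst section falls below the floor (erring under-inclusive).

    `revisions` is assumed to be a list of (rev_id, [section_trust, ...])
    pairs; the 0.4 floor is illustrative, not a calibrated value.
    """
    best_avg, best_id = None, None
    for rev_id, sections in revisions:
        if not sections or min(sections) < section_floor:
            continue  # a "highly dubious" part disqualifies the revision
        avg = sum(sections) / len(sections)
        if best_avg is None or avg > best_avg:
            best_avg, best_id = avg, rev_id
    return best_id  # None if every revision has a dubious part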
So, to sum up: having a link to the "automatically selected
most trustworthy" version seems plausible, as long as it is
not the default. It has the advantage of leading to a burst of
pages with "trusted" versions without adding any real workload
whatsoever. The AT team would have to whip up and test some
algorithms, though.
-Aaron Schulz
----------------------------------------
Date: Tue, 27 Nov 2007 20:29:51 +0000
From: waldir(a)email.com
To: wikiquality-l(a)lists.wikimedia.org
Subject: [Wikiquality-l] Implicit vs Explicit metadata
I am sure this has already been discussed, but just in case,
here are my two cents:
The post at
http://breasy.com/blog/2007/07/01/implicit-kicks-explicits-ass/
explains why implicit metadata (like Google's PageRank) is better
than explicit metadata (like Digg votes).
Making a comparison to Wikimedia, I'd say that Prof. Luca's trust
algorithm is a more reliable way to determine the quality of an
article's text than the Flagged Revisions extension.
However, the point of the latter is to provide a stable version to
the user who chooses that, while the former displays to what degree
the info can be trusted, while still showing the untrusted text.
What I'd like to suggest is the implementation of a filter based on
the trust calculations of Prof. Luca's algorithm, which would use
the editors' calculated reliability to automatically choose a
certain revision of an article to display. It could be implemented
in 3 ways (a rough sketch of options 1 and 3 follows the list):
1. Show the last revision of an article made by an editor with a
trust score greater than the value the reader provided. The trusted
editor implicitly sets a minimum quality flag on the article by
saving a revision without changing other parts of the text. This is
the simplest approach, but it doesn't prevent untrusted text from
showing up, in case the trusted editor leaves untrusted parts of
the text unchanged.
2. Filter the full history. Basically, the idea is to show the
parts of the article written by users with a trust score greater
than the value the reader provided. This would work like Slashdot's
comment filtering system, for example. Evidently, this is the most
complicated approach, since it would require an automated conflict
resolution system, which might not be possible.
3. A mixed option could be to try to hide revisions by editors with
a trust value lower than the set threshold. This could be done as
far back in the article history as possible, as long as no content
conflict is found.
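To make the first and third options concrete, a rough Python sketch
(the data shapes and the conflict check are hypothetical):

def last_trusted_revision(history, threshold):
    """Option 1: show the last revision saved by an editor whose trust
    score exceeds the reader-chosen threshold. `history` is assumed to
    be a list of (rev_id, editor_trust) tuples, newest first."""
    for rev_id, editor_trust in history:
        if editor_trust > threshold:
            return rev_id
    return None  # nothing clears the bar; fall back to the current version

def hide_low_trust(history, threshold, would_conflict):
    """Option 3: scanning back from the newest revision, hide revisions
    by editors below the threshold, stopping at the first one whose
    removal would create a content conflict. `would_conflict(rev_id)`
    stands in for the automated merge check, which might not be
    feasible in practice."""
    hidden = []
    for rev_id, editor_trust in history:  # newest first
        if editor_trust >= threshold:
            continue  # trusted revisions always stay visible
        if would_conflict(rev_id):
            break  # can't cleanly hide anything older than this
        hidden.append(rev_id)
    return hidden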
Instead of trust values, this could also work by setting the
threshold above unregistered users, or newbies (I think this is
approximately equivalent to accounts younger than 4 days).
Anyway, these are just rough ideas, on which I'd like to hear your
thoughts.
_______________________________________________
Wikiquality-l mailing list
Wikiquality-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l