Re: [Wikiquality-l] Implicit vs Explicit metadata

2 Dec 2007

Please forgive me if this ground has been covered previously; I've only just
subscribed to this list.

I wonder how this would apply to my home wiki (en.wikibooks) and other wikis
where editing is slow. Wikipedia has a constant deluge of edits from vandal
IPs up the "editing hierarchy" through to the most trusted editors.
Wikibooks, however, has a far smaller group of regular contributors,
(almost) all of whom have a high reputation. Nonetheless, modules are often
untouched for weeks or months at a time - this is simply the nature of the
textbook-writing beast.

As I understand it (which could be totally off-base), this algorithm
requires editing of the text to determine trust. So what happens when the
text isn't edited for months at a time, as is the case with many of our
books?

In the absence of editing, it seems like explicit metadata wins out, since
there *is* no implicit metadata to use (or at least, much much less).

...
 From this point of view, it seems the only outstanding
downside to FlaggedRevs is the workload created. I wonder, though, if on
Implementation Day, all current revisions (or last week's revisions, or
whatever) could be
tagged as an "initial" state (just so there's some starting place) and then
we go from there. For enwiki, there would be a flurry of people going through
and making explicit choices to flag a revision. And on enbooks, there would
be (much) slower progress in the same vein. This may not be advisable on the
large wikis. Wikibooks, however, has a relatively clean doormat. We have
vandalism, but it doesn't tend to hang around, so setting an initial state
would probably not result in much (if any) vandalism getting caught in the net.

-Mike.lifeguard

  _____  

From: Luca de Alfaro [mailto:luca@soe.ucsc.edu] 
Sent: December 2, 2007 2:10 PM
To: Wikimedia Quality Discussions
Subject: Re: [Wikiquality-l] Implicit vs Explicit metadata

In the trust algorithm I am implementing with my collaborators (mainly Ian
Pye and Bo Adler), text can gain top trust only when top trust authors
essentially vote for it by leaving it there, so there is not so much
difference in the "voting" part wrt flagged revisions.  The main difference
is that in our algorithm, the text that is unchanged inherits the trust from
the text in the previous revision --- no need for re-voting.  The algorithm
has been described in a techrep
http://www.soe.ucsc.edu/~luca/papers/07/trust-techrep.html . 
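
A minimal sketch of the inheritance idea described above, for illustration only. It is not the algorithm from the techrep: the 0-10 trust scale, the `gain` parameter, the `Word` class, and the matching of words by identity rather than by a real diff are all assumptions made to keep the example short.

```python
from dataclasses import dataclass

MAX_TRUST = 10.0  # hypothetical scale, not taken from the techrep


@dataclass
class Word:
    text: str
    trust: float


def revise(prev, new_text, author_reputation, gain=0.3):
    """Compute per-word trust for a new revision.

    Words kept from the previous revision inherit their trust, and move a
    fraction `gain` toward the author's reputation when that is higher.
    Newly inserted words start at a fraction of the author's reputation.
    """
    # Toy matching by word identity; a real system would track positions.
    old = {w.text: w.trust for w in prev}
    result = []
    for token in new_text.split():
        if token in old:
            inherited = old[token]
            bump = max(0.0, author_reputation - inherited) * gain
            result.append(Word(token, min(MAX_TRUST, inherited + bump)))
        else:
            result.append(Word(token, author_reputation * gain))
    return result
```

In this toy model, text that survives an edit by a higher-reputation author gains trust without anyone re-voting, and an edit by a lower-reputation author leaves the inherited trust where it was.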

This week, we will open access to a demo of the whole Wikipedia, as of its
February 6, 2007, snapshot, colored for trust.  We are now working towards
making a "live" real-time version, so that revisions can be colored as soon
as they are created. 
Currently, people can "vote" for the accuracy of text only by leaving it
unchanged during an edit.  We plan, for the live implementation, to give an
"I agree with it" button, which enables people to vote for the accuracy of
text without the need for editing it. 

We have also considered selecting as "stable" those revisions that are
both recent and of high trust.  We agree with the original poster that this
may have some advantages wrt flagged revisions: 

*	No need to explicitly spend time flagging (there are millions of
pages); editor time can be used for editing or contributing. 
*	Flagging pages is time consuming, and re-flagging pages after they
change is even more so.  In our algorithm, there is no need for re-flagging.
If a high-trust page is subject to modifications, and these modifications
are approved or left in place by high-reputation authors (and all editors
are high-reputation), the newer version is automatically selected, without
need for explicit re-flagging. 
*	As the trust algorithm is automatic, there won't be the problem of
flagged revisions becoming outdated with respect to the most recent
revision: if many authors (including some high-reputation ones) agree with a
recent revision, the  recent revision will automatically become high trust,
and thus selected.  
*	Our algorithm actually requires the consensus, accumulated through
revisions, of more than one high-reputation author, to label text as high
trust.  A rogue high-reputation editor cannot single-handedly create
high-trust text. 
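
The selection rule behind these points could look roughly like the sketch below. It assumes each revision already has a single page-level trust value on a 0-10 scale; the function name `pick_stable` and the `min_trust` cutoff are hypothetical and only serve the illustration.

```python
def pick_stable(revisions, min_trust=8.0):
    """Return the id of the newest revision whose trust meets the cutoff.

    `revisions` is a list of (rev_id, page_trust) pairs, oldest first.
    Returns None when no revision is trusted enough.
    """
    for rev_id, trust in reversed(revisions):
        if trust >= min_trust:
            return rev_id
    return None
```

Because the choice is recomputed from the trust values, a newer revision replaces the older one as soon as it crosses the cutoff, with no explicit re-flagging step.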

When the demo becomes available later this week, you will be able to judge
whether you would like to select recent revisions that are of high trust.
The algorithm we have now is not perfect, and we are considering
improvements (we will update the demo when we have such improvements), but
in general we share the opinion of the original poster: an automatic method
for flagging revisions can be more accurate (preventing flagged revisions
from becoming out of date), and less painful (no need to spend time
flagging). 

Luca

On Dec 2, 2007 7:32 AM, Waldir Pimenta <waldir(a)email.com> wrote:

I agree with you, Aaron. Flagged Revisions by *trusted users* is indeed
better than automatic trustworthiness evaluation. Ratings by everyone,
probably wouldn't be. But I'd say, following your own words ("It has the
advantage of leading to a burst of pages with "trusted" versions without
adding any real workload whatsoever"), why not have this as the default
option if there is no flagged version available (yet)? Perhaps with a note,
shown to people who choose to view stable versions (or to all unlogged
readers, if the stable versions are to be default), similar to the one that
we see when we're consulting an old revision. It seems to me that this is
better than showing the current version if the flagged one doesn't exist, or
is too far away in the revision history. (with the "stable view" enabled,
that is). Also, picking a revision with no "highly dubious" parts sounds like
a good approach to me :)

Waldir

On Nov 28, 2007 12:40 AM, Aaron Schulz <jschulz_4587(a)msn.com> wrote:

Flagged Revisions and Article Trust are really apples and oranges. I have
contacted them, and let them know I'd be interested in getting this up into
a stable extension; they are not in competition.

Anyway, my problem with that article about implicit vs. explicit metadata is
that a) it assumes any random user can rate, b) you are measuring simple
things like interesting/cool/worth reading, and c) you don't care too much
if bad content shows sometimes. The problem is that none of these hold true
here. Flagged Revisions uses Editors/Reviewers, it mainly checks accuracy,
and we don't want high profile pages/living people articles/highly
vandalized pages as well as eventually anything to show up with vandalism.
Going to "George Bush" and seeing a vulva for the infobox is not ever
acceptable (I don't even know if Article Trust rates images), even if the
vandalism is darker orange or whatever. 

The Article Trust code looks at the page authors. To a large extent, this is
quite good at highlighting the more dubious stuff. On the other hand, things
become less orange with new edits (since it is less likely to be crap). The
downside is that cruft and garbage can get less orange and appear more
valid. This can easily happen with large articles and section editing. That
makes it very hard to use for quality versions. Flagged Revisions would
be better at that. 

Vandalism can take days to clean up. If AT is to be selecting the best
revision, it should try to check both the global average trust of each
revision and its worst parts. This way it could try to pick a
revision with no "highly dubious" parts. Having looked at the article trust
site, I'd have a very hard time demarcating what the maximum untrustworthiness
a section can have would be without being under or over inclusive. I'd go
with underinclusive. It does seem reasonably doable at least. It has the
advantage of being fully automatic, so there will be a huge number of
articles with a "most trusted" (for lack of a better name) version. It
won't necessarily be stable, and could be quite outdated though. In fact, even
people who would otherwise have Editor (basic review) rights would have
their changes go to the trusted version on edit. This would eat too much
away at editing incentive if the "most trusted" version was the default if
even experienced users could not directly control it. 
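
A rough sketch of the "average plus worst part" check suggested above, under stated assumptions: each revision comes with a list of per-section trust values on a 0-10 scale, and the `floor` cutoff and the function name `most_trusted` are made up for the example. Nothing here reflects the actual Article Trust code.

```python
from statistics import mean


def most_trusted(revisions, floor=4.0):
    """Pick the revision with the best average trust and no weak section.

    `revisions` is a list of (rev_id, section_trusts) pairs, oldest first.
    Revisions with any section below `floor` are skipped outright, which
    is the underinclusive choice. Ties go to the more recent revision.
    """
    best = None
    for rev_id, sections in revisions:
        if not sections or min(sections) < floor:
            continue  # contains a "highly dubious" part
        avg = mean(sections)
        if best is None or avg >= best[1]:
            best = (rev_id, avg)
    return None if best is None else best[0]
```

A revision with a high average but one badly trusted section is rejected, so a vandalized infobox cannot hide behind an otherwise solid article.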

So to sum up. Having a link to the "automatically selected most trustworthy"
version seems plausible, as long as it is not the default. It has the
advantage of leading to a burst of pages with "trusted" versions without
adding any real workload whatsoever. The AT team would have to whip up and
test around with some algorithms though. 

-Aaron Schulz

----------------------------------------
...
  Date: Tue, 27 Nov 2007 20:29:51 +0000
 From: waldir(a)email.com 
...
  To: wikiquality-l(a)lists.wikimedia.org 
...
  Subject: [Wikiquality-l] Implicit vs Explicit metadata

...

 I am sure this has already been discussed, but just in case, here goes 
 my two cents:

 The post in 
http://breasy.com/blog/2007/07/01/implicit-kicks-explicits-ass/
...
 explains why implicit metadata (like Google's PageRank) is better
 than explicit metadata (like Digg votes).
 Making a comparison to Wikimedia, I'd say that Prof. Luca's trust
 algorithm is a more reliable way to determine the quality of an
 article's text than the Flagged Revision Extension. 
 However, the point of the latter is to provide a stable version to the
 user who chooses that, while the former displays the degree to which the
 info can be trusted, but still shows the untrusted text.

 What I'd like to suggest is the implementation of a filter based on
 the trust calculations of Prof. Luca's algorithm, which would use the
 editors' calculated reliability to automatically choose to display a 
 certain revision of an article. It could be implemented in 3 ways:

 1. Show the last revision of an article made by an editor with a trust
 score bigger than the value that the reader provided. The trusted 
 editor is implicitly setting a minimum quality flag in the article by
 saving a revision without changing other parts of the text. This is
 the simplest approach, but it doesn't prevent untrusted text from showing up,
 in case the trusted editor leaves untrusted parts of the text
 unchanged.

 2. Filter the full history. Basically, the idea is to show the parts
 of the article written by users with a trust score bigger than
 the value that the reader provided. This would work like slashdot's
 comment filtering system, for example. Evidently, this is the most
 complicated approach, since it would require an automated conflict 
 resolution system which might not be possible.

 3. A mixed option could be to try to hide revisions by editors with a
 lower trust value than the threshold set. This could be done as far
 back in the article history as possible, as long as no content conflict
 is found.

 Instead of trust values, this could also work by setting the threshold
 above unregistered users, or newbies (I think this is approximately 
 equivalent to accounts younger than 4 days)
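
 Option 1, together with the account-based variant just mentioned, can be sketched as follows. This is only an illustration: the `Revision` fields, the helper names, and the 0-10 trust scale are assumptions, and the 4-day default simply mirrors the approximate figure given above.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: int
    editor_trust: float    # hypothetical 0-10 editor trust score
    registered: bool       # False for edits from unregistered users
    account_age_days: int  # 0 for unregistered users


def last_acceptable(history, accept):
    """Return the id of the newest revision whose editor passes `accept`.

    `history` is ordered oldest first; returns None if nobody qualifies.
    """
    for rev in reversed(history):
        if accept(rev):
            return rev.rev_id
    return None


def by_trust(threshold):
    """Filter for option 1: editor trust above the reader's threshold."""
    return lambda rev: rev.editor_trust > threshold


def established(min_age_days=4):
    """Variant: registered accounts that are at least `min_age_days` old."""
    return lambda rev: rev.registered and rev.account_age_days >= min_age_days
```

 A reader-chosen threshold would be passed as `last_acceptable(history, by_trust(6.0))`. As noted for option 1, the chosen revision can still contain untrusted text that the trusted editor left in place.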

 Anyway, these are just rough ideas, on which I'd like to hear your thoughts.
...

...
  _______________________________________________ 
 Wikiquality-l mailing list
 Wikiquality-l(a)lists.wikimedia.org
 http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
