Please forgive me if this is ground that has been covered previously; I've only just subscribed to this list.
I wonder how this would apply to my home wiki (en.wikibooks) and other wikis where editing is slow. Wikipedia has a constant deluge of edits, from vandal IPs up the "editing hierarchy" through to the most trusted editors. Wikibooks, however, has a far smaller group of regular contributors, (almost) all of whom have a high reputation. Nonetheless, modules often go untouched for weeks or months at a time; this is simply the nature of the textbook-writing beast.
As I understand it (and I could be totally off-base), this algorithm requires edits to the text in order to determine trust. So what happens when the text isn't edited for months at a time, as is the case with many of our books?
In the absence of editing, it seems like explicit metadata wins out, since there *is* no implicit metadata to use (or at least, much, much less).
From this point of view, it seems the only outstanding downside to FlaggedRevs is the workload it creates. I wonder, though, whether on Implementation Day all current revisions (or last week's revisions, or whatever) could be tagged as an "initial" state (just so there's some starting place), and then we go from there. For enwiki, there would be a flurry of people going through and making explicit choices to flag a revision. And on enbooks, there would be (much) slower progress in the same vein. This may not be advisable on the large wikis. Wikibooks, however, has a relatively clean slate: we have vandalism, but it doesn't tend to hang around, so setting an initial state probably wouldn't result in much (if any) vandalism getting caught in the net.
-Mike.lifeguard
From: Luca de Alfaro <luca@soe.ucsc.edu>
Sent: December 2, 2007 2:10 PM
To: Wikimedia Quality Discussions
Subject: Re: [Wikiquality-l] Implicit vs Explicit metadata
In the trust algorithm I am implementing with my collaborators (mainly Ian Pye and Bo Adler), text can gain top trust only when top-trust authors essentially vote for it by leaving it there, so there is not much difference in the "voting" part with respect to flagged revisions. The main difference is that, in our algorithm, text that is unchanged inherits the trust of the corresponding text in the previous revision; there is no need for re-voting. The algorithm has been described in a tech report: http://www.soe.ucsc.edu/~luca/papers/07/trust-techrep.html
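To make the inheritance idea concrete, here is a minimal sketch in Python. It illustrates the mechanism only and is not our actual implementation: the word-level dictionary matching, the `gain` parameter, and the data shapes are all simplifying assumptions (the real algorithm tracks text provenance with diffs across revisions).

    # Minimal sketch of trust inheritance; NOT the actual implementation.
    # Word-level dict matching and `gain` are simplifying assumptions.

    def update_trust(prev_trust, new_words, author_reputation, gain=0.2):
        """Assign a trust value to each word of a new revision.

        prev_trust: dict mapping each word of the previous revision
                    to its trust value.
        new_words:  list of words in the new revision.
        """
        new_trust = {}
        for word in new_words:
            if word in prev_trust:
                inherited = prev_trust[word]
                # Unchanged text inherits its trust; a higher-reputation
                # author leaving it in place is an implicit vote that
                # can only raise it, never lower it.
                new_trust[word] = inherited + gain * max(
                    0.0, author_reputation - inherited)
            else:
                # Newly inserted text starts low, bounded by the
                # author's own reputation.
                new_trust[word] = gain * author_reputation
        return new_trust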
This week, we will open access to a demo of the whole Wikipedia, as of its
February 6, 2007, snapshot, colored for trust. We are now working towards
making a "live" real-time version, so that revisions can be colored
as soon as they are created.
Currently, people can "vote" for the accuracy of text only by leaving it unchanged during an edit. For the live implementation, we plan to add an "I agree with it" button, which enables people to vote for the accuracy of text without needing to edit it.
We have also considered selecting as "stable" those revisions that are both recent and of high trust. We agree with the original poster that this may have some advantages with respect to flagged revisions. When the demo becomes available later this week, you will be able to judge whether you would like to select recent revisions that are of high trust. The algorithm we have now is not perfect, and we are considering improvements (we will update the demo when we have them), but in general we share the opinion of the original poster: an automatic method for flagging revisions can be more accurate (preventing flagged revisions from becoming out of date) and less painful (no need to spend time flagging).
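As a purely hypothetical illustration of the "recent and high trust" selection rule, one could imagine something like the following; the threshold, the age window, and the shape of the revision list are invented for this sketch and are not what the demo does.

    # Hypothetical "recent AND high-trust" stable-revision selection.
    # Thresholds and the shape of `revisions` are illustrative only.
    from datetime import datetime, timedelta

    def pick_stable(revisions, min_trust=0.8, max_age_days=90):
        """Return the newest revision that is recent and whose
        least-trusted word clears the threshold, or None.

        revisions: list of (timestamp, min_word_trust), newest first.
        """
        cutoff = datetime.utcnow() - timedelta(days=max_age_days)
        for timestamp, min_word_trust in revisions:
            if timestamp < cutoff:
                break  # everything after this is older still
            if min_word_trust >= min_trust:
                return (timestamp, min_word_trust)
        return None  # fall back to explicit flagging or latest revision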
Luca
On Dec 2, 2007 7:32 AM, Waldir Pimenta <waldir@email.com> wrote:
I agree with you, Aaron. Flagged Revisions by *trusted users* is indeed better than automatic trustworthiness evaluation; ratings by everyone probably wouldn't be. But I'd say, following your own words ("It has the advantage of leading to a burst of pages with 'trusted' versions without adding any real workload whatsoever"), why not make this the default option when there is no flagged version available (yet)? Perhaps with a note, shown to people who choose to view stable versions (or to all logged-out readers, if the stable versions are to be the default), similar to the one we see when consulting an old revision. It seems to me that this is better than showing the current version when the flagged one doesn't exist or is too far back in the revision history (with the "stable view" enabled, that is). Also, picking a revision with no "highly dubious" parts sounds like a good approach to me :)
Waldir
On Nov 28, 2007 12:40 AM, Aaron Schulz <jschulz_4587@msn.com> wrote:
Flagged Revisions and Article Trust are really apples and oranges; they are not in competition. I have contacted them and let them know I'd be interested in getting this into a stable extension.
Anyway, my problem with that article about implicit vs. explicit metadata is that a) it assumes any random user can rate, b) it measures simple things like interesting/cool/worth reading, and c) it doesn't care too much if bad content shows sometimes. The problem is that none of these hold true here. Flagged Revisions uses Editors/Reviewers, it mainly checks accuracy, and we don't want high-profile pages, articles on living people, and highly vandalized pages (and eventually anything) to show up with vandalism. Going to "George Bush" and seeing a vulva in the infobox is never acceptable (I don't even know if Article Trust rates images), even if the vandalism is darker orange or whatever.
The Article Trust code looks at the page authors. To a large extent, this is quite good at highlighting the more dubious stuff. On the other hand, text becomes less orange with new edits (since it is then less likely to be crap). The downside is that cruft and garbage can get less orange and appear more valid. This can easily happen with large articles and section editing. That makes it very hard to use for quality versions; Flagged Revisions would be better at that.
Vandalism can take days to clean up. If AT is to select the best revision, it should try to check both the global average trust of each revision and its worst parts. That way it could try to pick a revision with no "highly dubious" parts. Having looked at the Article Trust site, I'd have a very hard time deciding the maximum untrustworthiness a section can have without being under- or over-inclusive; I'd go with under-inclusive. It does seem reasonably doable, at least. It has the advantage of being fully automatic, so there would be a huge number of articles with a "most trusted" (for lack of a better name) version. It won't necessarily be stable, though, and could be quite outdated. In fact, even people who would otherwise have Editor (basic review) rights would not have their changes go straight into the trusted version on edit. This would eat away too much at editing incentive if the "most trusted" version were the default and even experienced users could not directly control it.
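For what it's worth, a rough sketch of that selection rule could look like the following; the thresholds and data shapes are invented for illustration, and the AT team would still have to test real algorithms.

    # Rough sketch of the suggestion above: require a decent average
    # trust AND no "highly dubious" section. Thresholds are made up.

    def most_trusted_revision(revisions, min_avg=0.7, min_worst=0.4):
        """Return the newest revision passing both checks, or None.

        revisions: list of dicts, newest first; each dict is assumed
        to carry per-section trust scores under "section_trust".
        """
        for rev in revisions:
            scores = rev["section_trust"]
            avg = sum(scores) / len(scores)
            worst = min(scores)
            # Err under-inclusive: reject any revision whose worst
            # section falls below the dubiousness cutoff.
            if avg >= min_avg and worst >= min_worst:
                return rev
        return None  # better to offer nothing automatic than a bad pick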
So, to sum up: having a link to the "automatically selected most trustworthy" version seems plausible, as long as it is not the default. It has the advantage of leading to a burst of pages with "trusted" versions without adding any real workload whatsoever. The AT team would have to whip up and test some algorithms, though.
-Aaron Schulz
----------------------------------------
> Date: Tue, 27 Nov 2007 20:29:51 +0000
> From: waldir@email.com
> Subject: [Wikiquality-l] Implicit vs Explicit metadata
>
> I am sure this has already been discussed, but just in case, here goes
> my two cents:
>
> The post at http://breasy.com/blog/2007/07/01/implicit-kicks-explicits-ass/
> explains why implicit metadata (like Google's PageRank) is better
> than explicit metadata (like Digg votes).
> Making a comparison to Wikimedia, I'd say that Prof. Luca's trust
> algorithm is a more reliable way to determine the quality of an
> article's text than the Flagged Revisions extension.
> However, the point of the latter is to provide a stable version to the
> user who chooses that, while the former displays the degree to which
> the info can be trusted, but still shows the untrusted text.
>
> What I'd like to suggest is the implementation of a filter based on
> the trust calculations of Prof. Luca's algorithm, which would use the
> editors' calculated reliability to automatically choose which revision
> of an article to display. It could be implemented in three ways:
>
> 1. Show the last revision of an article made by an editor with a trust
> score bigger than the value that the reader provided (sketched below).
> The trusted editor is implicitly setting a minimum quality flag on the
> article by saving a revision without changing other parts of the text.
> This is the simpler approach, but it doesn't prevent untrusted text
> from showing up, in case the trusted editor leaves untrusted parts of
> the text unchanged.
>
> 2. Filter the full history. Basically, the idea is to show the parts
> of the article written by users with a trust score bigger than the
> value that the reader provided. This would work like Slashdot's
> comment-filtering system, for example. Evidently, this is the most
> complicated approach, since it would require an automated conflict
> resolution system, which might not be possible.
>
> 3. A mixed option could be to try to hide revisions by editors with a
> trust value lower than the set threshold. This could be done as far
> back in the article history as possible, until a content conflict
> is found.
>
> Instead of trust values, this could also work by setting the threshold
> to exclude unregistered users, or newbies (I think this is approximately
> equivalent to accounts younger than 4 days).
>
> Anyway, these are just rough ideas, on which I'd like to hear your
> thoughts.
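As a hypothetical illustration of option 1 above, a trust-threshold filter might look something like this sketch. The `editor_trust` mapping and the revision structure are assumptions made up for illustration, not anything the trust code actually exposes.

    # Hypothetical sketch of option 1: serve the last revision saved by
    # an editor whose trust clears the reader's threshold. Data shapes
    # are assumptions, not an actual API.

    def last_trusted_revision(history, editor_trust, reader_threshold):
        """history: revisions newest first; each records its editor.

        A trusted editor who saves a revision without touching the rest
        of the text is treated as implicitly endorsing the whole page.
        """
        for rev in history:
            if editor_trust.get(rev["editor"], 0.0) > reader_threshold:
                return rev
        return None  # no revision by a sufficiently trusted editor yet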
_______________________________________________
Wikiquality-l mailing list
Wikiquality-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l