Re: [Wikiquality-l] Implicit vs Explicit metadata

28 Nov 2007

Flagged Revisions and Article Trust are really apples and oranges. I have contacted them,
and let them know I'd be interested in getting this up into a stable extension; they
are not in competition.

Anyway, my problem with that article about implicit vs. explicit metadata is that a) it
assumes any random user can rate, b) you are measuring simple things like
interesting/cool/worth reading, and c) you don't care too much if bad content shows
sometimes. The problem is that none of these hold true here. Flagged Revisions uses
Editors/Reviewers, it mainly checks accuracy, and we don't want high-profile pages,
articles on living people, highly vandalized pages, and eventually anything else to
show up with vandalism. Going to "George Bush" and seeing a vulva for the
infobox is not ever acceptable (I don't even know if Article Trust rates images), even
if the vandalism is darker orange or whatever.

The Article Trust code looks at the page authors. To a large extent, this is quite good at
highlighting the more dubious stuff. On the other hand, things become less orange with new
edits (since it is less likely to be crap). The downside is that cruft and garbage can get
less orange and appear more valid. This can easily happen with large articles and section
editing. That makes it very hard to use for quality versions. Flagged Revisions would
be better at that.

Vandalism can take days to clean up. If AT is to be selecting the best revision, it should
try to check both the global average trust of each revision and its worst
parts. This way it could try to pick a revision with no "highly dubious" parts.
Having looked at the article trust site, I'd have a very hard time demarcating what the
maximum untrustworthiness a section can have would be without being under- or
over-inclusive. I'd go with underinclusive. It does seem reasonably doable at least. It
has the advantage of being fully automatic, so there will be a huge number of articles
with a "most trusted" (for lack of a better name) version. It won't
necessarily be stable, and could be quite outdated though. In fact, even people who would
otherwise have Editor (basic review) rights would have their changes go to the trusted
version on edit. This would eat away too much at editing incentive if the "most
trusted" version were the default and even experienced users could not directly control
it.
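
To make the selection rule above concrete, here is a rough sketch. It assumes each revision comes with a list of per-section trust scores between 0 and 1; the `Revision` class, the `pick_most_trusted` function and the 0.5 cutoff are all made up for illustration and are not anything from the Article Trust code. It drops any revision whose worst section falls below the cutoff, then takes the best average among what is left.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Revision:
    rev_id: int
    # Hypothetical per-section trust scores in [0, 1]; higher means more trusted.
    section_trust: Sequence[float]


def pick_most_trusted(
    revisions: Sequence[Revision],
    min_section_trust: float = 0.5,  # assumed cutoff, not a real AT parameter
) -> Optional[Revision]:
    """Return the revision with the best average trust among those whose
    worst section is still at or above min_section_trust."""
    candidates = [
        r for r in revisions
        if r.section_trust and min(r.section_trust) >= min_section_trust
    ]
    if not candidates:
        return None
    # Prefer higher average trust; break ties in favour of the newer revision.
    return max(
        candidates,
        key=lambda r: (sum(r.section_trust) / len(r.section_trust), r.rev_id),
    )
```

Raising `min_section_trust` is the underinclusive direction: fewer articles get a "most trusted" version, but the ones that do have no badly rated parts.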

So to sum up: having a link to the "automatically selected most trustworthy"
version seems plausible, as long as it is not the default. It has the advantage of leading
to a burst of pages with "trusted" versions without adding any real workload
whatsoever. The AT team would have to whip up and test some algorithms
though.

-Aaron Schulz

----------------------------------------
...
  Date: Tue, 27 Nov 2007 20:29:51 +0000
 From: waldir(a)email.com
 To: wikiquality-l(a)lists.wikimedia.org
 Subject: [Wikiquality-l] Implicit vs Explicit metadata

 I am sure this has already been discussed, but just in case, here goes
 my two cents:

 The post in http://breasy.com/blog/2007/07/01/implicit-kicks-explicits-ass/
 explains why implicit metadata (like Google's PageRank) is better
 than explicit metadata (like Digg votes).
 Making a comparison to Wikimedia, I'd say that Prof. Luca's trust
 algorithm is a more reliable way to determine the quality of an
 article's text than the Flagged Revision Extension.
 However, the point of the latter is to provide a stable version to the
 user who chooses that, while the former displays the degree to which the
 info can be trusted, but still shows the untrusted text.

 What I'd like to suggest is the implementation of a filter based on
 the trust calculations of Prof. Luca's algorithm, which would use the
 editors' calculated reliability to automatically choose to display a
 certain revision of an article. It could be implemented in 3 ways:

 1. Show the last revision of an article made by an editor with a trust
 score higher than the value that the reader provided. The trusted
 editor is implicitly setting a minimum quality flag in the article by
 saving a revision without changing other parts of the text. This is
 the simplest approach, but it doesn't prevent untrusted text from showing
 up, in case the trusted editor leaves untrusted parts of the text
 unchanged.

 2. Filter the full history. Basically, the idea is to show the parts
 of the article written by users with a trust score higher than
 the value that the reader provided. This would work like slashdot's
 comment filtering system, for example. Evidently, this is the most
 complicated approach, since it would require an automated conflict
 resolution system which might not be possible.

 3. A mixed option could be to try to hide revisions by editors with a
 lower trust value than the threshold set. This could be done as far
 back in the article history as possible, as long as no content
 conflict is found.
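
 As a rough illustration of option 1 only, the sketch below walks the
 history from newest to oldest and returns the first revision whose editor
 has a trust score above the reader's threshold. The history layout and the
 name `last_trusted_revision` are assumptions made for this example, not
 part of the trust algorithm itself.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical history layout: (revision id, editor trust score), oldest first.
History = Sequence[Tuple[int, float]]


def last_trusted_revision(history: History, reader_threshold: float) -> Optional[int]:
    """Return the id of the newest revision saved by an editor whose trust
    score is strictly higher than reader_threshold, or None if there is none."""
    for rev_id, editor_trust in reversed(history):
        if editor_trust > reader_threshold:
            return rev_id
    return None


history = [(10, 0.9), (11, 0.2), (12, 0.6), (13, 0.1)]
print(last_trusted_revision(history, 0.5))  # prints 12
```

 Note that revision 12 still contains whatever the editor of revision 11
 added and nobody removed, which is exactly the weakness described in
 option 1.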

 Instead of trust values, this could also work by setting the threshold
 above unregistered users, or newbies (I think this is approximately
 equivalent to accounts younger than 4 days).

 Anyway, these are just rough ideas, on which I'd like to hear your thoughts.

 _______________________________________________
 Wikiquality-l mailing list
 Wikiquality-l(a)lists.wikimedia.org
 http://lists.wikimedia.org/mailman/listinfo/wikiquality-l