[WikiEN-l] "Software Weighs Wikipedians' Trustworthiness"

Gwern Branwen gwern0 at gmail.com
Mon Aug 6 17:43:08 UTC 2007


Tim Starling <tstarling at wikimedia.org> scribbled:
> David Goodman wrote:
> > from
> > Chronicle of Higher Education, Wired Campus blog.
> > http://chronicle.com/wiredcampus/index.php?id=2278
> >
> > "software that color-codes Wikipedia entries, identifying those
> > portions deemed trustworthy and those that might be taken with a grain
> > of salt.
> >
>
> I spoke with Luca de Alfaro at length about this feature at Wikimania. I
> think the technology is great, and the performance is probably good
> enough to include it on Wikipedia itself. He assures me he will release
> the source code under a free license, as soon as it's presentable. Some
> programming work still needs to be done to make it work incrementally
> rather than on the entire history of the article, but the theory for
> this is entirely in place.
>
> There are two very important and separate elements to this:
>
> 1) A "blame map". Some might prefer to call it a "credit map" to be more
> polite. This is a data structure that lets you see who is responsible
> for what text. It can be updated on every edit. Having one stored in
> MediaWiki will enable all sorts of applications. Apparently it's
> old-hat, and not the subject of the present research, but it'll be great
> to have an implementation integrated with MediaWiki.
>
> 2) A reputation metric. This is a predictor of how long a given user's
> edits will stay in an article. It's novel, and it's the main topic of de
> Alfaro's research.
>
> These two elements could be used independently in any way we choose.
>
> The social implications of having a reputation metric are not lost on de
> Alfaro. He gives the following responses to the usual criticisms:
>
> 1. The reputation metric does not rank respected, established users --
> it has a maximum value which will be routinely obtained by many people.
>
> 2. The value of the metric for individual users is obscured. The only
> access to it is via the reputation-coloured article text. The annotated
> article text has no usernames attached, and the metric is not displayed
> on user pages or the like.
>
> 3. It's content-based which makes it harder to game than voting-based
> metrics.
>
> It's time for us to think about how we want to use this technology.
> There are lots of possibilities beyond the precise design that de Alfaro
> proposes. Brainstorm away.
>
> -- Tim Starling
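
To make Tim's two pieces concrete before brainstorming: here is a rough sketch, in Python since that's what Pywikipedia is written in, of how a per-word blame map and a survival-based reputation score could fit together. The names, the word-level diff, and the scoring constants are my own guesses, not de Alfaro's actual algorithm.

import difflib
from collections import defaultdict

MAX_REPUTATION = 100.0            # point 1 above: the metric is capped
reputation = defaultdict(float)   # username -> reputation score


def apply_revision(old_words, old_blame, new_words, editor):
    """Process one edit of an article, word by word.

    old_blame is the blame map of the previous revision: one author per
    word of old_words.  Returns the blame map for new_words, and bumps
    the reputation of every author whose text survived the edit
    (point 2 above: reputation predicts how long a user's edits last).
    """
    new_blame = []
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == 'equal':
            # Surviving text keeps its original authors, and they get
            # a small amount of credit for it.
            new_blame.extend(old_blame[i1:i2])
            for author in old_blame[i1:i2]:
                reputation[author] = min(MAX_REPUTATION,
                                         reputation[author] + 0.01)
        elif op in ('insert', 'replace'):
            # Text introduced by this edit is blamed on (credited to)
            # the editor; it only pays off if later revisions keep it.
            new_blame.extend([editor] * (j2 - j1))
        # 'delete': removed words simply drop out of the blame map.
    return new_blame


# Example: two tiny revisions of an article.
blame = apply_revision([], [], "the sky is green".split(), 'Alice')
blame = apply_revision("the sky is green".split(), blame,
                       "the sky is blue".split(), 'Bob')
# Alice keeps the blame for "the sky is", Bob gets "blue", and Alice's
# reputation has gone up because most of her text survived Bob's edit.

Nothing here needs the whole article history at once - each new revision can be folded in as it arrives, which is the incremental-processing work Tim says remains to be done.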

The most exciting possibilities I can think of:

#We can scrap the 'newest 1%' part of semi-protection. Instead of waiting 4 days, write 4 articles!
#We can scrap editcountitis - this reputation metric may still not be ideal, but I suspect it will reflect the value of one's contributions *a heckuva* lot better than a raw count of edits.
#Bots could probably benefit from this. An example: Pywikipedia's followlive.py script follows Newpages looking for dubious articles to display for the user to take action on. You could filter out all pages whose average contributor reputation is above some threshold n, or something - see the sketch after this list.
#People have long suggested that edits by anons and new users be held back for a while or require approval before going live; this metric might be a way of doing it.
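
For item 3, a followlive.py-style check could be as simple as the functions below. average_reputation() is hypothetical - it would have to sit on top of a blame map like the one sketched above, or on whatever interface an eventual MediaWiki integration exposes - and the threshold is arbitrary.

REPUTATION_THRESHOLD = 50.0   # tunable; half the capped maximum above

def average_reputation(blame_map, reputation):
    """Mean reputation of the authors of the page's current text.
    blame_map is one author per word, as in the earlier sketch."""
    if not blame_map:
        return 0.0
    return sum(reputation[author] for author in blame_map) / len(blame_map)

def needs_human_review(blame_map, reputation):
    """followlive.py-style filter: only show the patroller pages that
    were *not* written mostly by high-reputation users."""
    return average_reputation(blame_map, reputation) <= REPUTATION_THRESHOLD

The same predicate applied to a single editor rather than a whole page would give you the buffer in item 4: hold the edit for review whenever the editor's reputation is below the threshold.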

--
gwern
MF Reflection NATOA Indigo AIEWS Weekly sorot Sex import Zen

