Re: [Wikipedia-l] Wikipedia-l Digest, Vol 64, Issue 3 - Wikipedia-l

25 Nov 2008

subscribe
----- Original Message ----- 
From: wikipedia-l-request@lists.wikimedia.org
To: wikipedia-l@lists.wikimedia.org
Sent: Tuesday, November 25, 2008 1:35 AM
Subject: Wikipedia-l Digest, Vol 64, Issue 3
...
Send Wikipedia-l mailing list submissions to
wikipedia-l@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
or, via email, send a message with subject or body 'help' to
wikipedia-l-request@lists.wikimedia.org
You can reach the person managing the list at
wikipedia-l-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wikipedia-l digest..."
Today's Topics:

   1. suscribe (Jocla)
   2. Study on Interfaces to Improving Wikipedia Quality
      (avani@cs.umn.edu)
   3. Re: Study on Interfaces to Improving Wikipedia Quality
      (michael west)
   4. Re: Study on Interfaces to Improving Wikipedia Quality
      (Joseph Reagle)
   5. Re: Study on Interfaces to Improving Wikipedia Quality
      (Maury Markowitz)
   6. Re: Study on Interfaces to Improving Wikipedia Quality
      (Gregory Maxwell)
   7. Re: Study on Interfaces to Improving Wikipedia Quality
      (Luca de Alfaro)
   8. Re: Study on Interfaces to Improving Wikipedia Quality
      (Luca de Alfaro)

Message: 1
Date: Wed, 19 Nov 2008 18:22:03 -0000
From: "Jocla" paresdoce@gmail.com
Subject: [Wikipedia-l] suscribe
To: wikipedia-l@lists.wikimedia.org
Message-ID: 001c01c94a73$ba1850e0$7f01a8c0@windows337902b
Content-Type: text/plain; charset="iso-8859-1"
Thanks for your e-mail; I would like to subscribe.

Message: 2
Date: 19 Nov 2008 13:23:53 -0600
From: avani@cs.umn.edu
Subject: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: Prayer.1.0.18.0811191323530.7842@sabinus.cs.umn.edu
Content-Type: text/plain; format=flowed; charset=ISO-8859-1
Dear All,
My name is Avanidhar Chandrasekaran
(http://en.wikipedia.org/wiki/User_talk:Avanidhar).
I work with GroupLens Research at the University of Minnesota, Twin Cities.
As part of my research, I am analyzing the usefulness and necessity of
author reputation in Wikipedia.
To that end, I have simulated an interface that colors the words in an
article based on their age.
As you are experienced contributors to Wikipedia, I invite you to
participate in this study, which involves the following.

Please visit the following instances of Wikipedia and evaluate the
interface components that have been incorporated into each of them. Each
of these uses its own algorithm to color text.
a) The Wikitrust project
http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
b) The Wiki-reputation project at GroupLens Research
http://wiki-reputation.cs.umn.edu/index.php/Main_Page

Once you have evaluated the two interfaces, kindly complete this survey
on Wikipedia quality:
http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d
We hope to get your valuable feedback on these interfaces and on how
Wikipedia article quality can be improved.
Thanks for your time,
Avanidhar Chandrasekaran,
GroupLens Research, University of Minnesota

Message: 3
Date: Wed, 19 Nov 2008 20:01:27 +0000
From: "michael west" michawest@gmail.com
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID:
cfe6de600811191201h727fb4e4s9660f64f2815c93f@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
2008/11/19 avani@cs.umn.edu
...
Quite interesting - the "age of words" color coding might be useful in
detecting obtuse type vandalism.
m

Message: 4
Date: Wed, 19 Nov 2008 17:40:23 -0500
From: Joseph Reagle reagle@mit.edu
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: 200811191740.23471.reagle@mit.edu
Content-Type: text/plain;  charset="iso-8859-1"
On Wednesday 19 November 2008, avani@cs.umn.edu wrote:
...
> We hope to get your valuable feedback on these interfaces and how
> Wikipedia article quality can be improved.
This might bias other respondents, but I thought it was an interesting idea,
so I wanted to share it. I concluded with the following, which is no doubt
affected by my being a WikiGnome:
[[
If I see an error, I fix it without much regard to time or author 
reputation. I do pay attention to and investigate author reputation on 
substantive issues on the discussion pages and it would be interesting to 
see a discussion thread colored according to reputation.
]]

Message: 5
Date: Sun, 23 Nov 2008 09:03:25 -0500
From: "Maury Markowitz" maury.markowitz@gmail.com
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID:
5bdbc9050811230603u5a9ca6e8ned59c4421c8eacb0@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On Wed, Nov 19, 2008 at 2:23 PM,  avani@cs.umn.edu wrote:
...
> We hope to get your valuable feedback on these interfaces and how
> Wikipedia article quality can be improved.
Given the older snapshots, I selected older articles that I had
started, NuBUS and ARCNET.
The "time based" system from UMN did not work at all; every search
resulted in a page not found.
The UCSC system did work, but gave me odd results. Apparently I have a
very bad reputation, because when I look in the History at the first
versions, which I wrote in their entirety, it colored them all yellow!
Newer versions of the same articles had much more white, even though
huge portions of the text were still from the original. This may be due
to diff problems -- I consider diff to be largely random in
effectiveness: sometimes it works, but other times a single whitespace
change, especially a vertical one, will make it think the entire article
was edited.
My guess is that the system is tripping over diffs like this, and thus
considering the article to have been rewritten by another editor.
Since this has happened, MY reputation goes down, or so I understand
it.
I don't think this system could possibly work if based on the wiki's
diffs. If it's going to work, it's going to need to use a much more
reliable system.
Another problem I see with it is that it will rank an author whose
contributions are 1000 unchanged comma inserts to be as reliable as an
author who created a perfect 1000-character article (or perhaps rate
the first even higher). There should be some sort of length bias; if
an author makes a big edit, out of character, that's important to
know.
Maury
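
The concern about whitespace-sensitive diffs can be illustrated with a small sketch. It is not the diff used by MediaWiki or by either coloring system; it only assumes Python's standard difflib, made-up sample text, and a hypothetical helper, changed_fraction, that compares two revisions either line by line or word by word.

```python
import difflib

def changed_fraction(old: str, new: str, by_words: bool) -> float:
    """Fraction of the new revision's tokens that have no match in the old one.

    Hypothetical helper for illustration only.
    """
    if by_words:
        a, b = old.split(), new.split()            # whitespace is discarded
    else:
        a, b = old.splitlines(), new.splitlines()  # whitespace changes lines
    if not b:
        return 0.0
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / len(b)

# Made-up sample: the only change is a vertical whitespace (paragraph) break.
old = "The first sentence of the article. The second sentence of the article."
new = "The first sentence of the article.\n\nThe second sentence of the article."

line_level = changed_fraction(old, new, by_words=False)
word_level = changed_fraction(old, new, by_words=True)
print(line_level, word_level)
```

Under these assumptions the line-level comparison treats the whole text as new, while the word-level comparison sees no change at all, which is the kind of difference in attribution described above.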

Message: 6
Date: Sun, 23 Nov 2008 09:44:40 -0500
From: "Gregory Maxwell" gmaxwell@gmail.com
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID:
e692861c0811230644i316f94abg6cafe7ef87f6bc3b@mail.gmail.com
Content-Type: text/plain; charset=UTF-8
On Sun, Nov 23, 2008 at 9:03 AM, Maury Markowitz
maury.markowitz@gmail.com wrote:
> On Wed, Nov 19, 2008 at 2:23 PM, avani@cs.umn.edu wrote:
>> We hope to get your valuable feedback on these interfaces and how
>> Wikipedia article quality can be improved.
>
> Given the older snapshots, I selected older articles that I had
> started, NuBUS and ARCNET.
> The "time based" system from UMN did not work at all, every search
> resulted in a page not found.
The UMN system intentionally included only a small number (70?) of
articles. This is why you needed to use the random page function to
browse among them.
This doesn't reflect any shortcoming of the system; it most likely
just reflects the limits of the computational resources they were
working under.
[snip]
> Newer versions of the same articles had much more white, even though
> huge portions of the text were still from the original. This may be due
> to diff problems -- I consider diff to be largely random in
> effectiveness, sometimes it works, but other times a single whitespace
> change, especially vertical, will make it think the entire article was
> edited.
Yes, I had exactly the same experience with the UCSC system: different
coloring for text I'd added in the same edit that created the article.
Quite inscrutable.
[snip]
> Another problem I see with it is that it will rank an author whose
> contributions are 1000 unchanged comma inserts to be as reliable as an
> author who created a perfect 1000 character article (or perhaps rate
> the first even higher). There should be some sort of length bias, if
> an author makes a big edit, out of character, that's important to
> know.
For the articles it covered I found the UMN system to be more usable:
its output was more explicable, and the signal-to-noise ratio was
just better.  This may be partially due to bugs in the UCSC history
analysis, and a different choice of coloring thresholds
(UCSC seemed to color almost everything, removing the usefulness of
color as something to draw my attention).
Even so, I'm distrustful of "reputation" as an automated metric.
Reputation is a fuzzy thing (consider your comma example), but time is
just a straightforward metric which is much easier to get right. Your
tireless and unreverted editing of external links tells me very little
about your ability to make a reliable edit to the intro of an article,
... or at least very little that I didn't already know by merely
knowing if your account was brand new or not. (New accounts are more
likely to be used by inexperienced and ill-motivated persons.)
I believe a metric applied correctly, consistently, and understandably
is just going to be more useful than a metric which considers more
data but is also subject to more noise. The differential performance
between these two systems has done nothing but confirm my suspicions
in this regard.
A simple, objective challenge for any predictive coloring system would
be to use it in the following experimental procedure:

1) Take a dump of Wikipedia up to a year old, and use this as the
underlying knowledge for the systems.

2) Make several random selections of articles and include the newer
revisions not included in the initial set, up to 6 months old. Call
these the test sets.

3) The predictive coloring system should then take each revision in a
test set in time order and predict if it will be reverted (within X
time?).

4) The actual edits up to now should be analyzed to determine which
changes actually were reverted and when.

The final score will be the false positive and false negative rates.
So long as we assume that the existing editing practices are not too
bad, we should find that the best predictive coloring system would
generally tend to minimize these rates.
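
The scoring step of the procedure above can be sketched as follows. This is only a minimal illustration under stated assumptions: the Outcome record, the error_rates function, and the toy test set are hypothetical, and each revision is assumed to have already been labeled with the system's prediction and with whether it was in fact reverted within the chosen time window.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    predicted_revert: bool  # what the coloring system predicted
    was_reverted: bool      # what actually happened within the time window

def error_rates(outcomes):
    """Return (false_positive_rate, false_negative_rate) for a test set."""
    fp = sum(o.predicted_revert and not o.was_reverted for o in outcomes)
    fn = sum(not o.predicted_revert and o.was_reverted for o in outcomes)
    kept = sum(not o.was_reverted for o in outcomes)
    reverted = sum(o.was_reverted for o in outcomes)
    fpr = fp / kept if kept else 0.0          # kept edits wrongly flagged
    fnr = fn / reverted if reverted else 0.0  # reverted edits that were missed
    return fpr, fnr

# Toy test set (made-up data): six revisions in time order.
test_set = [
    Outcome(True, True), Outcome(True, False), Outcome(False, False),
    Outcome(False, False), Outcome(False, True), Outcome(False, False),
]
fpr, fnr = error_rates(test_set)
print(fpr, fnr)
```

Two systems could then be compared by running both over the same test sets and comparing the resulting pairs of rates.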

Message: 7
Date: Mon, 24 Nov 2008 17:22:23 -0800
From: "Luca de Alfaro" luca@dealfaro.org
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID:
28fa90930811241722y25c26bf1i6441b489e3ff6285@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
I agree with Gregory that it is very useful to quantify the usefulness of
trust information on text -- otherwise, all comparisons are very
subjective.
In our WikiSym 08 paper, we measure various parameters of the "trust"
coloring we compute, including:

- Recall of deletions.  Only 3.4% of text is in the lower half of trust
  values, yet this is 66% of the text that is deleted in the very next
  revision.
- Precision of deletions.  Text in the bottom half of trust values has
  probability 33% of being deleted in the next revision, against a
  probability of 1.9% for general text.  The deletion probability rises
  to 62% for text in the bottom 20% of trust values.
- We study the correlation between the trust of a word, sampled at
  random in all revisions, and the future lifespan of a word (correcting
  for the finite horizon effect due to the finite number of revisions in
  each article), showing positive correlation.
Some aspects are not captured by the above measures:

- We ensured that every "tampering" (including cut-and-paste) is
  reflected in the trust coloring, so it is hard to subvert the algorithm
  (does "age" provide this?).
- We ensured the whole scheme is robust with respect to attacks (see the
  various papers if you are interested).
I fully believe that it should not be hard to improve on our system with
regard to the above measurements.  And I fully agree that the "reputation"
we compute is essentially an internal parameter of the system, and does not
really constitute a good summary of a person's overall Wikipedia
contribution; for this and other reasons we do not display it.
Luca
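
The recall and precision of deletions described in this message can be computed along the following lines. This is a rough sketch, not the code behind the WikiSym 08 measurements: the deletion_stats function and the sample data are hypothetical, trust is assumed to be normalized to the range 0 to 1, and "lower half of trust values" is assumed to mean trust below 0.5.

```python
def deletion_stats(words, threshold=0.5):
    """words: list of (trust, deleted_in_next_revision) pairs.

    Returns (recall, precision, base_rate) for low-trust text as a
    predictor of deletion. Hypothetical helper for illustration only.
    """
    low = [w for w in words if w[0] < threshold]
    deleted = [w for w in words if w[1]]
    low_and_deleted = [w for w in low if w[1]]
    # Recall: share of deleted text that had been flagged as low trust.
    recall = len(low_and_deleted) / len(deleted) if deleted else 0.0
    # Precision: share of low-trust text that was actually deleted.
    precision = len(low_and_deleted) / len(low) if low else 0.0
    # Base rate: deletion probability for text in general.
    base_rate = len(deleted) / len(words) if words else 0.0
    return recall, precision, base_rate

# Made-up sample of ten words: (trust value, deleted in next revision?)
sample = [(0.1, True), (0.2, True), (0.3, False), (0.4, False),
          (0.6, True), (0.7, False), (0.8, False), (0.9, False),
          (0.9, False), (1.0, False)]
recall, precision, base_rate = deletion_stats(sample)
print(recall, precision, base_rate)
```

A coloring is informative to the extent that precision exceeds the base rate while recall stays high, which is the comparison the figures above make.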
_______________________________________________
Wikipedia-l mailing list
Wikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikipedia-l

Message: 8
Date: Mon, 24 Nov 2008 17:35:13 -0800
From: "Luca de Alfaro" luca@dealfaro.org
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID:
28fa90930811241735l235af9cag554632448d80ef7@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
Maury,
perhaps I can help explain the behavior you saw in the UCSC system (I am
one of the developers).
New text is always somewhat orange, to signal to visitors that it has not
yet been fully reviewed.
The higher the reputation, the lighter the shade of orange, but orange it
still is (I have no idea how high your computed reputation was when you
started writing that article).
Text background becomes white when other people revise it without
drastically changing it: this indicates consensus.
In our more recent code version, we also have a "vote" button; using this,
text can more speedily gain trust without the need for many revisions to
occur.
In a live experiment, where people can click on the vote button, I presume
the trust of the text would rise more rapidly.  Note that the code
prevents double voting, creating sock-puppet accounts to vote, etc.
So, based on what you say, I don't think that the system is tripping over
diffs.  It is simply considering new text less trusted, and more revised
text more trusted, which is what we wanted.  It appears, however, that we
don't do a very good job on the web site of describing the algorithm (I
guess we put most of the description work into writing the papers... we
will try to improve the web site).
We don't measure "edit work" in number of edits, but in number of words
changed.
As you say, for our system, changing 1000 words in separate edits is the
same (provided the edits are all kept, i.e., not reverted) as providing a
single 1000-word contribution.  We thought of giving a larger prize to
larger contributions: precisely, of making the reputation increment
proportional to n^a, where n is the number of words, and a > 1.  This did
not work well for Wikipedia, because it ended up not rewarding enough
the work of the many editors who clean and polish the articles, thus
making many small edits.  Technically it would be trivial to change the
code to include such a non-linear reward scheme (to adopt rewards
proportional to n^a rather than n); whether it is desirable, I have no
idea.  It does not lead to better quantitative performance of the system,
i.e., the resulting trust is not better at predicting future text
deletions.
Luca
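
The effect of the exponent a in the n^a reward can be seen in a small worked example. This is a sketch only: reputation_gain is a hypothetical function, the proportionality constant is taken to be 1, all edits are assumed to be kept, and a = 1.5 is an arbitrary value chosen for illustration, not one taken from the system.

```python
def reputation_gain(edit_sizes, a=1.0):
    """Total reputation increment for a list of kept edits.

    Each edit of n words contributes n ** a (constant factor omitted).
    Hypothetical helper for illustration only.
    """
    return sum(n ** a for n in edit_sizes)

many_small = [1] * 1000  # 1000 separate one-word edits
one_large = [1000]       # a single 1000-word contribution

# Linear reward (a = 1): both editors gain the same amount.
linear_small = reputation_gain(many_small, a=1.0)
linear_large = reputation_gain(one_large, a=1.0)

# Superlinear reward (a = 1.5, arbitrary): the single large edit dominates.
super_small = reputation_gain(many_small, a=1.5)
super_large = reputation_gain(one_large, a=1.5)

print(linear_small, linear_large)
print(super_small, super_large)
```

With a > 1 the editor making many small edits gains far less than the author of one large edit, which matches the drawback described above for editors who clean and polish articles.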

End of Wikipedia-l Digest, Vol 64, Issue 3