susceibe

----- Original Message -----
From: wikipedia-l-request@lists.wikimedia.org
To: wikipedia-l@lists.wikimedia.org
Sent: Tuesday, November 25, 2008 1:35 AM
Subject: Wikipedia-l Digest, Vol 64, Issue 3
Send Wikipedia-l mailing list submissions to wikipedia-l@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/wikipedia-l or, via email, send a message with subject or body 'help' to wikipedia-l-request@lists.wikimedia.org
You can reach the person managing the list at wikipedia-l-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Wikipedia-l digest..."
Today's Topics:
- suscribe (Jocla)
- Study on Interfaces to Improving Wikipedia Quality (avani@cs.umn.edu)
- Re: Study on Interfaces to Improving Wikipedia Quality (michael west)
- Re: Study on Interfaces to Improving Wikipedia Quality (Joseph Reagle)
- Re: Study on Interfaces to Improving Wikipedia Quality (Maury Markowitz)
- Re: Study on Interfaces to Improving Wikipedia Quality (Gregory Maxwell)
- Re: Study on Interfaces to Improving Wikipedia Quality (Luca de Alfaro)
- Re: Study on Interfaces to Improving Wikipedia Quality (Luca de Alfaro)
Message: 1
Date: Wed, 19 Nov 2008 18:22:03 -0000
From: "Jocla" paresdoce@gmail.com
Subject: [Wikipedia-l] suscribe
To: wikipedia-l@lists.wikimedia.org
Message-ID: 001c01c94a73$ba1850e0$7f01a8c0@windows337902b
Content-Type: text/plain; charset="iso-8859-1"
Thanks for your e-mail. I would like to subscribe.
Message: 2
Date: 19 Nov 2008 13:23:53 -0600
From: avani@cs.umn.edu
Subject: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: Prayer.1.0.18.0811191323530.7842@sabinus.cs.umn.edu
Content-Type: text/plain; format=flowed; charset=ISO-8859-1
Dear All,
My name is Avanidhar Chandrasekaran (http://en.wikipedia.org/wiki/User_talk:Avanidhar).
I work with GroupLens Research at the University of Minnesota, Twin Cities. As part of my research, I am analyzing the usefulness and necessity of author reputation in Wikipedia.
In light of this, I have simulated an interface that colors the words in an article based on their age.
As you are experienced contributors to Wikipedia, I invite you to participate in this study, which involves the following steps.
- Please visit the following instances of Wikipedia and evaluate the interface components that have been incorporated into each of them. Each instance uses its own algorithm to color text.
a) The Wikitrust project
http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
b) The Wiki-reputation project at GroupLens Research
http://wiki-reputation.cs.umn.edu/index.php/Main_Page
- Once you have evaluated the two interfaces, kindly complete this survey on Wikipedia quality
http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d
We hope to get your valuable feedback on these interfaces and how Wikipedia article quality can be improved.
Thanks for your time
Avanidhar Chandrasekaran,
GroupLens Research, University of Minnesota
Message: 3
Date: Wed, 19 Nov 2008 20:01:27 +0000
From: "michael west" michawest@gmail.com
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: cfe6de600811191201h727fb4e4s9660f64f2815c93f@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
2008/11/19 avani@cs.umn.edu
[snip]
Quite interesting - the "age of words" color coding might be useful in detecting obtuse type vandalism.
m
Message: 4
Date: Wed, 19 Nov 2008 17:40:23 -0500
From: Joseph Reagle reagle@mit.edu
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: 200811191740.23471.reagle@mit.edu
Content-Type: text/plain; charset="iso-8859-1"
On Wednesday 19 November 2008, avani@cs.umn.edu wrote:
> We hope to get your valuable feedback on these interfaces and how Wikipedia article quality can be improved.
This might bias other respondents, but I thought it was an interesting idea, so I wanted to share it. I concluded with the following, which is no doubt affected by my being a WikiGnome:
[[ If I see an error, I fix it without much regard to time or author reputation. I do pay attention to and investigate author reputation on substantive issues on the discussion pages and it would be interesting to see a discussion thread colored according to reputation. ]]
Message: 5
Date: Sun, 23 Nov 2008 09:03:25 -0500
From: "Maury Markowitz" maury.markowitz@gmail.com
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: 5bdbc9050811230603u5a9ca6e8ned59c4421c8eacb0@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On Wed, Nov 19, 2008 at 2:23 PM, avani@cs.umn.edu wrote:
> We hope to get your valuable feedback on these interfaces and how Wikipedia article quality can be improved.
Given the older snapshots, I selected older articles that I had started, NuBUS and ARCNET.
The "time based" system from UMN did not work at all: every search resulted in a page not found.
The UCSC system did work, but gave me odd results. Apparently I have a very bad reputation, because when I look in the History at the first versions, which I wrote in their entirety, it colored them all yellow!
Newer versions of the same articles had much more white, even though huge portions of the text were still from the original. This may be due to diff problems -- I consider diff to be largely random in effectiveness: sometimes it works, but other times a single whitespace change, especially vertical, will make it think the entire article was edited.
My guess is that the system is tripping over diffs like this, and thus considering the article to have been rewritten by another editor. Since this has happened, MY reputation goes down, or so I understand it.
I don't think this system could possibly work if based on wiki's diffs. If it's going to work, it's going to need to use a much more reliable system.
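The whitespace concern above can be illustrated with a minimal Python sketch using the standard-library difflib module. It compares two revisions first line by line and then word by word. The sample text and the helper name `changed_fraction` are invented for illustration; nothing here is taken from either system's actual code, and neither system is known to diff this way.

```python
import difflib

def changed_fraction(old_tokens, new_tokens):
    """Fraction of new_tokens that has no match in old_tokens."""
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / len(new_tokens) if new_tokens else 0.0

# Hypothetical revisions: same words, but the line break has moved.
old = "The quick brown fox\njumps over the lazy dog."
new = "The quick brown\nfox jumps over the lazy dog."

# Line-based comparison: no line survives unchanged.
by_line = changed_fraction(old.splitlines(), new.splitlines())
# Word-based comparison: splitting on any whitespace hides the reflow.
by_word = changed_fraction(old.split(), new.split())

print(by_line, by_word)
```

In this toy case the line-based comparison treats the whole text as rewritten, while the word-based comparison sees no change at all, which is the kind of gap the complaint describes.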
Another problem I see with it is that it will rank an author whose contributions are 1000 unchanged comma inserts to be as reliable as an author who created a perfect 1000-character article (or perhaps rate the first even higher). There should be some sort of length bias: if an author makes a big edit, out of character, that's important to know.
Maury
Message: 6
Date: Sun, 23 Nov 2008 09:44:40 -0500
From: "Gregory Maxwell" gmaxwell@gmail.com
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: e692861c0811230644i316f94abg6cafe7ef87f6bc3b@mail.gmail.com
Content-Type: text/plain; charset=UTF-8
On Sun, Nov 23, 2008 at 9:03 AM, Maury Markowitz maury.markowitz@gmail.com wrote:
> On Wed, Nov 19, 2008 at 2:23 PM, avani@cs.umn.edu wrote:
>> We hope to get your valuable feedback on these interfaces and how Wikipedia article quality can be improved.
> Given the older snapshots, I selected older articles that I had started, NuBUS and ARCNET.
> The "time based" system from UMN did not work at all, every search resulted in a page not found.
The UMN system intentionally included only a small number of articles (70?). This is why you needed to use the random page function to browse among them.
This doesn't reflect any shortcoming of the system; most likely it just reflects the limits of the computational resources they were working under.
[snip]
> Newer versions of the same articles had much more white, even though huge portions of the text were still from the original. This may be due to diff problems -- I consider diff to be largely random in effectiveness, sometimes it works, but other times a single whitespace change, especially vertical, will make it think the entire article was edited.
Yes, I had exactly the same experience with the UCSC system: different coloring for text I'd added in the same edit that created the article. Quite inscrutable.
[snip]
> Another problem I see with it is that it will rank an author whose contributions are 1000 unchanged comma inserts to be as reliable as an author who created a perfect 1000 character article (or perhaps rate the first even higher). There should be some sort of length bias, if an author makes a big edit, out of character, that's important to know.
For the articles it covered I found the UMN system to be more usable: its output was more explicable, and the signal-to-noise ratio was just better. This may be partially due to bugs in the UCSC history analysis, and a different choice of coloring thresholds (UCSC seemed to color almost everything, removing the usefulness of color as something to draw my attention).
Even so, I'm distrustful of "reputation" as an automated metric. Reputation is a fuzzy thing (consider your comma example), but time is just a straightforward metric which is much easier to get right. Your tireless and unreverted editing of external links tells me very little about your ability to make a reliable edit to the intro of an article, ... or at least very little that I didn't already know by merely knowing if your account was brand new or not. (New accounts are more likely to be used by inexperienced and ill-motivated persons.)
I believe a metric applied correctly, consistently, and understandably is just going to be more useful than a metric which considers more data but is also subject to more noise. The differential performance between these two systems has done nothing but confirm my suspicions in this regard.
A simple, objective challenge for any predictive coloring system would be to use it in the following experimental procedure:
- Take a dump of Wikipedia up to a year old, and use this as the underlying knowledge for the systems.
- Make several random selections of articles and include the newer revisions not included in the initial set, up to 6 months old. Call these the test sets.
- The predictive coloring system should then take each revision in a test set in time order and predict if it will be reverted (within X time?).
- The actual edits up to now should be analyzed to determine which changes actually were reverted and when.
The final score will be the false positive and false negative rates. So long as we assume that the existing editing practices are not too bad, we should find that the best predictive coloring system would generally tend to minimize these rates.
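The scoring step of this procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Revision` record, the `score_predictor` function, and the toy predictor below are hypothetical names, and a real coloring system would presumably update its internal state as it walks the revisions in time order.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    rev_id: int
    timestamp: int        # any sortable time value (assumed)
    was_reverted: bool    # ground truth from the later edit history

def score_predictor(predict_revert, test_set):
    """Return (false_positive_rate, false_negative_rate) for one test set."""
    fp = fn = positives = negatives = 0
    # Revisions are presented to the predictor in time order.
    for rev in sorted(test_set, key=lambda r: r.timestamp):
        predicted = predict_revert(rev)
        if rev.was_reverted:
            positives += 1
            if not predicted:
                fn += 1   # a revert the system failed to predict
        else:
            negatives += 1
            if predicted:
                fp += 1   # a kept edit the system flagged
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# Toy test set and a stand-in predictor that flags revisions 2 and 3.
test_set = [
    Revision(1, 100, False),
    Revision(2, 200, True),
    Revision(3, 300, False),
    Revision(4, 400, True),
]
rates = score_predictor(lambda rev: rev.rev_id in {2, 3}, test_set)
print(rates)
```

With this toy data the stand-in predictor wrongly flags one of the two kept edits and misses one of the two reverted edits, so both rates come out at one half.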
Message: 7
Date: Mon, 24 Nov 2008 17:22:23 -0800
From: "Luca de Alfaro" luca@dealfaro.org
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: 28fa90930811241722y25c26bf1i6441b489e3ff6285@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
I agree with Gregory that it is very useful to quantify the usefulness of trust information on text -- otherwise, all comparisons are very subjective. In our WikiSym 08 paper, we measure various parameters of the "trust" coloring we compute, including:
- Recall of deletions. Only 3.4% of text is in the lower half of trust values, yet this is 66% of the text that is deleted in the very next revision.
- Precision of deletions. Text in the bottom half of trust values has a 33% probability of being deleted in the next revision, against a probability of 1.9% for general text. The deletion probability rises to 62% for text in the bottom 20% of trust values.
- We study the correlation between the trust of a word, sampled at random in all revisions, and the future lifespan of the word (correcting for the finite horizon effect due to the finite number of revisions in each article), showing positive correlation.
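For readers less familiar with the two deletion measures, a minimal Python sketch of how precision and recall could be computed from word-level data follows. The function name, the input layout (a list of trust value and deleted-next-revision pairs), and the sample numbers are all assumptions made for illustration; this is not the method or the data from the paper.

```python
def deletion_precision_recall(words, threshold):
    """words: list of (trust, deleted_in_next_revision) pairs.

    Precision: share of low-trust words that were deleted.
    Recall: share of deleted words that were low-trust.
    """
    low_trust_flags = [deleted for trust, deleted in words if trust < threshold]
    low_trust_deleted = sum(low_trust_flags)
    total_deleted = sum(1 for _, deleted in words if deleted)
    precision = low_trust_deleted / len(low_trust_flags) if low_trust_flags else 0.0
    recall = low_trust_deleted / total_deleted if total_deleted else 0.0
    return precision, recall

# Invented sample: 4 low-trust words (2 deleted), 6 high-trust words (3 deleted).
sample = [
    (0.1, True), (0.2, True), (0.3, False), (0.4, False),
    (0.6, True), (0.7, True), (0.8, True),
    (0.9, False), (0.9, False), (0.95, False),
]
precision, recall = deletion_precision_recall(sample, threshold=0.5)
print(precision, recall)
```

Here the threshold of 0.5 stands in for "the lower half of trust values"; a different trust scale would need a different cut-off.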
Some aspects are not captured by the above measures:
- We ensured that every "tampering" (including cut-and-paste) is reflected in the trust coloring, so it is hard to subvert the algorithm (does "age" provide this?).
- We ensured the whole scheme is robust with respect to attacks (see the various papers if you are interested).
I fully believe that it should not be hard to improve on our system re. the above measurements. And I fully agree that the "reputation" we compute is essentially an internal parameter of the system, and does not really constitute a good summary of a person's overall Wikipedia contribution; for this and other reasons we do not display it.
Luca
Message: 8
Date: Mon, 24 Nov 2008 17:35:13 -0800
From: "Luca de Alfaro" luca@dealfaro.org
Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
To: wikipedia-l@lists.wikimedia.org
Message-ID: 28fa90930811241735l235af9cag554632448d80ef7@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
Maury,
perhaps I can help explain the behavior you saw in the UCSC system (I am one of the developers). New text is always somewhat orange, to signal to visitors that it has not yet been fully reviewed. The higher the author's reputation, the lighter the shade of orange, but orange it still is (I have no idea how high your computed reputation was when you started writing that article).
Text background becomes white when other people revise it without drastically changing it: this indicates consensus. In our more recent code version, we also have a "vote" button; using this, text can gain trust more quickly without the need for many revisions to occur. In a live experiment, where people can click on the vote button, I presume the trust of the text would rise more rapidly. Note that the code prevents double voting, creating sock-puppet accounts to vote, and so on.
So I don't think based on what you say that the system is tripping over diffs. It is simply considering new text less trusted, and more revised text more trusted, which is what we wanted. It appears however we don't do a very good job on the web site describing the algorithm (I guess we put most of the description work in writing the papers... we will try to improve the web site).
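The qualitative behavior described here (new text starts orange, lighter for higher-reputation authors, and whitens as it survives revisions) can be mimicked by a toy Python model. Everything in it is an assumption made for illustration: the function names, the 0-to-1 trust scale, the `cap` and `rate` constants, and the color blend are invented and are not the actual WikiTrust algorithm, which is described in the project's papers.

```python
def initial_trust(author_reputation, cap=0.6):
    """New text never starts fully trusted (assumed cap below 1.0).

    author_reputation is clamped to the range 0..1.
    """
    return cap * max(0.0, min(1.0, author_reputation))

def after_revision(trust, rate=0.5):
    """Text that survives a revision moves part of the way toward full trust."""
    return trust + rate * (1.0 - trust)

def background_color(trust):
    """Blend from orange (trust 0) to white (trust 1) as an RGB hex string."""
    orange, white = (255, 165, 0), (255, 255, 255)
    rgb = [round(o + (w - o) * trust) for o, w in zip(orange, white)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)

# Text by a top-reputation author, followed by three surviving revisions.
trust = initial_trust(1.0)
history = [trust]
for _ in range(3):
    trust = after_revision(trust)
    history.append(trust)
print([background_color(t) for t in history])
```

In this toy model even a top-reputation author's first version is tinted, and the tint fades only as later revisions leave the text in place, which matches the behavior described for first versions in the page history.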
We don't measure "edit work" in number of edits, but in number of words changed. As you say, for our system, changing 1000 words in separate edits is the same (provided the edits are all kept, i.e., not reverted) as providing a single 1000-word contribution. We thought of giving a larger reward to larger contributions: precisely, of making the reputation increment proportional to n^a, where n is the number of words, and a > 1. This did not work well for Wikipedia, because it ended up not sufficiently rewarding the work of the many editors who clean and polish the articles, thus making many small edits. Technically it would be trivial to change the code to include such a non-linear reward scheme (to adopt rewards proportional to n^a rather than n); whether it is desirable, I have no idea. It does not lead to better quantitative performance of the system, i.e., the resulting trust is not better at predicting future text deletions.
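A short Python sketch makes the difference between the linear and the n^a reward concrete. The function name `reputation_gain` and the exponent 1.5 are chosen only for illustration; the message does not say which value of a was tried.

```python
def reputation_gain(edit_sizes, a=1.0):
    """Total reward for a series of kept edits, each worth n**a for n words."""
    return sum(n ** a for n in edit_sizes)

many_small = [1] * 1000   # a thousand one-word fixes
one_large = [1000]        # a single thousand-word contribution

# Linear reward (a = 1): both editors earn the same.
linear = (reputation_gain(many_small), reputation_gain(one_large))

# Superlinear reward (a = 1.5, an arbitrary example value): the single
# large edit earns roughly 31.6 times more than the thousand small ones.
superlinear = (
    reputation_gain(many_small, a=1.5),
    reputation_gain(one_large, a=1.5),
)
print(linear, superlinear)
```

This is the effect described above: with a > 1, editors who make many small clean-up edits gain far less than a single large contributor for the same number of words.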
Luca
Wikipedia-l mailing list Wikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
End of Wikipedia-l Digest, Vol 64, Issue 3