[Wikipedia-l] hel Re: Wikipedia-l Digest, Vol 64, Issue 4

Tue Nov 25 12:19:30 UTC 2008

----- Original Message ----- 
From: <wikipedia-l-request at lists.wikimedia.org>
To: <wikipedia-l at lists.wikimedia.org>
Sent: Tuesday, November 25, 2008 7:59 AM
Subject: Wikipedia-l Digest, Vol 64, Issue 4


> Send Wikipedia-l mailing list submissions to
> wikipedia-l at lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
> or, via email, send a message with subject or body 'help' to
> wikipedia-l-request at lists.wikimedia.org
>
> You can reach the person managing the list at
> wikipedia-l-owner at lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikipedia-l digest..."
>
>
> Today's Topics:
>
>   1. Re: Study on Interfaces to Improving Wikipedia Quality
>      (J.L.W.S. The Special One)
>   2. Re: Study on Interfaces to Improving Wikipedia Quality
>      (Gregory Maxwell)
>   3. Re: Wikipedia-l Digest, Vol 64, Issue 3 (Jocla)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 25 Nov 2008 11:59:54 +0800
> From: "J.L.W.S. The Special One" <hildanknight at gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l at lists.wikimedia.org
> Message-ID:
> <d41ac4640811241959o239db59bp1807b90877a33aa9 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> How would the system handle a paragraph full of high quality,
> well-referenced and well-organised content contributed by an editor A
> that is thoroughly copyedited by an editor B? Would editor A be deemed
> less trustworthy when his prose is thoroughly copyedited?
>
> 2008/11/25, Luca de Alfaro <luca at dealfaro.org>:
>>
>> Maury,
>>
>> perhaps I can help explain the behavior you saw in the UCSC system (I am
>> one of the developers).
>> New text is always somewhat orange, to signal to visitors that it has not
>> yet been fully reviewed.
>> The higher the reputation, the lighter the shade of orange, but orange it
>> still is (I have no idea how high your computed reputation was when you
>> started writing that article).
>>
>> Text background becomes white when other people revise it without
>> drastically changing it: this indicates consensus.
>> In our more recent code version, we also have a "vote" button; using this,
>> text can gain trust more speedily without the need for many revisions to
>> occur.
>> In a live experiment, where people can click on the vote button, I presume
>> the trust of the text would rise more rapidly.  Note that the code
>> prevents double voting, creating sock-puppet accounts to vote, etc.
>>
>> So, based on what you say, I don't think that the system is tripping over
>> diffs.  It is simply considering new text less trusted, and more revised
>> text more trusted, which is what we wanted.  It appears, however, that we
>> don't do a very good job on the web site of describing the algorithm (I
>> guess we put most of the description work into writing the papers... we
>> will try to improve the web site).
>>
>> We don't measure "edit work" in number of edits, but in number of words
>> changed.
>> As you say, for our system, changing 1000 words in separate edits is the
>> same (provided the edits are all kept, i.e., not reverted) as providing a
>> single 1000-word contribution.  We thought of giving a larger prize to
>> larger contributions: precisely, of making the reputation increment
>> proportional to n^a, where n is the number of words, and a > 1.  This did
>> not work well for the Wikipedia, because it ended up not rewarding enough
>> the work of the many editors who clean and polish the articles, thus
>> making many small edits.  Technically it would be trivial to change the
>> code to include such a non-linear reward scheme (to adopt rewards
>> proportional to n^a rather than n); whether it is desirable, I have no
>> idea.  It does not lead to better quantitative performance of the system,
>> i.e., the resulting trust is not better at predicting future text
>> deletions.
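To make the n^a point concrete, here is a minimal sketch of the two reward schemes Luca describes. It is not the WikiTrust code: the function names, the `scale` parameter and the choice a = 1.5 are my own assumptions, picked only to illustrate the effect.

```python
def reputation_increment(n_words: int, a: float = 1.0, scale: float = 1.0) -> float:
    """Reward for one surviving edit that changed n_words words.

    a = 1.0 is the linear scheme; a > 1 favours large single contributions.
    Both `a` and `scale` are illustrative parameters, not values from the system.
    """
    if n_words < 0:
        raise ValueError("n_words must be non-negative")
    return scale * n_words ** a


def total_reward(edit_sizes, a: float = 1.0) -> float:
    """Sum the rewards over a sequence of kept (not reverted) edits."""
    return sum(reputation_increment(n, a) for n in edit_sizes)


many_small_edits = [1] * 1000   # a copyeditor making 1000 one-word changes
one_large_edit = [1000]         # an author contributing 1000 words at once

for a in (1.0, 1.5):
    print(a, total_reward(many_small_edits, a), total_reward(one_large_edit, a))
```

With a = 1 the two editors earn the same total, which matches the "changing 1000 words in separate edits is the same" remark. With any a > 1 the single large edit earns more, which is the effect that under-rewarded editors who polish articles in many small steps.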
>>
>>
>> Luca
>>
>>
>>
>> > The UCSC system did work, but gave me odd results. Apparently I have a
>> > very bad reputation, because when I look in the History at the first
>> > versions, which I wrote in entirety, it colored it all yellow!
>> >
>> > Newer versions of the same articles had much more white, even though
>> > huge portions of the text were still from the original. This may be due
>> > to diff problems -- I consider diff to be largely random in
>> > effectiveness; sometimes it works, but other times a single whitespace
>> > change, especially vertical, will make it think the entire article was
>> > edited.
>> >
>> > My guess is that the system is tripping over diffs like this, and thus
>> > considering the article to have been re-written by another editor.
>> > Since this has happened, MY reputation goes down, or so I understand
>> > it.
>> >
>> > I don't think this system could possibly work if based on wiki's
>> > diffs. If it's going to work it's going to need to use a much more
>> > reliable system.
>> >
>> > Another problem I see with it is that it will rank an author whose
>> > contributions are 1000 unchanged comma inserts to be as reliable as an
>> > author who created a perfect 1000-character article (or perhaps rate
>> > the first even higher). There should be some sort of length bias; if
>> > an author makes a big edit, out of character, that's important to
>> > know.
>> >
>> > Maury
>> >
>> > _______________________________________________
>> > Wikipedia-l mailing list
>> > Wikipedia-l at lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>> >
>>
>
>
>
> -- 
> Written with passion,
> J.L.W.S. The Special One
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 24 Nov 2008 23:14:57 -0500
> From: "Gregory Maxwell" <gmaxwell at gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l at lists.wikimedia.org
> Message-ID:
> <e692861c0811242014h464f5a2ei31a1cb3aecfea04b at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Mon, Nov 24, 2008 at 8:35 PM, Luca de Alfaro <luca at dealfaro.org> wrote:
> [snip]
>> So I don't think based on what you say that the system is tripping over
>> diffs.
>
> For example: I can't figure out why the text in the image caption is
> colored here
> http://wiki-trust.cse.ucsc.edu/index.php/Digital_room_correction
>
> I couldn't initially figure out why *anything* above the external link
> section was colored, though the inability to diff contributed to that.
>
> On Mon, Nov 24, 2008 at 8:22 PM, Luca de Alfaro <luca at dealfaro.org> wrote:
>> I agree with Gregory that it is very useful to quantify the usefulness of
>> trust information on text -- otherwise, all comparisons are very
>> subjective.
>> In our WikiSym 08 paper, we measure various parameters of the "trust"
>> coloring we compute, including:
>>
>>   - Recall of deletions.  Only 3.4% of text is in the lower half of trust
>>   values, yet this is 66% of the text that is deleted in the very next
>>   revision.
>>   - Precision of deletions.  Text in the bottom half of trust values has
>>   probability 33% of being deleted in the next revision, against a
>>   probability of 1.9% for general text.  The deletion probability rises
>>   to 62% for text in the bottom 20% of trust values.
>>   - We study the correlation between the trust of a word, sampled at
>>   random in all revisions, and the future lifespan of a word (correcting
>>   for the finite horizon effect due to the finite number of revisions in
>>   each article), showing positive correlation.
> [snip]
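For anyone who wants to see how "recall of deletions" and "precision of deletions" relate, here is a rough sketch of how I understand the two measures. The per-word record layout, the 0-9 trust scale and the sample data are all made up for illustration; none of it comes from the paper.

```python
def deletion_metrics(words, threshold):
    """Compute (recall, precision, base_rate) for low-trust text.

    `words` is a list of (trust, deleted_in_next_revision) pairs, one per word.
    A word counts as "low trust" when trust < threshold (an assumed cut-off).
    """
    low_trust_flags = [deleted for trust, deleted in words if trust < threshold]
    n_deleted = sum(1 for _, deleted in words if deleted)
    low_and_deleted = sum(1 for deleted in low_trust_flags if deleted)

    # Recall: share of all deleted words that were flagged as low trust.
    recall = low_and_deleted / n_deleted if n_deleted else 0.0
    # Precision: share of low-trust words that were in fact deleted.
    precision = low_and_deleted / len(low_trust_flags) if low_trust_flags else 0.0
    # Base rate: deletion probability for text in general.
    base_rate = n_deleted / len(words) if words else 0.0
    return recall, precision, base_rate


# Hypothetical toy sample: trust on a 0-9 scale, True = deleted next revision.
sample = [(1, True), (2, True), (3, False), (8, True),
          (9, False), (9, False), (7, False), (6, False)]
recall, precision, base_rate = deletion_metrics(sample, threshold=5)
print(recall, precision, base_rate)
```

The interesting comparison is precision against the base rate: a coloring is only informative if low-trust text is deleted noticeably more often than text in general.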
>
> These performance metrics are better than I would have guessed from
> browsing through the output. How does the color mapping reflect the
> trust values?  Basically when I use it I see a *lot* of colored things
> which are perfectly fine. At least for me, the difference between
> shades is far less cognitively significant than colored vs
> non-colored, so that may be the source of my confusion.
>
> Have you compared your system to a simple toy trust metric?  I'd
> propose "revisions by users in their first week and before their first
> 7 (?) edits are untrusted".  This reflects the existing automatic
> trust system on the site (auto-confirmation), and also reflects a
> type of trust checking applied manually by editors.   I think that's
> the bar any more sophisticated trust metric needs to outperform.
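The toy metric is small enough to write down. This is only a sketch of one reading of it: I assume a revision stays untrusted until the account is both a week old and past seven edits, and the function and parameter names are mine. If the intended reading is that both conditions must hold for a revision to be untrusted, change `or` to `and`.

```python
from datetime import datetime, timedelta


def is_untrusted(revision_time: datetime,
                 account_created: datetime,
                 prior_edits: int,
                 min_age: timedelta = timedelta(days=7),
                 min_edits: int = 7) -> bool:
    """Baseline trust check based only on account age and edit count.

    The thresholds (7 days, 7 edits) follow the numbers floated in the
    message above; the "7 (?)" there means they are guesses, not tuned values.
    """
    too_new = revision_time - account_created < min_age
    too_few_edits = prior_edits < min_edits
    return too_new or too_few_edits


created = datetime(2008, 11, 1)
print(is_untrusted(datetime(2008, 11, 3), created, prior_edits=20))   # young account
print(is_untrusted(datetime(2008, 11, 20), created, prior_edits=3))   # few edits
print(is_untrusted(datetime(2008, 11, 20), created, prior_edits=7))   # past both
```

A baseline like this needs no diffing at all, which is what makes it a useful bar for anything more elaborate.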
>
> Thank you so much for your response!
>
> ------------------------------
>
> Message: 3
> Date: Tue, 25 Nov 2008 07:59:19 -0000
> From: "Jocla" <paresdoce at gmail.com>
> Subject: Re: [Wikipedia-l] Wikipedia-l Digest, Vol 64, Issue 3
> To: <wikipedia-l at lists.wikimedia.org>
> Message-ID: <001701c94ed3$b9b78d50$7f01a8c0 at windows337902b>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> reply-type=original
>
> subscribe
> ----- Original Message ----- 
> From: <wikipedia-l-request at lists.wikimedia.org>
> To: <wikipedia-l at lists.wikimedia.org>
> Sent: Tuesday, November 25, 2008 1:35 AM
> Subject: Wikipedia-l Digest, Vol 64, Issue 3
>
>
>> Today's Topics:
>>
>>   1. suscribe (Jocla)
>>   2. Study on Interfaces to Improving Wikipedia Quality
>>      (avani at cs.umn.edu)
>>   3. Re: Study on Interfaces to Improving Wikipedia Quality
>>      (michael west)
>>   4. Re: Study on Interfaces to Improving Wikipedia Quality
>>      (Joseph Reagle)
>>   5. Re: Study on Interfaces to Improving Wikipedia Quality
>>      (Maury Markowitz)
>>   6. Re: Study on Interfaces to Improving Wikipedia Quality
>>      (Gregory Maxwell)
>>   7. Re: Study on Interfaces to Improving Wikipedia Quality
>>      (Luca de Alfaro)
>>   8. Re: Study on Interfaces to Improving Wikipedia Quality
>>      (Luca de Alfaro)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 19 Nov 2008 18:22:03 -0000
>> From: "Jocla" <paresdoce at gmail.com>
>> Subject: [Wikipedia-l] suscribe
>> To: <wikipedia-l at lists.wikimedia.org>
>> Message-ID: <001c01c94a73$ba1850e0$7f01a8c0 at windows337902b>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Thanks for your e-mail, I would like to subscribe.
>>
>> ------------------------------
>>
>> Message: 2
>> Date: 19 Nov 2008 13:23:53 -0600
>> From: avani at cs.umn.edu
>> Subject: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l at lists.wikimedia.org
>> Message-ID: <Prayer.1.0.18.0811191323530.7842 at sabinus.cs.umn.edu>
>> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>>
>>
>> Dear All,
>>
>> My name is Avanidhar Chandrasekaran
>> (http://en.wikipedia.org/wiki/User_talk:Avanidhar).
>>
>> I work with GroupLens Research at the University of Minnesota, Twin
>> Cities.
>> As part of my research, I am analyzing the usefulness and necessity of
>> author reputation in Wikipedia.
>>
>> To this end, I have simulated an interface that colors words in an
>> article based on their age.
>>
>> As you are experienced contributors to Wikipedia, I invite you to
>> participate in this study, which involves the following.
>>
>> 1. Please visit the following instances of Wikipedia and evaluate the
>> interface components which have been incorporated into each of them. Each
>> of these uses its own algorithm to color text.
>>
>> a) The Wikitrust project
>>
>>   http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
>>
>> b) The Wiki-reputation project at GroupLens Research
>>
>>   http://wiki-reputation.cs.umn.edu/index.php/Main_Page
>>
>> 2. Once you have evaluated the two interfaces, kindly complete this
>> survey on Wikipedia quality
>>
>>  http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d
>>
>>
>> We hope to get your valuable feedback on these interfaces and how
>> Wikipedia article quality can be improved.
>>
>> Thanks for your time
>>
>> Avanidhar Chandrasekaran,
>>
>> GroupLens Research, University of Minnesota
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Wed, 19 Nov 2008 20:01:27 +0000
>> From: "michael west" <michawest at gmail.com>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l at lists.wikimedia.org
>> Message-ID:
>> <cfe6de600811191201h727fb4e4s9660f64f2815c93f at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> 2008/11/19 <avani at cs.umn.edu>
>>
>>> [snip]
>>
>> Quite interesting - the "age of words" color coding might be useful in
>> detecting obtuse type vandalism.
>>
>> m
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 19 Nov 2008 17:40:23 -0500
>> From: Joseph Reagle <reagle at mit.edu>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l at lists.wikimedia.org
>> Message-ID: <200811191740.23471.reagle at mit.edu>
>> Content-Type: text/plain;  charset="iso-8859-1"
>>
>> On Wednesday 19 November 2008, avani at cs.umn.edu wrote:
>>> We hope to get your valuable feedback on these interfaces and how
>>> Wikipedia
>>> article quality can be improved.
>>
>> This might bias other respondents, but I thought it was an interesting
>> idea so I wanted to share it. I concluded with the following, which is no
>> doubt affected by my being a WikiGnome:
>>
>> [[
>> If I see an error, I fix it without much regard to time or author
>> reputation. I do pay attention to and investigate author reputation on
>> substantive issues on the discussion pages and it would be interesting to
>> see a discussion thread colored according to reputation.
>> ]]
>>
>>
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Sun, 23 Nov 2008 09:03:25 -0500
>> From: "Maury Markowitz" <maury.markowitz at gmail.com>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l at lists.wikimedia.org
>> Message-ID:
>> <5bdbc9050811230603u5a9ca6e8ned59c4421c8eacb0 at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> On Wed, Nov 19, 2008 at 2:23 PM,  <avani at cs.umn.edu> wrote:
>>> We hope to get your valuable feedback on these interfaces and how
>>> Wikipedia
>>> article quality can be improved.
>>
>> Given the older snapshots, I selected older articles that I had
>> started, NuBUS and ARCNET.
>>
>> The "time based" system from UMN did not work at all, every search
>> resulted in a page not found.
>>
>> The UCSC system did work, but gave me odd results. Apparently I have a
>> very bad reputation, because when I look in the History at the first
>> versions, which I wrote in entirety, it colored it all yellow!
>>
>> Newer versions of the same articles had much more white, even though
>> huge portions of the text were still from the original. This may be due
>> to diff problems -- I consider diff to be largely random in
>> effectiveness; sometimes it works, but other times a single whitespace
>> change, especially vertical, will make it think the entire article was
>> edited.
>>
>> My guess is that the system is tripping over diffs like this, and thus
>> considering the article to have been re-written by another editor.
>> Since this has happened, MY reputation goes down, or so I understand
>> it.
>>
>> I don't think this system could possibly work if based on wiki's
>> diffs. If it's going to work it's going to need to use a much more
>> reliable system.
>>
>> Another problem I see with it is that it will rank an author whose
>> contributions are 1000 unchanged comma inserts to be as reliable as an
>> author who created a perfect 1000-character article (or perhaps rate
>> the first even higher). There should be some sort of length bias; if
>> an author makes a big edit, out of character, that's important to
>> know.
>>
>> Maury
>>
>>
>>
>> ------------------------------
>>
>> Message: 6
>> Date: Sun, 23 Nov 2008 09:44:40 -0500
>> From: "Gregory Maxwell" <gmaxwell at gmail.com>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l at lists.wikimedia.org
>> Message-ID:
>> <e692861c0811230644i316f94abg6cafe7ef87f6bc3b at mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> On Sun, Nov 23, 2008 at 9:03 AM, Maury Markowitz
>> <maury.markowitz at gmail.com> wrote:
>>> On Wed, Nov 19, 2008 at 2:23 PM,  <avani at cs.umn.edu> wrote:
>>>> We hope to get your valuable feedback on these interfaces and how
>>>> Wikipedia
>>>> article quality can be improved.
>>>
>>> Given the older snapshots, I selected older articles that I had
>>> started, NuBUS and ARCNET.
>>>
>>> The "time based" system from UMN did not work at all, every search
>>> resulted in a page not found.
>>
>> The UMN system intentionally included only a small number (70?) of
>> articles. This is why you needed to use the random page function to
>> browse among them.
>>
>> This doesn't reflect any shortcoming of the system; it most
>> likely just reflects the limits of computational resources they were
>> working under.
>>
>> [snip]
>>> Newer versions of the same articles had much more white, even though
>>> huge portions of the text were still from the original. This may be due
>>> to diff problems -- I consider diff to be largely random in
>>> effectiveness; sometimes it works, but other times a single whitespace
>>> change, especially vertical, will make it think the entire article was
>>> edited.
>>
>> Yes, I had exactly the same experience with the UCSC system: different
>> coloring for text I'd added in the same edit which created the article.
>> Quite inscrutable.
>>
>> [snip]
>>> Another problem I see with it is that it will rank an author whose
>>> contributions are 1000 unchanged comma inserts to be as reliable as an
>>> author who created a perfect 1000-character article (or perhaps rate
>>> the first even higher). There should be some sort of length bias; if
>>> an author makes a big edit, out of character, that's important to
>>> know.
>>
>> For the articles it covered I found the UMN system to be more usable:
>> its output was more explicable, and the signal-to-noise ratio was
>> just better.  This may be partially due to bugs in the UCSC history
>> analysis, and a different choice of coloring thresholds
>> (UCSC seemed to color almost everything, removing the usefulness of
>> color as something to draw my attention).
>>
>> Even so, I'm distrustful of "reputation" as an automated metric.
>> Reputation is a fuzzy thing (consider your comma example), but time is
>> just a straightforward metric which is much easier to get right. Your
>> tireless and unreverted editing of external links tells me very little
>> about your ability to make a reliable edit to the intro of an article,
>> ... or at least very little that I didn't already know by merely
>> knowing if your account was brand new or not. (New accounts are more
>> likely to be used by inexperienced and ill-motivated persons.)
>>
>> I believe a metric applied correctly, consistently, and understandably
>> is just going to be more useful than a metric which considers more
>> data but is also subject to more noise. The differential performance
>> between these two systems has done nothing but confirm my suspicions
>> in this regard.
>>
>> A simple objective challenge for any predictive coloring system would
>> be to use it in the following experimental procedure:
>>
>> * Take a dump of Wikipedia up to a year old, use this as the underlying
>> knowledge for the systems.
>> * Make several random selections of articles and include the newer
>> revisions not included in the initial set, up to 6 months old. Call
>> these the test sets.
>> * The predictive coloring system should then take each revision in a
>> test set in time order and predict if it will be reverted (within X
>> time?).
>> * The actual edits up to now should be analyzed to determine which
>> changes actually were reverted and when.
>>
>> The final score will be the false positive and false negative rates.
>> So long as we assume that the existing editing practices are not too
>> bad we should find that the best predictive coloring system would
>> generally tend to minimize these rates.
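The scoring step of the procedure above could look roughly like this. It is a sketch under my own assumptions: each test revision is reduced to one boolean prediction ("will be reverted") and one boolean outcome ("was reverted"), and the function name and sample data are invented for illustration.

```python
def error_rates(predicted, actual):
    """Return (false_positive_rate, false_negative_rate).

    predicted[i] -- the system predicted that revision i would be reverted
    actual[i]    -- revision i really was reverted within the chosen window
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if a and not p)
    negatives = sum(1 for a in actual if not a)   # revisions that survived
    positives = sum(1 for a in actual if a)       # revisions that were reverted

    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


# Hypothetical test set of five revisions, in time order.
predicted = [True, False, True, False, False]
actual = [True, False, False, True, False]
fpr, fnr = error_rates(predicted, actual)
print(fpr, fnr)
```

Reporting both rates matters here: a system that colors almost everything gets a low false negative rate for free, and only the false positive rate exposes it.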
>>
>> ------------------------------
>>
>> Message: 7
>> Date: Mon, 24 Nov 2008 17:22:23 -0800
>> From: "Luca de Alfaro" <luca at dealfaro.org>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l at lists.wikimedia.org
>> Message-ID:
>> <28fa90930811241722y25c26bf1i6441b489e3ff6285 at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> I agree with Gregory that it is very useful to quantify the usefulness of
>> trust information on text -- otherwise, all comparisons are very
>> subjective.
>> In our WikiSym 08 paper, we measure various parameters of the "trust"
>> coloring we compute, including:
>>
>>   - Recall of deletions.  Only 3.4% of text is in the lower half of trust
>>   values, yet this is 66% of the text that is deleted in the very next
>>   revision.
>>   - Precision of deletions.  Text in the bottom half of trust values has
>>   probability 33% of being deleted in the next revision, against a
>>   probability of 1.9% for general text.  The deletion probability rises
>>   to 62% for text in the bottom 20% of trust values.
>>   - We study the correlation between the trust of a word, sampled at
>>   random in all revisions, and the future lifespan of a word (correcting
>>   for the finite horizon effect due to the finite number of revisions in
>>   each article), showing positive correlation.
>>
>> Some aspects are not captured by the above measures:
>>
>>   - We ensured that every "tampering" (including cut-and-paste) is
>>   reflected in the trust coloring, so it is hard to subvert the algorithm
>>   (does "age" provide this?).
>>   - We ensured the whole scheme is robust wrt attacks (see the various
>>   papers if you are interested).
>>
>> I fully believe that it should not be hard to improve on our system re.
>> the above measurements.  And I fully agree that the "reputation" we
>> compute is essentially an internal parameter of the system, and does not
>> really constitute a good summary of a person's overall Wikipedia
>> contribution; for this and other reasons we do not display it.
>>
>> Luca
>>
>>
>>
>> ------------------------------
>>
>> Message: 8
>> Date: Mon, 24 Nov 2008 17:35:13 -0800
>> From: "Luca de Alfaro" <luca at dealfaro.org>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l at lists.wikimedia.org
>> Message-ID:
>> <28fa90930811241735l235af9cag554632448d80ef7 at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Maury,
>>
>> perhaps I can help explain the behavior you saw in the UCSC system (I am
>> one of the developers).
>> New text is always somewhat orange, to signal to visitors that it has not
>> yet been fully reviewed.
>> The higher the reputation, the lighter the shade of orange, but orange it
>> still is (I have no idea how high your computed reputation was when you
>> started writing that article).
>>
>> Text background becomes white when other people revise it without
>> drastically changing it: this indicates consensus.
>> In our more recent code version, we also have a "vote" button; using this,
>> text can gain trust more speedily without the need for many revisions to
>> occur.
>> In a live experiment, where people can click on the vote button, I presume
>> the trust of the text would rise more rapidly.  Note that the code
>> prevents double voting, creating sock-puppet accounts to vote, etc.
>>
>> So, based on what you say, I don't think that the system is tripping over
>> diffs.  It is simply considering new text less trusted, and more revised
>> text more trusted, which is what we wanted.  It appears, however, that we
>> don't do a very good job on the web site of describing the algorithm (I
>> guess we put most of the description work into writing the papers... we
>> will try to improve the web site).
>>
>> We don't measure "edit work" in number of edits, but in number of words
>> changed.
>> As you say, for our system, changing 1000 words in separate edits is the
>> same (provided the edits are all kept, i.e., not reverted) as providing a
>> single 1000-word contribution.  We thought of giving a larger prize to
>> larger contributions: precisely, of making the reputation increment
>> proportional to n^a, where n is the number of words, and a > 1.  This did
>> not work well for the Wikipedia, because it ended up not rewarding enough
>> the work of the many editors who clean and polish the articles, thus
>> making many small edits.  Technically it would be trivial to change the
>> code to include such a non-linear reward scheme (to adopt rewards
>> proportional to n^a rather than n); whether it is desirable, I have no
>> idea.  It does not lead to better quantitative performance of the system,
>> i.e., the resulting trust is not better at predicting future text
>> deletions.
>>
>> Luca
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>> End of Wikipedia-l Digest, Vol 64, Issue 3
>> ******************************************
>>
>
>
>
>
> ------------------------------
>
>
>
> End of Wikipedia-l Digest, Vol 64, Issue 4
> ******************************************