I was describing to someone how Wikipedia works:
"anyone can edit" etc.
He answered with this argument:
"Wikipedia is the triumph of the average person!
(of the man in the street!)"
(average meaning: not good, not bad, just OK)
I asked "why?"
His explanation:
"Great brilliant works are built by individuals.
Groups of people can only create average works.
If someone writes something good in the wiki,
other average persons will intervene with his/her
work and turn it into an average work. If someone
writes something bad in the wiki, the others will
again turn it into something of average value.
With your system (meaning: Wikipedia's system)
you can be sure that you will never create
something too bad but also never something too
good. You can create only average articles."
The idea behind his argument was that Wikipedia
will be a good resource as long as it attracts
good contributors, but it will soon become an
average site/encyclopaedia because it allows
anyone to join the project and edit, and most
people are just average persons, not brilliant
writers.
Do you think it's true? And how can we answer
this argument?
--Optim
On Sunday 28 July 2002 03:00 am, The Cunctator wrote:
> What are the articles this person has been changing?
For 66.108.155.126:
20:08 Jul 27, 2002 Computer
20:07 Jul 27, 2002 Exploit
20:07 Jul 27, 2002 AOL
20:05 Jul 27, 2002 Hacker
20:05 Jul 27, 2002 Leet
20:03 Jul 27, 2002 Root
20:02 Jul 27, 2002 Hacker
19:59 Jul 27, 2002 Hacker
19:58 Jul 27, 2002 Hacker
19:54 Jul 27, 2002 Principle of least astonishment
19:54 Jul 27, 2002 Hacker
19:52 Jul 27, 2002 Trance music
19:51 Jul 27, 2002 Trance music
For 208.24.115.6:
20:20 Jul 27, 2002 Hacker
For 141.157.232.26:
20:19 Jul 27, 2002 Hacker
Most of these were complete replacements with incoherent statements,
such as "TAP IS THE ABSOLUTE DEFINITION OF THE NOUN HACKER" for Hacker.
For the specifics, follow http://www.wikipedia.com/wiki/Special:Ipblocklist
and look at the contribs.
--mav
Dear friend,
We are conducting a study on the motivation for knowledge sharing in the
Wikipedia community.
The contributors' experience with Linux is very important to the design and
management of this knowledge platform.
Would you please post the following on-line questionnaire message to the
Wikipedia platform or forward the message to the members?
After the survey is done, we will randomly select twenty persons and present
them with 2 GB USB flash drives.
In addition, for each valid questionnaire, we will donate US $1 to the
Wikimedia Foundation.
The results of this survey will be analyzed anonymously and used only for
academic purposes.
Please help us to complete the data collection.
Thanks so much for your help.
Cheers,
Joanne
[The Message content]
Dear friends,
We are conducting a study on the motivation for knowledge sharing on
Wikipedia. Your experience of reading from and writing to Wikipedia is very
important to the design and management of this knowledge platform. The
survey will take about two minutes. We deeply appreciate your help in
answering the following questions.
After the survey is done, we will randomly select twenty persons and
present them with 2 GB USB flash drives. In addition, for each valid
questionnaire, we will donate US $1 to the Wikimedia Foundation. The
results of this survey will be analyzed anonymously and used only for
academic purposes. Please feel free to fill out the questionnaire. Thanks
again for your time and valuable input.
May happiness and health be with you every day!
★ On-line Questionnaire: http://140.119.19.152:8080/wiki/
Shari S. C. Shang
Eldon Y. Li
Professor,
Department of Management Information Systems,
National Chengchi University
Tel.: +886-2-82374038; Fax: +886-2-29393754; E-mail: s1213527(a)yahoo.com.tw
Dear All,
My name is Avanidhar Chandrasekaran
(http://en.wikipedia.org/wiki/User_talk:Avanidhar).
I work with GroupLens Research at the University of Minnesota, Twin Cities.
As part of my research, I am analyzing the usefulness and
necessity of author reputation in Wikipedia.
To this end, I have built a simulated interface that colors words in an article
based on their age.
As you are experienced contributors to Wikipedia, I invite you to participate in
this study, which involves the following.
1. Please visit the following instances of Wikipedia and evaluate the
interface components that have been incorporated into each of them. Each
of these uses its own algorithm to color text.
a) The Wikitrust project
http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
b) The Wiki-reputation project at GroupLens Research
http://wiki-reputation.cs.umn.edu/index.php/Main_Page
2. Once you have evaluated the two interfaces, kindly complete this survey
on Wikipedia quality
http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d
We hope to get your valuable feedback on these interfaces and how Wikipedia
article quality can be improved.
Thanks for your time
Avanidhar Chandrasekaran,
GroupLens Research, University of Minnesota
----- Original Message -----
From: <wikipedia-l-request(a)lists.wikimedia.org>
To: <wikipedia-l(a)lists.wikimedia.org>
Sent: Tuesday, November 25, 2008 7:59 AM
Subject: Wikipedia-l Digest, Vol 64, Issue 4
> Send Wikipedia-l mailing list submissions to
> wikipedia-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
> or, via email, send a message with subject or body 'help' to
> wikipedia-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wikipedia-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikipedia-l digest..."
>
>
> Today's Topics:
>
> 1. Re: Study on Interfaces to Improving Wikipedia Quality
> (J.L.W.S. The Special One)
> 2. Re: Study on Interfaces to Improving Wikipedia Quality
> (Gregory Maxwell)
> 3. Re: Wikipedia-l Digest, Vol 64, Issue 3 (Jocla)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 25 Nov 2008 11:59:54 +0800
> From: "J.L.W.S. The Special One" <hildanknight(a)gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID:
> <d41ac4640811241959o239db59bp1807b90877a33aa9(a)mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> How would the system handle a paragraph full of high quality,
> well-referenced and well-organised content contributed by an editor A
> that is thoroughly copyedited by an editor B? Would editor A be deemed less
> trustworthy when his prose is thoroughly copyedited?
>
> 2008/11/25, Luca de Alfaro <luca(a)dealfaro.org>:
>>
>> Maury,
>>
>> perhaps I can help explain the behavior you saw in the UCSC system (I am
>> one of the developers).
>> New text is always somewhat orange, to signal to visitors that it has not
>> yet been fully reviewed.
>> The higher the reputation, the lighter the shade of orange, but orange it
>> still is (I have no idea how high your computed reputation was when you
>> started writing that article).
>>
>> Text background becomes white when other people revise it without
>> drastically changing it: this indicates consensus.
>> In our more recent code version, we also have a "vote" button; using this,
>> text can gain trust more speedily without the need for many revisions to
>> occur.
>> In a live experiment, where people can click on the vote button, I presume
>> the trust of the text would rise more rapidly. Note that the code prevents
>> double voting, creating sock-puppet accounts to vote, etc.
>>
>> So I don't think, based on what you say, that the system is tripping over
>> diffs. It is simply considering new text less trusted, and more revised
>> text more trusted, which is what we wanted. It appears, however, that we
>> don't do a very good job on the web site of describing the algorithm (I
>> guess we put most of the description work into writing the papers... we
>> will try to improve the web site).
>>
>> We don't measure "edit work" in number of edits, but in number of words
>> changed.
>> As you say, for our system, changing 1000 words in separate edits is the
>> same (provided the edits are all kept, i.e., not reverted) as providing a
>> single 1000-word contribution. We thought of giving a larger prize to
>> larger contributions: precisely, of making the reputation increment
>> proportional to n^a, where n is the number of words, and a > 1. This did
>> not work well for the Wikipedia, because it ended up not rewarding enough
>> the work of the many editors who clean and polish the articles, thus
>> making many small edits. Technically it would be trivial to change the
>> code to include such a non-linear reward scheme (to adopt rewards
>> proportional to n^a rather than n); whether it is desirable, I have no
>> idea. It does not lead to better quantitative performance of the system,
>> i.e., the resulting trust is not better at predicting future text
>> deletions.
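The difference between a linear reward (proportional to n) and the
non-linear one (proportional to n^a with a > 1) described above can be
illustrated with a small Python sketch. The function names, the
proportionality constant of 1, and the exponent a = 1.5 are assumptions
chosen for illustration only; this is not the code of the UCSC system.

```python
from typing import Iterable


def reputation_increment(words_changed: int, a: float = 1.0) -> float:
    """Hypothetical reward for one kept edit: n**a (constant factor assumed to be 1)."""
    if words_changed < 0:
        raise ValueError("words_changed must be non-negative")
    return float(words_changed) ** a


def total_reward(edit_sizes: Iterable[int], a: float = 1.0) -> float:
    """Sum the rewards of a series of kept (non-reverted) edits."""
    return sum(reputation_increment(n, a) for n in edit_sizes)


# Two editors who each changed 1000 words in total.
many_small = [1] * 1000   # 1000 separate one-word edits
one_large = [1000]        # a single 1000-word contribution

# Linear scheme (a = 1): both editors earn the same amount.
linear_small = total_reward(many_small, a=1.0)
linear_large = total_reward(one_large, a=1.0)

# Non-linear scheme (a = 1.5, an arbitrary illustrative value).
super_small = total_reward(many_small, a=1.5)
super_large = total_reward(one_large, a=1.5)
```

With a = 1 the two totals are equal. With a > 1 the single large edit earns
far more than the many small ones, which is the effect that, as described
above, under-rewarded editors who polish articles through small edits.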
>>
>>
>> Luca
>>
>>
>
>
>
> --
> Written with passion,
> J.L.W.S. The Special One
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 24 Nov 2008 23:14:57 -0500
> From: "Gregory Maxwell" <gmaxwell(a)gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID:
> <e692861c0811242014h464f5a2ei31a1cb3aecfea04b(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Mon, Nov 24, 2008 at 8:35 PM, Luca de Alfaro <luca(a)dealfaro.org> wrote:
> [snip]
>> So I don't think based on what you say that the system is tripping over
>> diffs.
>
> For example: I can't figure out why the text in the image caption is
> colored here
> http://wiki-trust.cse.ucsc.edu/index.php/Digital_room_correction
>
> I couldn't initially figure out why *anything* above the external link
> section was colored, though the inability to diff contributed to that.
>
> On Mon, Nov 24, 2008 at 8:22 PM, Luca de Alfaro <luca(a)dealfaro.org> wrote:
>> I agree with Gregory that it is very useful to quantify the usefulness of
>> trust information on text -- otherwise, all comparisons are very
>> subjective.
>> In our WikiSym 08 paper, we measure various parameters of the "trust"
>> coloring we compute, including:
>>
>> - Recall of deletions. Only 3.4% of text is in the lower half of trust
>> values, yet this is 66% of the text that is deleted in the very next
>> revision.
>> - Precision of deletions. Text in the bottom half of trust values has
>> probability 33% of being deleted in the next revision, against a
>> probability of 1.9% for general text. The deletion probability rises to
>> 62% for text in the bottom 20% of trust values.
>> - We study the correlation between the trust of a word, sampled at random
>> in all revisions, and the future lifespan of a word (correcting for the
>> finite horizon effect due to the finite number of revisions in each
>> article), showing positive correlation.
> [snip]
>
> These performance metrics are better than I would have guessed from
> browsing through the output. How does the color mapping reflect the
> trust values? Basically when I use it I see a *lot* of colored things
> which are perfectly fine. At least for me, the difference between
> shades is far less cognitively significant than colored vs
> non-colored, so that may be the source of my confusion.
>
> Have you compared your system to a simple toy trust metric? I'd
> propose "revisions by users in their first week and before their first
> 7 (?) edits are untrusted". This reflects the existing automatic
> trust system on the site (auto-confirmation), and also reflects a
> type of trust checking applied manually by editors. I think that's
> the bar any more sophisticated trust metric needs to outperform.
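The toy metric proposed above is simple enough to sketch in a few lines of
Python. The Revision record, its field names, and the `strict` flag are
hypothetical; the thresholds (one week, 7 edits) are the ones suggested in
the message. The default reads the proposal literally (both conditions must
hold), while strict=True treats either condition as enough, which is closer
to how a "must pass both thresholds" confirmation rule would behave.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Revision:
    """Hypothetical minimal record of one revision and its author."""
    author_registered: datetime   # when the author's account was created
    author_prior_edits: int       # edits the author had made before this one
    timestamp: datetime           # when this revision was saved


def is_untrusted(rev: Revision,
                 min_age: timedelta = timedelta(days=7),
                 min_edits: int = 7,
                 strict: bool = False) -> bool:
    """Toy baseline: flag revisions by new, low-edit-count accounts."""
    new_account = (rev.timestamp - rev.author_registered) < min_age
    few_edits = rev.author_prior_edits < min_edits
    if strict:
        return new_account or few_edits
    return new_account and few_edits


newcomer = Revision(datetime(2008, 11, 20), 2, datetime(2008, 11, 22))
veteran = Revision(datetime(2008, 1, 1), 500, datetime(2008, 11, 22))
busy_newcomer = Revision(datetime(2008, 11, 20), 50, datetime(2008, 11, 22))
```

Any more sophisticated trust metric could then be compared against the
flags this baseline produces on the same revisions.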
>
> Thank you so much for your response!
>
> ------------------------------
>
> Message: 3
> Date: Tue, 25 Nov 2008 07:59:19 -0000
> From: "Jocla" <paresdoce(a)gmail.com>
> Subject: Re: [Wikipedia-l] Wikipedia-l Digest, Vol 64, Issue 3
> To: <wikipedia-l(a)lists.wikimedia.org>
> Message-ID: <001701c94ed3$b9b78d50$7f01a8c0@windows337902b>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> reply-type=original
>
> subscribe
> ----- Original Message -----
> From: <wikipedia-l-request(a)lists.wikimedia.org>
> To: <wikipedia-l(a)lists.wikimedia.org>
> Sent: Tuesday, November 25, 2008 1:35 AM
> Subject: Wikipedia-l Digest, Vol 64, Issue 3
>
>
>>
>> Today's Topics:
>>
>> 1. suscribe (Jocla)
>> 2. Study on Interfaces to Improving Wikipedia Quality
>> (avani(a)cs.umn.edu)
>> 3. Re: Study on Interfaces to Improving Wikipedia Quality
>> (michael west)
>> 4. Re: Study on Interfaces to Improving Wikipedia Quality
>> (Joseph Reagle)
>> 5. Re: Study on Interfaces to Improving Wikipedia Quality
>> (Maury Markowitz)
>> 6. Re: Study on Interfaces to Improving Wikipedia Quality
>> (Gregory Maxwell)
>> 7. Re: Study on Interfaces to Improving Wikipedia Quality
>> (Luca de Alfaro)
>> 8. Re: Study on Interfaces to Improving Wikipedia Quality
>> (Luca de Alfaro)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 19 Nov 2008 18:22:03 -0000
>> From: "Jocla" <paresdoce(a)gmail.com>
>> Subject: [Wikipedia-l] suscribe
>> To: <wikipedia-l(a)lists.wikimedia.org>
>> Message-ID: <001c01c94a73$ba1850e0$7f01a8c0@windows337902b>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Thanks for your e-mail; I would like to subscribe.
>>
>> ------------------------------
>>
>> Message: 2
>> Date: 19 Nov 2008 13:23:53 -0600
>> From: avani(a)cs.umn.edu
>> Subject: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID: <Prayer.1.0.18.0811191323530.7842(a)sabinus.cs.umn.edu>
>> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>>
>>
>> Dear All,
>>
>> My name is Avanidhar Chandrasekaran
>> (http://en.wikipedia.org/wiki/User_talk:Avanidhar).
>>
>> I work with GroupLens Research at the University of Minnesota, Twin
>> Cities.
>> As part of my research, I am analyzing the usefulness and
>> necessity of author reputation in Wikipedia.
>>
>> To this end, I have built a simulated interface that colors words in an
>> article based on their age.
>>
>> As you are experienced contributors to Wikipedia, I invite you to
>> participate in this study, which involves the following.
>>
>> 1. Please visit the following instances of Wikipedia and evaluate the
>> interface components that have been incorporated into each of them. Each
>> of these uses its own algorithm to color text.
>>
>> a) The Wikitrust project
>>
>> http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
>>
>> b) The Wiki-reputation project at GroupLens Research
>>
>> http://wiki-reputation.cs.umn.edu/index.php/Main_Page
>>
>> 2. Once you have evaluated the two interfaces, kindly complete this
>> survey on Wikipedia quality
>>
>> http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d
>>
>>
>> We hope to get your valuable feedback on these interfaces and how
>> Wikipedia
>> article quality can be improved.
>>
>> Thanks for your time
>>
>> Avanidhar Chandrasekaran,
>>
>> GroupLens Research, University of Minnesota
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Wed, 19 Nov 2008 20:01:27 +0000
>> From: "michael west" <michawest(a)gmail.com>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID:
>> <cfe6de600811191201h727fb4e4s9660f64f2815c93f(a)mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Quite interesting - the "age of words" color coding might be useful in
>> detecting obtuse type vandalism.
>>
>> m
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 19 Nov 2008 17:40:23 -0500
>> From: Joseph Reagle <reagle(a)mit.edu>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID: <200811191740.23471.reagle(a)mit.edu>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> On Wednesday 19 November 2008, avani(a)cs.umn.edu wrote:
>>> We hope to get your valuable feedback on these interfaces and how
>>> Wikipedia
>>> article quality can be improved.
>>
>> This might bias other respondents, but I thought it was an interesting
>> idea, so I wanted to share it. I concluded with the following, which is no
>> doubt affected by my being a WikiGnome:
>>
>> [[
>> If I see an error, I fix it without much regard to time or author
>> reputation. I do pay attention to and investigate author reputation on
>> substantive issues on the discussion pages and it would be interesting to
>> see a discussion thread colored according to reputation.
>> ]]
>>
>>
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Sun, 23 Nov 2008 09:03:25 -0500
>> From: "Maury Markowitz" <maury.markowitz(a)gmail.com>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID:
>> <5bdbc9050811230603u5a9ca6e8ned59c4421c8eacb0(a)mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> On Wed, Nov 19, 2008 at 2:23 PM, <avani(a)cs.umn.edu> wrote:
>>> We hope to get your valuable feedback on these interfaces and how
>>> Wikipedia
>>> article quality can be improved.
>>
>> Given the older snapshots, I selected older articles that I had
>> started, NuBUS and ARCNET.
>>
>> The "time based" system from UMN did not work at all, every search
>> resulted in a page not found.
>>
>> The UCSC system did work, but gave me odd results. Apparently I have a
>> very bad reputation, because when I look in the History at the first
>> versions, which I wrote in their entirety, it colored it all yellow!
>>
>> Newer versions of the same articles had much more white, even though
>> huge portions of the text were still from the original. This may be due
>> to diff problems -- I consider diff to be largely random in
>> effectiveness: sometimes it works, but other times a single whitespace
>> change, especially vertical, will make it think the entire article was
>> edited.
>>
>> My guess is that the system is tripping over diffs like this, and thus
>> considering the article to have been re-written by another editor.
>> Since this has happened, MY reputation goes down, or so I understand
>> it.
>>
>> I don't think this system could possibly work if based on wiki's
>> diffs. If it's going to work it's going to need to use a much more
>> reliable system.
>>
>> Another problem I see with it is that it will rank an author whose
>> contributions are 1000 unchanged comma inserts to be as reliable as an
>> author who created a perfect 1000 character article (or perhaps rate
>> the first even higher). There should be some sort of length bias; if
>> an author makes a big edit, out of character, that's important to
>> know.
>>
>> Maury
>>
>>
>>
>> ------------------------------
>>
>> Message: 6
>> Date: Sun, 23 Nov 2008 09:44:40 -0500
>> From: "Gregory Maxwell" <gmaxwell(a)gmail.com>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID:
>> <e692861c0811230644i316f94abg6cafe7ef87f6bc3b(a)mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> On Sun, Nov 23, 2008 at 9:03 AM, Maury Markowitz
>> <maury.markowitz(a)gmail.com> wrote:
>>> On Wed, Nov 19, 2008 at 2:23 PM, <avani(a)cs.umn.edu> wrote:
>>>> We hope to get your valuable feedback on these interfaces and how
>>>> Wikipedia
>>>> article quality can be improved.
>>>
>>> Given the older snapshots, I selected older articles that I had
>>> started, NuBUS and ARCNET.
>>>
>>> The "time based" system from UMN did not work at all, every search
>>> resulted in a page not found.
>>
>> The UMN system intentionally included only a small number (70?) of
>> articles. This is why you needed to use the random page function to
>> browse among them.
>>
>> This doesn't reflect any shortcoming of the system; it most
>> likely just reflects the limits of the computational resources they were
>> working under.
>>
>> [snip]
>>> Newer versions of the same articles had much more white, even though
>>> huge portions of the text were still from the original. This may be due
>>> to diff problems -- I consider diff to be largely random in
>>> effectiveness: sometimes it works, but other times a single whitespace
>>> change, especially vertical, will make it think the entire article was
>>> edited.
>>
>> Yes, I had exactly the same experience with the UCSC system: Different
>> coloring for text I'd added in the same edit which created the article.
>> Quite inscrutable.
>>
>> [snip]
>>> Another problem I see with it is that it will rank an author whose
>>> contributions are 1000 unchanged comma inserts to be as reliable as an
>>> author who created a perfect 1000 character article (or perhaps rate
>>> the first even higher). There should be some sort of length bias; if
>>> an author makes a big edit, out of character, that's important to
>>> know.
>>
>> For the articles it covered I found the UMN system to be more usable:
>> Its output was more explicable, and the signal to noise ratio was
>> just better. This may be partially due to bugs in the UCSC history
>> analysis, and a different choice in coloring thresholds
>> (UCSC seemed to color almost everything, removing the usefulness of
>> color as something to draw my attention).
>>
>> Even so, I'm distrustful of "reputation" as an automated metric.
>> Reputation is a fuzzy thing (consider your comma example), but time is
>> just a straightforward metric which is much easier to get right. Your
>> tireless and unreverted editing of external links tells me very little
>> about your ability to make a reliable edit to the intro of an article,
>> ... or at least very little that I didn't already know by merely
>> knowing if your account was brand new or not. (New accounts are more
>> likely to be used by inexperienced and ill-motivated persons)
>>
>> I believe a metric applied correctly, consistently, and understandably
>> is just going to be more useful than a metric which considers more
>> data but is also subject to more noise. The differential performance
>> between these two systems has done nothing but confirm my suspicions
>> in this regard.
>>
>> A simple objective challenge for any predictive coloring system would
>> be to use them in the following experimental procedure:
>>
>> * Take a dump of Wikipedia up to a year old; use this as the underlying
>> knowledge for the systems.
>> * Make several random selections of articles and include the newer
>> revisions not included in the initial set, up to 6 months old. Call
>> these the test sets.
>> * The predictive coloring system should then take each revision in a
>> test set in time order and predict if it will be reverted (within X
>> time?).
>> * The actual edits up to now should be analyzed to determine which
>> changes actually were reverted and when.
>>
>> The final score will be the false positive and false negative rates.
>> So long as we assume that the existing editing practices are not too
>> bad we should find that the best predictive coloring system would
>> generally tend to minimize these rates.
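The scoring step of the procedure above can be sketched in Python. The
function name and the six sample revisions are invented for illustration;
a real evaluation would take the predictions from the coloring system and
the outcomes from the later edit history.

```python
from typing import Sequence, Tuple


def error_rates(predicted: Sequence[bool],
                actual: Sequence[bool]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate).

    predicted[i] is True if the system predicted revision i would be
    reverted; actual[i] is True if it really was reverted.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if not p and a)
    negatives = sum(1 for a in actual if not a)   # revisions that were kept
    positives = sum(1 for a in actual if a)       # revisions that were reverted
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


# Made-up test set of six revisions, in time order.
predicted = [True, True, False, False, False, True]
actual = [True, False, False, True, False, True]
fpr, fnr = error_rates(predicted, actual)
```

Two systems run over the same test sets can then be compared directly on
these two rates.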
>>
>> ------------------------------
>>
>> Message: 7
>> Date: Mon, 24 Nov 2008 17:22:23 -0800
>> From: "Luca de Alfaro" <luca(a)dealfaro.org>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID:
>> <28fa90930811241722y25c26bf1i6441b489e3ff6285(a)mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> I agree with Gregory that it is very useful to quantify the usefulness of
>> trust information on text -- otherwise, all comparisons are very
>> subjective.
>> In our WikiSym 08 paper, we measure various parameters of the "trust"
>> coloring we compute, including:
>>
>> - Recall of deletions. Only 3.4% of text is in the lower half of trust
>> values, yet this is 66% of the text that is deleted in the very next
>> revision.
>> - Precision of deletions. Text in the bottom half of trust values has
>> probability 33% of being deleted in the next revision, against a
>> probability of 1.9% for general text. The deletion probability rises to
>> 62% for text in the bottom 20% of trust values.
>> - We study the correlation between the trust of a word, sampled at random
>> in all revisions, and the future lifespan of a word (correcting for the
>> finite horizon effect due to the finite number of revisions in each
>> article), showing positive correlation.
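The first two measures in the list above, as they are worded there, can be
computed from per-word records as in the following Python sketch. The
function name and the toy data are assumptions for illustration and do not
come from the WikiSym paper; each record only says whether a word was in
the low-trust half and whether it was deleted in the next revision.

```python
from typing import Iterable, Tuple


def deletion_precision_recall(
        words: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """words yields (low_trust, deleted_in_next_revision) pairs.

    precision = share of low-trust words that were deleted
    recall    = share of deleted words that were low-trust
    """
    low_total = deleted_total = low_and_deleted = 0
    for low_trust, deleted in words:
        low_total += int(low_trust)
        deleted_total += int(deleted)
        low_and_deleted += int(low_trust and deleted)
    precision = low_and_deleted / low_total if low_total else 0.0
    recall = low_and_deleted / deleted_total if deleted_total else 0.0
    return precision, recall


# Invented toy data: 11 words.
toy_words = (
    [(True, True)] * 3      # low trust, deleted
    + [(True, False)] * 1   # low trust, kept
    + [(False, True)] * 2   # high trust, deleted
    + [(False, False)] * 5  # high trust, kept
)
precision, recall = deletion_precision_recall(toy_words)
```

On the toy data, 3 of the 4 low-trust words are deleted (precision 0.75)
and 3 of the 5 deleted words were low-trust (recall 0.6).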
>>
>> Some aspects are not captured by the above measures:
>>
>> - We ensured that every "tampering" (including cut-and-paste) is
>> reflected in the trust coloring, so it is hard to subvert the algorithm
>> (does "age" provide this?).
>> - We ensured the whole scheme is robust wrt attacks (see the various
>> papers if you are interested).
>>
>> I fully believe that it should not be hard to improve on our system re.
>> the above measurements. And I fully agree that the "reputation" we
>> compute is essentially an internal parameter of the system, and does not
>> really constitute a good summary of a person's overall Wikipedia
>> contribution; for this and other reasons we do not display it.
>>
>> Luca
>>
>>
>>
>> ------------------------------
>>
>> Message: 8
>> Date: Mon, 24 Nov 2008 17:35:13 -0800
>> From: "Luca de Alfaro" <luca(a)dealfaro.org>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
>> Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID:
>> <28fa90930811241735l235af9cag554632448d80ef7(a)mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Maury,
>>
>> perhaps I can help explain the behavior you saw in the UCSC system (I am
>> one of the developers).
>> New text is always somewhat orange, to signal to visitors that it has not
>> yet been fully reviewed.
>> The higher the reputation, the lighter the shade of orange, but orange it
>> still is (I have no idea how high your computed reputation was when you
>> started writing that article).
>>
>> Text background becomes white when other people revise it without
>> drastically changing it: this indicates consensus.
>> In our more recent code version, we also have a "vote" button; using
>> this, text can gain trust more speedily without the need for many
>> revisions to occur.
>> In a live experiment, where people can click on the vote button, I
>> presume the trust of the text would rise more rapidly. Note that the
>> code prevents double voting, creating sock-puppet accounts to vote, etc.
>>
>> So, based on what you say, I don't think that the system is tripping over
>> diffs. It is simply considering new text less trusted, and more revised
>> text more trusted, which is what we wanted. It appears, however, that we
>> don't do a very good job on the web site of describing the algorithm (I
>> guess we put most of the description work into writing the papers... we
>> will try to improve the web site).
>>
>> We don't measure "edit work" in number of edits, but in number of words
>> changed.
>> As you say, for our system, changing 1000 words in separate edits is the
>> same (provided the edits are all kept, i.e., not reverted) as providing a
>> single 1000-word contribution. We thought of giving a larger reward to
>> larger contributions: precisely, of making the reputation increment
>> proportional to n^a, where n is the number of words, and a > 1. This did
>> not work well for Wikipedia, because it ended up not rewarding enough
>> the work of the many editors who clean and polish the articles, thus
>> making many small edits. Technically it would be trivial to change the
>> code to include such a non-linear reward scheme (to adopt rewards
>> proportional to n^a rather than n); whether it is desirable, I have no
>> idea. It does not lead to better quantitative performance of the system,
>> i.e., the resulting trust is not better at predicting future text
>> deletions.
>>
>> Luca
>>
>>
>>> The UCSC system did work, but gave me odd results. Apparently I have a
>>> very bad reputation, because when I look in the History at the first
>>> versions, which I wrote in their entirety, it colored it all yellow!
>>>
>>> Newer versions of the same articles had much more white, even though
>>> huge portions of the text were still from the original. This may be due
>>> to diff problems -- I consider diff to be largely random in
>>> effectiveness: sometimes it works, but other times a single whitespace
>>> change, especially vertical, will make it think the entire article was
>>> edited.
>>>
>>> My guess is that the system is tripping over diffs like this, and thus
>>> considering the article to have been re-written by another editor.
>>> Since this has happened, MY reputation goes down, or so I understand
>>> it.
>>>
>>> I don't think this system could possibly work if based on wiki's
>>> diffs. If it's going to work it's going to need to use a much more
>>> reliable system.
>>>
>>> Another problem I see with it is that it will rank an author whose
>>> contributions are 1000 unchanged comma inserts to be as reliable as an
>>> author who created a perfect 1000 character article (or perhaps rate
>>> the first even higher). There should be some sort of length bias; if
>>> an author makes a big edit, out of character, that's important to
>>> know.
>>>
>>> Maury
>>>
>>>
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> Wikipedia-l mailing list
>> Wikipedia-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>>
>>
>> End of Wikipedia-l Digest, Vol 64, Issue 3
>> ******************************************
>>
>
>
>
>
> ------------------------------
>
> _______________________________________________
> Wikipedia-l mailing list
> Wikipedia-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>
>
> End of Wikipedia-l Digest, Vol 64, Issue 4
> ******************************************
subscribe
----- Original Message -----
From: <wikipedia-l-request(a)lists.wikimedia.org>
To: <wikipedia-l(a)lists.wikimedia.org>
Sent: Tuesday, November 25, 2008 1:35 AM
Subject: Wikipedia-l Digest, Vol 64, Issue 3
> Send Wikipedia-l mailing list submissions to
> wikipedia-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
> or, via email, send a message with subject or body 'help' to
> wikipedia-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wikipedia-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikipedia-l digest..."
>
>
> Today's Topics:
>
> 1. suscribe (Jocla)
> 2. Study on Interfaces to Improving Wikipedia Quality
> (avani(a)cs.umn.edu)
> 3. Re: Study on Interfaces to Improving Wikipedia Quality
> (michael west)
> 4. Re: Study on Interfaces to Improving Wikipedia Quality
> (Joseph Reagle)
> 5. Re: Study on Interfaces to Improving Wikipedia Quality
> (Maury Markowitz)
> 6. Re: Study on Interfaces to Improving Wikipedia Quality
> (Gregory Maxwell)
> 7. Re: Study on Interfaces to Improving Wikipedia Quality
> (Luca de Alfaro)
> 8. Re: Study on Interfaces to Improving Wikipedia Quality
> (Luca de Alfaro)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 19 Nov 2008 18:22:03 -0000
> From: "Jocla" <paresdoce(a)gmail.com>
> Subject: [Wikipedia-l] suscribe
> To: <wikipedia-l(a)lists.wikimedia.org>
> Message-ID: <001c01c94a73$ba1850e0$7f01a8c0@windows337902b>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks for your e-mail, I would like to subscribe.
>
> ------------------------------
>
> Message: 2
> Date: 19 Nov 2008 13:23:53 -0600
> From: avani(a)cs.umn.edu
> Subject: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID: <Prayer.1.0.18.0811191323530.7842(a)sabinus.cs.umn.edu>
> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>
>
> Dear All,
>
> My name is Avanidhar Chandrasekaran
> (http://en.wikipedia.org/wiki/User_talk:Avanidhar).
>
> I work with GroupLens Research at the University of Minnesota, Twin
> Cities.
> As part of my research, I am involved in analyzing the usefulness and
> necessity of author reputation in Wikipedia.
>
> To this end, I have simulated an interface to color words in an
> article based on their age.
>
> As you are experienced contributors to Wikipedia, I invite you to
> participate in this study, which involves the following.
>
> 1. Please visit the following instances of Wikipedia and evaluate the
> interface components which have been incorporated into each of them. Each
> of these uses its own algorithm to color text.
>
> a) The Wikitrust project
>
> http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
>
> b) The Wiki-reputation project at Grouplens research
>
> http://wiki-reputation.cs.umn.edu/index.php/Main_Page
>
> 2. Once you have evaluated the two interfaces, kindly complete this survey
> on Wikipedia quality
>
> http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d
>
>
> We hope to get your valuable feedback on these interfaces and how
> Wikipedia article quality can be improved.
>
> Thanks for your time
>
> Avanidhar Chandrasekaran,
>
> GroupLens Research, University of Minnesota
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 19 Nov 2008 20:01:27 +0000
> From: "michael west" <michawest(a)gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID:
> <cfe6de600811191201h727fb4e4s9660f64f2815c93f(a)mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> 2008/11/19 <avani(a)cs.umn.edu>
>
>>
>> Dear All,
>>
>> My name is Avanidhar Chandrasekaran
>> (http://en.wikipedia.org/wiki/User_talk:Avanidhar).
>>
>> I work with GroupLens Research at the University of Minnesota, Twin
>> Cities.
>> As part of my research, I am involved in analyzing the usefulness and
>> necessity of author reputation in Wikipedia.
>>
>> To this end, I have simulated an interface to color words in an
>> article based on their age.
>>
>> As you are experienced contributors to Wikipedia, I invite you to
>> participate in this study, which involves the following.
>>
>> 1. Please visit the following instances of Wikipedia and evaluate the
>> interface components which have been incorporated into each of them. Each
>> of these uses its own algorithm to color text.
>>
>> a) The Wikitrust project
>>
>> http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
>>
>> b) The Wiki-reputation project at Grouplens research
>>
>> http://wiki-reputation.cs.umn.edu/index.php/Main_Page
>>
>> 2. Once you have evaluated the two interfaces, kindly complete this
>> survey on Wikipedia quality
>>
>> http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d
>>
>>
>> We hope to get your valuable feedback on these interfaces and how
>> Wikipedia article quality can be improved.
>>
>> Thanks for your time
>>
>> Avanidhar Chandrasekaran,
>>
>> GroupLens Research, University of Minnesota
>>
>
> Quite interesting - the "age of words" color coding might be useful in
> detecting obtuse type vandalism.
>
> m
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 19 Nov 2008 17:40:23 -0500
> From: Joseph Reagle <reagle(a)mit.edu>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID: <200811191740.23471.reagle(a)mit.edu>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Wednesday 19 November 2008, avani(a)cs.umn.edu wrote:
>> We hope to get your valuable feedback on these interfaces and how
>> Wikipedia
>> article quality can be improved.
>
> This might bias other respondents, but I thought it was an interesting idea
> so I wanted to share it. I concluded with the following, which is no doubt
> affected by my being a WikiGnome:
>
> [[
> If I see an error, I fix it without much regard to time or author
> reputation. I do pay attention to and investigate author reputation on
> substantive issues on the discussion pages and it would be interesting to
> see a discussion thread colored according to reputation.
> ]]
>
>
>
> ------------------------------
>
> Message: 5
> Date: Sun, 23 Nov 2008 09:03:25 -0500
> From: "Maury Markowitz" <maury.markowitz(a)gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID:
> <5bdbc9050811230603u5a9ca6e8ned59c4421c8eacb0(a)mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Wed, Nov 19, 2008 at 2:23 PM, <avani(a)cs.umn.edu> wrote:
>> We hope to get your valuable feedback on these interfaces and how
>> Wikipedia
>> article quality can be improved.
>
> Given the older snapshots, I selected older articles that I had
> started, NuBUS and ARCNET.
>
> The "time based" system from UMN did not work at all, every search
> resulted in a page not found.
>
> The UCSC system did work, but gave me odd results. Apparently I have a
> very bad reputation, because when I look in the History at the first
> versions, which I wrote in their entirety, it colored it all yellow!
>
> Newer versions of the same articles had much more white, even though
> huge portions of the text were still from the original. This may be due
> to diff problems -- I consider diff to be largely random in
> effectiveness: sometimes it works, but other times a single whitespace
> change, especially vertical, will make it think the entire article was
> edited.
>
> My guess is that the system is tripping over diffs like this, and thus
> considering the article to have been re-written by another editor.
> Since this has happened, MY reputation goes down, or so I understand
> it.
>
> I don't think this system could possibly work if based on wiki's
> diffs. If it's going to work it's going to need to use a much more
> reliable system.
>
> Another problem I see with it is that it will rank an author whose
> contributions are 1000 unchanged comma inserts to be as reliable as an
> author who created a perfect 1000 character article (or perhaps rate
> the first even higher). There should be some sort of length bias; if
> an author makes a big edit, out of character, that's important to
> know.
>
> Maury
>
>
>
> ------------------------------
>
> Message: 6
> Date: Sun, 23 Nov 2008 09:44:40 -0500
> From: "Gregory Maxwell" <gmaxwell(a)gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID:
> <e692861c0811230644i316f94abg6cafe7ef87f6bc3b(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Sun, Nov 23, 2008 at 9:03 AM, Maury Markowitz
> <maury.markowitz(a)gmail.com> wrote:
>> On Wed, Nov 19, 2008 at 2:23 PM, <avani(a)cs.umn.edu> wrote:
>>> We hope to get your valuable feedback on these interfaces and how
>>> Wikipedia
>>> article quality can be improved.
>>
>> Given the older snapshots, I selected older articles that I had
>> started, NuBUS and ARCNET.
>>
>> The "time based" system from UMN did not work at all, every search
>> resulted in a page not found.
>
> The UMN system intentionally included only a small number (70?) of
> articles. This is why you needed to use the random page function to
> browse among them.
>
> This doesn't reflect any shortcoming of the system; it most
> likely just reflects the limits of the computational resources they were
> working under.
>
> [snip]
>> Newer versions of the same articles had much more white, even though
>> huge portions of the text were still from the original. This may be due
>> to diff problems -- I consider diff to be largely random in
>> effectiveness: sometimes it works, but other times a single whitespace
>> change, especially vertical, will make it think the entire article was
>> edited.
>
> Yes, I had exactly the same experience with the UCSC system: Different
> coloring for text I'd added in the same edit which created the article.
> Quite inscrutable.
>
> [snip]
>> Another problem I see with it is that it will rank an author whose
>> contributions are 1000 unchanged comma inserts to be as reliable as an
>> author who created a perfect 1000 character article (or perhaps rate
>> the first even higher). There should be some sort of length bias; if
>> an author makes a big edit, out of character, that's important to
>> know.
>
> For the articles it covered I found the UMN system to be more usable:
> Its output was more explicable, and the signal to noise ratio was
> just better. This may be partially due to bugs in the UCSC history
> analysis, and a different choice in coloring thresholds
> (UCSC seemed to color almost everything, removing the usefulness of
> color as something to draw my attention).
>
> Even so, I'm distrustful of "reputation" as an automated metric.
> Reputation is a fuzzy thing (consider your comma example), but time is
> just a straightforward metric which is much easier to get right. Your
> tireless and unreverted editing of external links tells me very little
> about your ability to make a reliable edit to the intro of an article,
> ... or at least very little that I didn't already know by merely
> knowing if your account was brand new or not. (New accounts are more
> likely to be used by inexperienced and ill-motivated persons)
>
> I believe a metric applied correctly, consistently, and understandably
> is just going to be more useful than a metric which considers more
> data but is also subject to more noise. The differential performance
> between these two systems has done nothing but confirm my suspicions
> in this regard.
>
> A simple objective challenge for any predictive coloring system would
> be to use it in the following experimental procedure:
>
> * Take a dump of Wikipedia up to a year old, use this as the underlying
> knowledge for the systems.
> * Make several random selections of articles and include the newer
> revisions not included in the initial set, up to 6 months old. Call
> these the test sets.
> * The predictive coloring system should then take each revision in a
> test set in time order and predict if it will be reverted (Within X
> time?).
> * The actual edits up to now should be analyzed to determine which
> changes actually were reverted and when.
>
> The final score will be the false positive and false negative rates.
> So long as we assume that the existing editing practices are not too
> bad we should find that the best predictive coloring system would
> generally tend to minimize these rates.
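
A minimal sketch of how the final score of the procedure quoted above might be computed. It assumes each test-set revision has already been reduced to a pair of booleans (predicted reverted, actually reverted); the function name and data layout are illustrative assumptions, not part of either coloring system.

```python
def error_rates(results):
    """Return (false_positive_rate, false_negative_rate).

    results: iterable of (predicted_reverted, actually_reverted) booleans,
    one pair per revision in a test set (hypothetical layout).
    """
    fp = fn = pos = neg = 0
    for predicted, actual in results:
        if actual:
            pos += 1
            if not predicted:
                fn += 1  # revert that the system failed to predict
        else:
            neg += 1
            if predicted:
                fp += 1  # predicted revert that never happened
    fpr = fp / neg if neg else 0.0
    fnr = fn / pos if pos else 0.0
    return fpr, fnr


sample = [(True, True), (False, True), (True, False),
          (False, False), (False, False), (False, False)]
print(error_rates(sample))
```

Reporting the two rates separately, rather than a single accuracy figure, matters here because reverted revisions are likely to be a small minority of all revisions.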
>
> ------------------------------
>
> Message: 7
> Date: Mon, 24 Nov 2008 17:22:23 -0800
> From: "Luca de Alfaro" <luca(a)dealfaro.org>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID:
> <28fa90930811241722y25c26bf1i6441b489e3ff6285(a)mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I agree with Gregory that it is very useful to quantify the usefulness of
> trust information on text -- otherwise, all comparisons are very
> subjective.
> In our WikiSym 08 paper, we measure various parameters of the "trust"
> coloring we compute, including:
>
> - Recall of deletions. Only 3.4% of text is in the lower half of trust
> values, yet this is 66% of the text that is deleted in the very next
> revision.
> - Precision of deletions. Text in the bottom half of trust values has
> probability 33% of being deleted in the next revision, against a
> probability of 1.9% for general text. The deletion probability rises
> to 62% for text in the bottom 20% of trust values.
> - We study the correlation between the trust of a word, sampled at
> random in all revisions, and the future lifespan of a word (correcting
> for the finite horizon effect due to the finite number of revisions in
> each article), showing positive correlation.
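
A rough sketch of how the recall and precision of deletions described above could be computed from per-word data. The input layout (one (trust, deleted_next) pair per word) and the function name are assumptions made for illustration; the actual measurement is described in the paper, not here.

```python
def deletion_precision_recall(words, threshold):
    """Return (precision, recall) for low trust as a predictor of deletion.

    words: iterable of (trust, deleted_next) pairs, one per word, where
    deleted_next is True if the word is deleted in the very next revision.
    A word counts as low-trust when trust < threshold (assumed convention).
    """
    low = deleted = low_and_deleted = 0
    for trust, deleted_next in words:
        is_low = trust < threshold
        if is_low:
            low += 1
        if deleted_next:
            deleted += 1
        if is_low and deleted_next:
            low_and_deleted += 1
    # Precision: share of low-trust text that is deleted next.
    precision = low_and_deleted / low if low else 0.0
    # Recall: share of deleted text that was low-trust.
    recall = low_and_deleted / deleted if deleted else 0.0
    return precision, recall


sample = [(1, True), (2, False), (8, False), (9, True), (7, False)]
print(deletion_precision_recall(sample, threshold=5))
```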
>
> Some aspects are not captured by the above measures:
>
> - We ensured that every "tampering" (including cut-and-paste) is
> reflected in the trust coloring, so it is hard to subvert the algorithm
> (does "age" provide this?).
> - We ensured the whole scheme is robust wrt attacks (see the various
> papers if you are interested).
>
> I fully believe that it should not be hard to improve on our system re.
> the above measurements. And I fully agree that the "reputation" we
> compute is essentially an internal parameter of the system, and does not
> really constitute a good summary of a person's overall Wikipedia
> contribution; for this and other reasons we do not display it.
>
> Luca
>
>> A simple objective challenge for any predictive coloring system would
>> be to use it in the following experimental procedure:
>>
>> * Take a dump of Wikipedia up to a year old, use this as the underlying
>> knowledge for the systems.
>> * Make several random selections of articles and include the newer
>> revisions not included in the initial set, up to 6 months old. Call
>> these the test sets.
>> * The predictive coloring system should then take each revision in a
>> test set in time order and predict if it will be reverted (Within X
>> time?).
>> * The actual edits up to now should be analyzed to determine which
>> changes actually were reverted and when.
>>
>> The final score will be the false positive and false negative rates.
>> So long as we assume that the existing editing practices are not too
>> bad we should find that the best predictive coloring system would
>> generally tend to minimize these rates.
>>
>
>
> ------------------------------
>
> Message: 8
> Date: Mon, 24 Nov 2008 17:35:13 -0800
> From: "Luca de Alfaro" <luca(a)dealfaro.org>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia
> Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID:
> <28fa90930811241735l235af9cag554632448d80ef7(a)mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Maury,
>
> perhaps I can help explain the behavior you saw in the UCSC system (I am
> one of the developers).
> New text is always somewhat orange, to signal to visitors that it has not
> yet been fully reviewed.
> The higher the reputation, the lighter the shade of orange, but orange it
> still is (I have no idea how high your computed reputation was when you
> started writing that article).
>
> Text background becomes white when other people revise it without
> drastically changing it: this indicates consensus.
> In our more recent code version, we also have a "vote" button; using this,
> text can gain trust more speedily without the need for many revisions to
> occur.
> In a live experiment, where people can click on the vote button, I presume
> the trust of the text would rise more rapidly. Note that the code
> prevents double voting, creating sock-puppet accounts to vote, etc.
>
> So, based on what you say, I don't think that the system is tripping over
> diffs. It is simply considering new text less trusted, and more revised
> text more trusted, which is what we wanted. It appears, however, that we
> don't do a very good job on the web site of describing the algorithm (I
> guess we put most of the description work into writing the papers... we
> will try to improve the web site).
>
> We don't measure "edit work" in number of edits, but in number of words
> changed.
> As you say, for our system, changing 1000 words in separate edits is the
> same (provided the edits are all kept, i.e., not reverted) as providing a
> single 1000-word contribution. We thought of giving a larger reward to
> larger contributions: precisely, of making the reputation increment
> proportional to n^a, where n is the number of words, and a > 1. This did
> not work well for Wikipedia, because it ended up not rewarding enough
> the work of the many editors who clean and polish the articles, thus
> making many small edits. Technically it would be trivial to change the
> code to include such a non-linear reward scheme (to adopt rewards
> proportional to n^a rather than n); whether it is desirable, I have no
> idea. It does not lead to better quantitative performance of the system,
> i.e., the resulting trust is not better at predicting future text
> deletions.
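
To make the n versus n^a comparison above concrete, here is a small sketch. The function name and the `scale` parameter are hypothetical and are not taken from the WikiTrust code; only the n^a form comes from the message.

```python
def reputation_increment(words_changed, a=1.0, scale=1.0):
    """Reward proportional to n**a, with n the number of words changed.

    a == 1 gives the linear scheme; a > 1 favours large single edits.
    """
    return scale * words_changed ** a


# Linear scheme: 1000 one-word edits earn the same as one 1000-word edit.
many_small = sum(reputation_increment(1) for _ in range(1000))
one_large = reputation_increment(1000)
print(many_small == one_large)

# Non-linear scheme (a = 1.5, an arbitrary choice): the single large edit
# earns far more than the same number of words spread over small edits.
many_small_nl = sum(reputation_increment(1, a=1.5) for _ in range(1000))
one_large_nl = reputation_increment(1000, a=1.5)
print(many_small_nl < one_large_nl)
```

With a > 1 the many small clean-up edits are under-rewarded relative to one big contribution, which is the effect described in the message.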
>
> Luca
>
>
>> The UCSC system did work, but gave me odd results. Apparently I have a
>> very bad reputation, because when I look in the History at the first
>> versions, which I wrote in their entirety, it colored it all yellow!
>>
>> Newer versions of the same articles had much more white, even though
>> huge portions of the text were still from the original. This may be due
>> to diff problems -- I consider diff to be largely random in
>> effectiveness: sometimes it works, but other times a single whitespace
>> change, especially vertical, will make it think the entire article was
>> edited.
>>
>> My guess is that the system is tripping over diffs like this, and thus
>> considering the article to have been re-written by another editor.
>> Since this has happened, MY reputation goes down, or so I understand
>> it.
>>
>> I don't think this system could possibly work if based on wiki's
>> diffs. If it's going to work it's going to need to use a much more
>> reliable system.
>>
>> Another problem I see with it is that it will rank an author whose
>> contributions are 1000 unchanged comma inserts to be as reliable as an
>> author who created a perfect 1000 character article (or perhaps rate
>> the first even higher). There should be some sort of length bias; if
>> an author makes a big edit, out of character, that's important to
>> know.
>>
>> Maury
>>
>>
>
>
> ------------------------------
>
> _______________________________________________
> Wikipedia-l mailing list
> Wikipedia-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>
>
> End of Wikipedia-l Digest, Vol 64, Issue 3
> ******************************************
>
Hi, I'm a citizen of the Republic of Moldova and I want to inform you
that in our country everyone writes the Moldovan language with Latin
letters.
When we were under Soviet Union occupation, they tried to Russify us
and forced us to write our language in Cyrillic.
In 1991, after gaining the freedom to choose, we chose to write our
language with Latin letters, as we did before the Russians conquered us
(without asking the people) and divided us from Romania (our motherland).
Therefore, as a free Moldovan-speaking man, I'm asking you to remove
mo.wikipedia.org (which is in Cyrillic and is very offensive to us) and
respect our choice as an independent nation, or to convert it to Latin letters.
Thank you.