Wikipedia-l November 2008

wikipedia-l@lists.wikimedia.org

18 participants
13 discussions

by Daniel Mayer

On Sunday 28 July 2002 03:00 am, The Cunctator wrote:
> What are the articles this person has been changing?

For 66.108.155.126:
20:08 Jul 27, 2002 Computer
20:07 Jul 27, 2002 Exploit
20:07 Jul 27, 2002 AOL
20:05 Jul 27, 2002 Hacker
20:05 Jul 27, 2002 Leet
20:03 Jul 27, 2002 Root
20:02 Jul 27, 2002 Hacker
19:59 Jul 27, 2002 Hacker
19:58 Jul 27, 2002 Hacker
19:54 Jul 27, 2002 Principle of least astonishment
19:54 Jul 27, 2002 Hacker
19:52 Jul 27, 2002 Trance music
19:51 Jul 27, 2002 Trance music

For 208.24.115.6:
20:20 Jul 27, 2002 Hacker

For 141.157.232.26:
20:19 Jul 27, 2002 Hacker

Most of these were complete replacements with incoherent statements, such as "TAP IS THE ABSOLUTE DEFINITION OF THE NOUN HACKER" for Hacker. For the specifics, follow http://www.wikipedia.com/wiki/Special:Ipblocklist and look at the contribs.

--mav

Language policy agreement in the Norwegian Wikipedia community

by Bjarte Sørensen

Dear all,

Most of you will be aware of some of the discussions that have occurred around Wikipedia in the Norwegian languages. Since the last round of discussions on this list, there has been a lot of internal debate, as well as what seems to be a fairly widely accepted agreement following a vote.

This e-mail first gives a brief recap of the Norwegian language and Wikipedia issues, then takes those interested through the latest developments and stakes out the road ahead. It is also intended to inform the international community about the current agreement on no.wikipedia, so as to prevent misunderstandings in the future. Finally, we will mention an unfortunate reaction to the vote by a small number of users at the Norwegian Bokmål/Riksmål (no:) Wikipedia, who want to disregard the result of the vote and are planning to create a _third_ Norwegian Wikipedia with the sole mission of mixing the contents of the two current Norwegian versions.

== A short language history of Norway ==

Spoken Norwegian ("norsk", ISO 639-1 code "no") is in a fairly unique situation compared to most other languages of the world in that it has two widely accepted written standards, Bokmål (ISO 639-1 code "nb") and Nynorsk (ISO 639-1 code "nn"). By national legislation, both are official written forms of Norwegian. In addition, many people still distinguish between Bokmål and its precursor Riksmål, which is still in use.

Briefly speaking, Bokmål and Riksmål are descendants of the Danish written language. Until the 1800s, Danish was the only widely used written language in Norway, as a result of four centuries of union with Denmark. With increasing independence came a wish to norwegianise the Danish standard, with Knud Knudsen at the forefront of changing parts of the vocabulary and orthography. Thus Riksmål, and later Bokmål, resulted. These forms together are today probably used by about 90% of Norway's population, or somewhere around 3,500,000 people.

Parallel to this development, a new written standard was created by Ivar Aasen. He travelled extensively throughout Norway and based his new language, Landsmål, on the grammar and vocabulary of dialect samples from around the country. This was later renamed Nynorsk. Modern Nynorsk differs significantly from modern Bokmål; linguistically, the two may be regarded as about as different (or as similar, if you like) as Swedish is from Danish. For English or Dutch/German speakers, the differences may be likened to those between (Lowland) Scots and English, or between Low German and Dutch. Today it is estimated that about 500,000-600,000 people have Nynorsk as their first written language.

More information about Norwegian language history can be found in English, German, French, Spanish or Portuguese on the website of the Norwegian Language Council: http://www.sprakrad.no/templates/Page.aspx?id=653

== A short history of Wikipedia in Norwegian ==

The first Norwegian Wikipedia started on 26 November 2001 on the subdomain no.wikipedia.org. As with most Wikipedias, its contributor and article counts really started picking up around the end of 2003. At the time, it accepted all written standards of Norwegian, although the amount of Nynorsk was minimal. There were already several debates about the feasibility and appropriateness of keeping the two languages united on one Wikipedia.

On 31 July 2004, a Wikipedia for Nynorsk was created. The creation of nn:, however, split the community at the no: Wikipedia. Many felt that, given that Nynorsk now had its own Wikipedia, no: should become a Bokmål/Riksmål-only Wikipedia. Others disapproved, claiming that there was no need to change and that no: should continue its language policy of accepting all written standards and keep its interwiki link name "Norsk".
The Nynorsk Wikipedia soon proved a success, as within the next few months it gathered several people who had felt uncomfortable in the (mainly) Bokmål environment at no:. The name displayed in interwiki links became "Norsk (nynorsk)" (language names are not capitalised in Norwegian). To date it continues to be one of the fastest-growing Wikipedias, with a steady increase in articles, now at over 6,000 articles and more than 50 editors with more than 10 edits each since their arrival.

== Votes ==

The issue of no:'s language policy has come up time and again, and a vote was held in March ([[:no:Wikipedia:Målform]]) on which policy to adopt. Regardless of the tallying method (e.g. whether or not to include new contributors), there was a majority for switching to a Bokmål/Riksmål-only language policy (50% for Bokmål/Riksmål, 43.2% for Bokmål/Riksmål/Nynorsk/Høgnorsk, and 6.8% for the official variants Bokmål/Nynorsk only).

Following this result, there will now be a vote on which interwiki link name best reflects the current language policy of no:. The result of this vote will most likely be either "Norsk (bokmål)" or "Norsk (bokmål/riksmål)".

Understandably, there has also been a debate as to whether the subdomain should change from "no" to "nb", as "nb" is the correct code for Bokmål according to ISO 639-1. However, there is some resentment towards such a move, and there is currently general acceptance of letting the Bokmål Wikipedia stay at "no". An alternative some have suggested is a server-side redirect from "no" to "nb", in the same way that "nb" today is a server-side redirect to the equivalent page on "no".

== Summary of the problem ==

Unfortunately, a small group of users (all of whom write Bokmål/Riksmål) are ignoring the results of the vote and claim that they want to re-establish a Wikipedia for all written standards of Norwegian. They claim they have been in touch with people centrally in Wikimedia (developers? stewards?) and that they have so far received positive comments.

With this e-mail, we would like to state that there has been no official decision about creating a third Norwegian Wikipedia containing both Bokmål and Nynorsk; it is merely an unofficial initiative from a small group of users, who started a sign-on list at [[:no:Bruker:Norsk_Wikipedia]]. A spontaneous list of signatures against this activity was immediately created at [[:no:Wikipedia-diskusjon:Fellesnorsk]]. The proposal to create a third Norwegian Wikipedia has not gone through a vote in either of the two existing Norwegian Wikipedias (no: and nn:) and cannot be considered a decision by the Norwegian Wikipedia community.

We believe the creation of a third Wikipedia under the Wikimedia Foundation would have a serious and unfortunate impact on the existing Wikipedias in Norwegian, no: and nn:, and would undermine Wikipedia's reputation in Norway. That said, we are all for extensive co-operation between the four Scandinavian-language Wikipedias (including Swedish and Danish), as is evident from the recent creation of [[:meta:Skanwiki]], the Scandinavian meta-pages, and the use of featured articles from neighbouring Wikipedias.

== Conclusion ==

Hopefully, this letter will help people better understand the complicated language situation of the Norwegian Wikipedia community and provide a background for future discussion on this list, such as the inevitable debate following a possible request to re-establish the common (and third!) Norwegian Wikipedia.

From the community of no.wikipedia.org and nn.wikipedia.org,

Bjarte Sørensen [[:meta:User:BjarteSorensen]] (Administrator/bureaucrat on nn:)
Lars Alvik [[:no:User:Profoss]] (Administrator/bureaucrat on no:)
Øyvind A. Holm [[:no:User:Sunny256]] (Administrator on no:)
Onar Vikingstad [[:no:User:Vikingstad]] (Administrator on no:)
Jon Harald Søby [[:no:User:Jhs]] (Administrator on no:)
Chris Nyborg [[:no:User:Cnyborg]] (Administrator on no:)
Guttorm Flatabø [[:no:User:Dittaeva]] (Administrator on nn:)
Gunleiv Hadland [[:meta:User:Gunnernett]] (Administrator on nn:)
Jarle Fagerheim [[:nn:User:Jarle]] (Administrator on nn:)
Øyvind Jo Heimdal Eik [[:en:User:Pladask]] (Administrator on nn: and no:)
Kristian André Gallis [[:nn:User:Kristaga]]
Vegard Wærp [[:no:User:Vegardw]]
Nina Aldin Thune [[:no:User:Nina]]
Thor-Rune Hansen [[:no:User:ThorRune]]
Claes Tande [[:no:User:Ctande]]
Arnt-Erik Krokaa [[:no:User:AEK]]
Rune Sattler [[:no:User:Shauni]]

[Help] About Participate in Wikipedia - knowledge sharing.

by Joanne (雅玲)

Dear friend,

We are conducting a study on the motivation for knowledge sharing in the Wikipedia community. Contributors' experience with Linux is very important to the design and management of this knowledge platform. Would you please post the following online questionnaire message on the Wikipedia platform or forward it to the members?

After the survey is done, we will randomly select twenty people and present them with 2 GB USB flash drives. In addition, for each valid questionnaire, we will donate US$1 to the Wikimedia Foundation. The survey results will be analyzed anonymously and used for academic purposes only.

Please help us complete the data collection. Thanks so much for your help.

Cheers,
Joanne

[The Message content]

Dear friends,

We are conducting a study on the motivation for knowledge sharing on Wikipedia. Your experience of reading and writing Wikipedia is very important to the design and management of this knowledge platform. The survey will take about two minutes. We deeply appreciate your help in answering the following questions.

After the survey is done, we will randomly select twenty people and present them with 2 GB USB flash drives. In addition, for each valid questionnaire, we will donate US$1 to the Wikimedia Foundation. The survey results will be analyzed anonymously and used for academic purposes only.

Please feel free to fill out the questionnaire. Thanks again for your time and valuable input. May happiness and health be with you every day!

★ Online questionnaire: http://140.119.19.152:8080/wiki/

Shari S. C. Shang, Eldon Y. Li
Professor, Department of Management Information Systems, National Chengchi University
Tel.: +886-2-82374038; Fax: +886-2-29393754; E-mail: s1213527(a)yahoo.com.tw

[Help] About Participate in CentOS - knowledge sharing.

by Joanne (雅玲)

Study on Interfaces to Improving Wikipedia Quality

by avani(a)cs.umn.edu

Dear All,

My name is Avanidhar Chandrasekaran (http://en.wikipedia.org/wiki/User_talk:Avanidhar). I work with GroupLens Research at the University of Minnesota, Twin Cities. As part of my research, I am analyzing the usefulness and necessity of author reputation in Wikipedia. To this end, I have simulated an interface that colors words in an article based on their age.

As you are experienced contributors to Wikipedia, I invite you to participate in this study, which involves the following:

1. Please visit the following instances of Wikipedia and evaluate the interface components incorporated into each of them. Each uses its own algorithm to color text.
   a) The Wikitrust project: http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
   b) The Wiki-reputation project at GroupLens Research: http://wiki-reputation.cs.umn.edu/index.php/Main_Page
2. Once you have evaluated the two interfaces, kindly complete this survey on Wikipedia quality: http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d

We hope to get your valuable feedback on these interfaces and on how Wikipedia article quality can be improved.

Thanks for your time,
Avanidhar Chandrasekaran
GroupLens Research, University of Minnesota
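Coloring words by age, as described in the message above, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the GroupLens implementation (the message does not describe its algorithm): revisions are assumed to be plain-text strings, oldest first; words are matched between consecutive revisions with Python's `difflib.SequenceMatcher`; and the shade names and the `fade_after` threshold are hypothetical. A word keeps the index of the revision that introduced it for as long as the diff matches it unchanged.

```python
from difflib import SequenceMatcher


def word_ages(revisions):
    """Return (word, origin_revision) pairs for the latest revision.

    Assumption for this sketch: `revisions` is a list of plain-text
    revisions, oldest first. A word inherits the origin index of the word
    it is matched to in the previous revision; unmatched words are treated
    as new in the current revision.
    """
    prev_words, prev_origin = [], []
    for rev_index, text in enumerate(revisions):
        words = text.split()
        origin = [rev_index] * len(words)  # default: introduced in this revision
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        for block in matcher.get_matching_blocks():
            # Copy origins across each run of words the diff left unchanged.
            for k in range(block.size):
                origin[block.b + k] = prev_origin[block.a + k]
        prev_words, prev_origin = words, origin
    return list(zip(prev_words, prev_origin))


def shade(origin, latest, fade_after=2):
    """Map a word's age in revisions to a hypothetical background shade."""
    age = latest - origin
    if age == 0:
        return "orange"        # added in the latest revision
    if age < fade_after:
        return "light-orange"  # recent: has not yet survived fade_after revisions
    return "white"             # has survived at least fade_after revisions


if __name__ == "__main__":
    revs = ["the cat sat", "the black cat sat", "the black cat sat down"]
    latest = len(revs) - 1
    for word, origin in word_ages(revs):
        print(word, origin, shade(origin, latest))
```

On the three toy revisions, "black" (added in revision 1) comes out light-orange and "down" (added in revision 2) orange, while the original words are white. Because ages are inherited only through matched runs, anything the diff fails to align is treated as new text, so the coloring is only as good as the diff underneath it.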

hel Re: Wikipedia-l Digest, Vol 64, Issue 4

by Jocla

----- Original Message -----
From: <wikipedia-l-request(a)lists.wikimedia.org>
To: <wikipedia-l(a)lists.wikimedia.org>
Sent: Tuesday, November 25, 2008 7:59 AM
Subject: Wikipedia-l Digest, Vol 64, Issue 4

> Send Wikipedia-l mailing list submissions to
> wikipedia-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
> or, via email, send a message with subject or body 'help' to
> wikipedia-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wikipedia-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikipedia-l digest..."
>
>
> Today's Topics:
>
> 1. Re: Study on Interfaces to Improving Wikipedia Quality (J.L.W.S. The Special One)
> 2. Re: Study on Interfaces to Improving Wikipedia Quality (Gregory Maxwell)
> 3. Re: Wikipedia-l Digest, Vol 64, Issue 3 (Jocla)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 25 Nov 2008 11:59:54 +0800
> From: "J.L.W.S. The Special One" <hildanknight(a)gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID: <d41ac4640811241959o239db59bp1807b90877a33aa9(a)mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> How would the system handle a paragraph full of high quality, well-referenced and well-organised content contributed by an editor A, that is thoroughly copyedited by an editor B? Would editor A be deemed less trustworthy when his prose is thoroughly copyedited?
>
> 2008/11/25, Luca de Alfaro <luca(a)dealfaro.org>:
>>
>> Maury,
>>
>> perhaps I can help explain the behavior you saw in the UCSC system (I am one of the developers).
>> New text is always somewhat orange, to signal to visitors that it has not yet been fully reviewed.
>> The higher the reputation, the lighter the shade of orange, but orange it still is (I have no idea how high your computed reputation was when you started writing that article).
>>
>> Text background becomes white when other people revise it without drastically changing it: this indicates consensus.
>> In our more recent code version, we also have a "vote" button; using this, text can gain trust more speedily, without the need for many revisions to occur.
>> In a live experiment, where people can click on the vote button, I presume the trust of the text would rise more rapidly. Note that the code prevents double voting, creating sock-puppet accounts to vote, etc.
>>
>> So I don't think based on what you say that the system is tripping over diffs. It is simply considering new text less trusted, and more revised text more trusted, which is what we wanted. It appears, however, that we don't do a very good job on the web site of describing the algorithm (I guess we put most of the description work into writing the papers... we will try to improve the web site).
>>
>> We don't measure "edit work" in number of edits, but in number of words changed.
>> As you say, for our system, changing 1000 words in separate edits is the same (provided the edits are all kept, i.e., not reverted) as providing a single 1000-word contribution. We thought of giving a larger prize to larger contributions: precisely, of making the reputation increment proportional to n^a, where n is the number of words, and a > 1. This did not work well for the Wikipedia, because it ended up not rewarding enough the work of the many editors who clean and polish the articles, thus making many small edits. Technically it would be trivial to change the code to include such a non-linear reward scheme (to adopt rewards proportional to n^a rather than n); whether it is desirable, I have no idea.
>> It does not lead to better quantitative performance of the system, i.e., the resulting trust is not better at predicting future text deletions.
>>
>>
>> Luca
>>
>> > [snip]
>> _______________________________________________
>> Wikipedia-l mailing list
>> Wikipedia-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>>
>
>
> --
> Written with passion,
> J.L.W.S. The Special One
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 24 Nov 2008 23:14:57 -0500
> From: "Gregory Maxwell" <gmaxwell(a)gmail.com>
> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
> To: wikipedia-l(a)lists.wikimedia.org
> Message-ID: <e692861c0811242014h464f5a2ei31a1cb3aecfea04b(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Mon, Nov 24, 2008 at 8:35 PM, Luca de Alfaro <luca(a)dealfaro.org> wrote:
> [snip]
>> So I don't think based on what you say that the system is tripping over diffs.
>
> For example: I can't figure out why the text in the image caption is colored here
> http://wiki-trust.cse.ucsc.edu/index.php/Digital_room_correction
>
> I couldn't initially figure out why *anything* above the external link section was colored, though the inability to diff contributed to that.
>
> On Mon, Nov 24, 2008 at 8:22 PM, Luca de Alfaro <luca(a)dealfaro.org> wrote:
>> I agree with Gregory that it is very useful to quantify the usefulness of trust information on text -- otherwise, all comparisons are very subjective.
>> In our WikiSym 08 paper, we measure various parameters of the "trust" coloring we compute, including:
>>
>> - Recall of deletions. Only 3.4% of text is in the lower half of trust values, yet this is 66% of the text that is deleted in the very next revision.
>> - Precision of deletions. Text in the bottom half of trust values has probability 33% of being deleted in the next revision, against a probability of 1.9% for general text. The deletion probability rises to 62% for text in the bottom 20% of trust values.
>> - We study the correlation between the trust of a word, sampled at random in all revisions, and the future lifespan of a word (correcting for the finite horizon effect due to the finite number of revisions in each article), showing positive correlation.
> [snip]
>
> These performance metrics are better than I would have guessed from browsing through the output. How does the color mapping reflect the trust values? Basically, when I use it I see a *lot* of colored things which are perfectly fine. At least for me, the difference between shades is far less cognitively significant than colored vs. non-colored, so that may be the source of my confusion.
>
> Have you compared your system to a simple toy trust metric? I'd propose "revisions by users in their first week and before their first 7 (?) edits are untrusted". This reflects the existing automatic trust system on the site (auto-confirmation), and also reflects a type of trust checking applied manually by editors. I think that's the bar any more sophisticated trust metric needs to outperform.
>
> Thank you so much for your response!
>
> ------------------------------
>
> Message: 3
> Date: Tue, 25 Nov 2008 07:59:19 -0000
> From: "Jocla" <paresdoce(a)gmail.com>
> Subject: Re: [Wikipedia-l] Wikipedia-l Digest, Vol 64, Issue 3
> To: <wikipedia-l(a)lists.wikimedia.org>
> Message-ID: <001701c94ed3$b9b78d50$7f01a8c0@windows337902b>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
>
> subscribe
> ----- Original Message -----
> From: <wikipedia-l-request(a)lists.wikimedia.org>
> To: <wikipedia-l(a)lists.wikimedia.org>
> Sent: Tuesday, November 25, 2008 1:35 AM
> Subject: Wikipedia-l Digest, Vol 64, Issue 3
>
>
>> Today's Topics:
>>
>> 1. suscribe (Jocla)
>> 2. Study on Interfaces to Improving Wikipedia Quality (avani(a)cs.umn.edu)
>> 3. Re: Study on Interfaces to Improving Wikipedia Quality (michael west)
>> 4. Re: Study on Interfaces to Improving Wikipedia Quality (Joseph Reagle)
>> 5. Re: Study on Interfaces to Improving Wikipedia Quality (Maury Markowitz)
>> 6. Re: Study on Interfaces to Improving Wikipedia Quality (Gregory Maxwell)
>> 7. Re: Study on Interfaces to Improving Wikipedia Quality (Luca de Alfaro)
>> 8. Re: Study on Interfaces to Improving Wikipedia Quality (Luca de Alfaro)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 19 Nov 2008 18:22:03 -0000
>> From: "Jocla" <paresdoce(a)gmail.com>
>> Subject: [Wikipedia-l] suscribe
>> To: <wikipedia-l(a)lists.wikimedia.org>
>> Message-ID: <001c01c94a73$ba1850e0$7f01a8c0@windows337902b>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> thanks for your e-mail, I would like to subscribe.
>>
>> ------------------------------
>>
>> Message: 2
>> Date: 19 Nov 2008 13:23:53 -0600
>> From: avani(a)cs.umn.edu
>> Subject: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID: <Prayer.1.0.18.0811191323530.7842(a)sabinus.cs.umn.edu>
>> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>>
>> [snip]
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Wed, 19 Nov 2008 20:01:27 +0000
>> From: "michael west" <michawest(a)gmail.com>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID: <cfe6de600811191201h727fb4e4s9660f64f2815c93f(a)mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> 2008/11/19 <avani(a)cs.umn.edu>
>>
>>> [snip]
>>
>> Quite interesting - the "age of words" color coding might be useful in detecting obtuse type vandalism.
>>
>> m
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 19 Nov 2008 17:40:23 -0500
>> From: Joseph Reagle <reagle(a)mit.edu>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID: <200811191740.23471.reagle(a)mit.edu>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> On Wednesday 19 November 2008, avani(a)cs.umn.edu wrote:
>>> We hope to get your valuable feedback on these interfaces and how Wikipedia article quality can be improved.
>>
>> This might bias other respondents, but I thought it was an interesting idea so I wanted to share it. I concluded with the following, which is no doubt affected by my being a WikiGnome:
>>
>> [[
>> If I see an error, I fix it without much regard to time or author reputation. I do pay attention to and investigate author reputation on substantive issues on the discussion pages, and it would be interesting to see a discussion thread colored according to reputation.
>> ]] >> >> >> >> ------------------------------ >> >> Message: 5 >> Date: Sun, 23 Nov 2008 09:03:25 -0500 >> From: "Maury Markowitz" <maury.markowitz(a)gmail.com> >> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia >> Quality >> To: wikipedia-l(a)lists.wikimedia.org >> Message-ID: >> <5bdbc9050811230603u5a9ca6e8ned59c4421c8eacb0(a)mail.gmail.com> >> Content-Type: text/plain; charset=ISO-8859-1 >> >> On Wed, Nov 19, 2008 at 2:23 PM, <avani(a)cs.umn.edu> wrote: >>> We hope to get your valuable feedback on these interfaces and how >>> Wikipedia >>> article quality can be improved. >> >> Given the older snapshots, I selected older articles that I had >> started, NuBUS and ARCNET. >> >> The "time based" system from UMN did not work at all, every search >> resulted in a page not found. >> >> The USCS system did work, but gave me odd results. Apparently I have a >> very bad reputation, because when I look in the History at the first >> versions, which I wrote in entirety, it colored it all yellow! >> >> Newer versions of the same articles had much more white, even though >> huge portions of the text were still from the origial. This may be due >> to diff problems -- I consider diff to be largely random in >> effectiveness, sometimes it works, but othertimes a single whitespace >> change, especially vertical, will make it think the entire article was >> edited. >> >> My guess is that the system is tripping over diffs like this, and thus >> considering the article to have been re-written by another editor. >> Since this has happened, MY reputation goes down, or so I understand >> it. >> >> I don?t think this system could possibly work if based on wiki's >> diffs. If its going to work it?s going to need to use a much more >> reliable system. 
>>
>> Another problem I see with it is that it will rank an author whose contributions are 1000 unchanged comma inserts to be as reliable as an author who created a perfect 1000-character article (or perhaps rate the first even higher). There should be some sort of length bias; if an author makes a big edit, out of character, that's important to know.
>>
>> Maury
>>
>>
>>
>> ------------------------------
>>
>> Message: 6
>> Date: Sun, 23 Nov 2008 09:44:40 -0500
>> From: "Gregory Maxwell" <gmaxwell(a)gmail.com>
>> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia Quality
>> To: wikipedia-l(a)lists.wikimedia.org
>> Message-ID: <e692861c0811230644i316f94abg6cafe7ef87f6bc3b(a)mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> On Sun, Nov 23, 2008 at 9:03 AM, Maury Markowitz <maury.markowitz(a)gmail.com> wrote:
>>> On Wed, Nov 19, 2008 at 2:23 PM, <avani(a)cs.umn.edu> wrote:
>>>> We hope to get your valuable feedback on these interfaces and how Wikipedia article quality can be improved.
>>>
>>> Given the older snapshots, I selected older articles that I had started, NuBus and ARCNET.
>>>
>>> The "time based" system from UMN did not work at all; every search resulted in a page not found.
>>
>> The UMN system intentionally included only a small number (70?) of articles. This is why you needed to use the random page function to browse among them.
>>
>> This doesn't reflect any shortcoming of the system; it most likely just reflects the limits of the computational resources they were working under.
>>
>> [snip]
>>> Newer versions of the same articles had much more white, even though huge portions of the text were still from the original. This may be due to diff problems -- I consider diff to be largely random in effectiveness: sometimes it works, but other times a single whitespace change, especially vertical, will make it think the entire article was edited.
>> >> Yes, I had exactly the same experience with the UCSC system: different >> coloring for text I'd added in the same edit which created the article. >> Quite inscrutable. >> >> [snip] >>> Another problem I see with it is that it will rank an author whose >>> contributions are 1000 unchanged comma inserts to be as reliable as an >>> author who created a perfect 1000-character article (or perhaps rate >>> the first even higher). There should be some sort of length bias; if >>> an author makes a big edit, out of character, that's important to >>> know. >> >> For the articles it covered I found the UMN system to be more usable: >> its output was more explicable, and the signal-to-noise ratio was >> just better. This may be partially due to bugs in the UCSC history >> analysis, and to a different choice of coloring thresholds >> (UCSC seemed to color almost everything, removing the usefulness of >> color as something to draw my attention). >> >> Even so, I'm distrustful of "reputation" as an automated metric. >> Reputation is a fuzzy thing (consider your comma example), but time is >> just a straightforward metric which is much easier to get right. Your >> tireless and unreverted editing of external links tells me very little >> about your ability to make a reliable edit to the intro of an article, >> ... or at least very little that I didn't already know by merely >> knowing whether your account was brand new or not. (New accounts are more >> likely to be used by inexperienced and ill-motivated persons.) >> >> I believe a metric applied correctly, consistently, and understandably >> is just going to be more useful than a metric which considers more >> data but is also subject to more noise. The differential performance >> between these two systems has done nothing but confirm my suspicions >> in this regard.
>> >> A simple objective challenge for any predictive coloring system would >> be to put it through the following experimental procedure: >> >> * Take a dump of Wikipedia up to a year old and use this as the underlying >> knowledge for the systems. >> * Make several random selections of articles and include the newer >> revisions not included in the initial set, up to 6 months old. Call >> these the test sets. >> * The predictive coloring system should then take each revision in a >> test set in time order and predict whether it will be reverted (within X >> time?). >> * The actual edits up to now should be analyzed to determine which >> changes actually were reverted and when. >> >> The final score will be the false positive and false negative rates. >> So long as we assume that the existing editing practices are not too >> bad, we should find that the best predictive coloring system would >> generally tend to minimize these rates. >> >> ------------------------------ >> >> Message: 7 >> Date: Mon, 24 Nov 2008 17:22:23 -0800 >> From: "Luca de Alfaro" <luca(a)dealfaro.org> >> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia >> Quality >> To: wikipedia-l(a)lists.wikimedia.org >> Message-ID: >> <28fa90930811241722y25c26bf1i6441b489e3ff6285(a)mail.gmail.com> >> Content-Type: text/plain; charset=ISO-8859-1 >> >> I agree with Gregory that it is very useful to quantify the usefulness of >> trust information on text -- otherwise, all comparisons are very >> subjective. >> In our WikiSym 08 paper, we measure various parameters of the "trust" >> coloring we compute, including: >> >> - Recall of deletions. Only 3.4% of text is in the lower half of trust >> values, yet this is 66% of the text that is deleted in the very next >> revision. >> - Precision of deletions. Text in the bottom half of trust values has >> probability 33% of being deleted in the next revision, against a >> probability >> of 1.9% for general text.
>> The deletion probability rises to 62% for >> text >> in the bottom 20% of trust values. >> - We study the correlation between the trust of a word, sampled at >> random >> in all revisions, and the future lifespan of a word (correcting for the >> finite horizon effect due to the finite number of revisions in each >> article), showing a positive correlation. >> >> Some aspects are not captured by the above measures: >> >> - We ensured that every "tampering" (including cut-and-paste) is >> reflected in the trust coloring, so it is hard to subvert the algorithm >> (does "age" provide this?). >> - We ensured the whole scheme is robust with respect to attacks (see the various >> papers if you are interested). >> >> I fully believe that it should not be hard to improve on our system re. >> the >> above measurements. And I fully agree that the "reputation" we compute >> is >> essentially an internal parameter of the system, and does not really >> constitute a good summary of a person's overall Wikipedia contribution; >> for >> this and other reasons we do not display it. >> >> Luca >> >> A simple objective challenge for any predictive coloring system would >>> be to put it through the following experimental procedure: >>> >>> * Take a dump of Wikipedia up to a year old and use this as the underlying >>> knowledge for the systems. >>> * Make several random selections of articles and include the newer >>> revisions not included in the initial set, up to 6 months old. Call >>> these the test sets. >>> * The predictive coloring system should then take each revision in a >>> test set in time order and predict whether it will be reverted (within X >>> time?). >>> * The actual edits up to now should be analyzed to determine which >>> changes actually were reverted and when. >>> >>> The final score will be the false positive and false negative rates.
>>> So long as we assume that the existing editing practices are not too >>> bad, we should find that the best predictive coloring system would >>> generally tend to minimize these rates. >>> _______________________________________________ >>> Wikipedia-l mailing list >>> Wikipedia-l(a)lists.wikimedia.org >>> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l >>> >> >> >> ------------------------------ >> >> Message: 8 >> Date: Mon, 24 Nov 2008 17:35:13 -0800 >> From: "Luca de Alfaro" <luca(a)dealfaro.org> >> Subject: Re: [Wikipedia-l] Study on Interfaces to Improving Wikipedia >> Quality >> To: wikipedia-l(a)lists.wikimedia.org >> Message-ID: >> <28fa90930811241735l235af9cag554632448d80ef7(a)mail.gmail.com> >> Content-Type: text/plain; charset=ISO-8859-1 >> >> Maury, >> >> perhaps I can help explain the behavior you saw in the UCSC system (I am >> one >> of the developers). >> New text is always somewhat orange, to signal to visitors that it has not >> yet been fully reviewed. >> The higher the reputation, the lighter the shade of orange, but orange it >> still is (I have no idea how high your computed reputation was when >> you >> started writing that article). >> >> Text background becomes white when other people revise it without >> drastically changing it: this indicates consensus. >> In our more recent code version, we also have a "vote" button; using >> this, >> text can gain trust more speedily, without the need for many revisions to >> occur. >> In a live experiment, where people can click on the vote button, I >> presume >> the trust of the text would rise more rapidly. Note that the code >> prevents >> double voting, creating sock-puppet accounts to vote, etc. >> >> So, based on what you say, I don't think the system is tripping over >> diffs. It is simply considering new text less trusted, and more revised >> text more trusted, which is what we wanted.
>> It appears, however, that we don't >> do >> a very good job on the web site of describing the algorithm (I guess we put >> most of the description work into writing the papers... we will try to >> improve >> the web site). >> >> We don't measure "edit work" in number of edits, but in number of words >> changed. >> As you say, for our system, changing 1000 words in separate edits is the >> same (provided the edits are all kept, i.e., not reverted) as providing a >> single 1000-word contribution. We thought of giving a larger prize to >> larger contributions: precisely, of making the reputation increment >> proportional to n^a, where n is the number of words, and a > 1. This did >> not work well for Wikipedia, because it ended up not sufficiently rewarding >> the work of the many editors who clean and polish the articles, and thus >> make >> many small edits. Technically it would be trivial to change the code to >> include such a non-linear reward scheme (to adopt rewards proportional to >> n^a rather than n); whether it is desirable, I have no idea. It does not >> lead to better quantitative performance of the system, i.e., the >> resulting >> trust is not better at predicting future text deletions. >> >> Luca >> >> >>> The UCSC system did work, but gave me odd results. Apparently I have a >>> very bad reputation, because when I look in the History at the first >>> versions, which I wrote in their entirety, it colored it all yellow! >>> >>> Newer versions of the same articles had much more white, even though >>> huge portions of the text were still from the original. This may be due >>> to diff problems -- I consider diff to be largely random in >>> effectiveness: sometimes it works, but other times a single whitespace >>> change, especially vertical, will make it think the entire article was >>> edited. >>> >>> My guess is that the system is tripping over diffs like this, and thus >>> considering the article to have been re-written by another editor.
>>> Since this has happened, MY reputation goes down, or so I understand >>> it. >>> >>> I don't think this system could possibly work if based on wiki's >>> diffs. If it's going to work, it's going to need to use a much more >>> reliable system. >>> >>> Another problem I see with it is that it will rank an author whose >>> contributions are 1000 unchanged comma inserts to be as reliable as an >>> author who created a perfect 1000-character article (or perhaps rate >>> the first even higher). There should be some sort of length bias; if >>> an author makes a big edit, out of character, that's important to >>> know. >>> >>> Maury >>> >>> _______________________________________________ >>> Wikipedia-l mailing list >>> Wikipedia-l(a)lists.wikimedia.org >>> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l >>> >> >> >> ------------------------------ >> >> _______________________________________________ >> Wikipedia-l mailing list >> Wikipedia-l(a)lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l >> >> >> End of Wikipedia-l Digest, Vol 64, Issue 3 >> ****************************************** >> > > > > > ------------------------------ > > _______________________________________________ > Wikipedia-l mailing list > Wikipedia-l(a)lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikipedia-l > > > End of Wikipedia-l Digest, Vol 64, Issue 4 > ******************************************
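Gregory's proposed score (false positive and false negative rates) and the recall and precision of deletions that Luca cites from the WikiSym 08 paper are all ratios over the same four counts: items flagged or not flagged as likely to be reverted or deleted, crossed with whether they actually were. The sketch below is a hypothetical illustration, not code from the UMN or UCSC projects. It assumes each item (a revision in Gregory's procedure, a word of text in Luca's measurements) already carries a boolean prediction and a boolean ground-truth label taken from the later history; the names `score_predictions` and `Score` are invented for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Score:
    false_positive_rate: float  # share of never-reverted items that were flagged
    false_negative_rate: float  # share of reverted/deleted items that were not flagged
    precision: float            # share of flagged items that were reverted/deleted
    recall: float               # share of reverted/deleted items that were flagged


def _ratio(num: int, den: int) -> float:
    # Convention assumed here: an empty denominator yields 0.0 instead of an error.
    return num / den if den else 0.0


def score_predictions(predicted: Sequence[bool], actual: Sequence[bool]) -> Score:
    """Hypothetical scorer.

    predicted[i]: the coloring system flagged item i as likely to be reverted/deleted.
    actual[i]:    item i really was reverted/deleted, according to the later history.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return Score(
        false_positive_rate=_ratio(fp, fp + tn),
        false_negative_rate=_ratio(fn, fn + tp),
        precision=_ratio(tp, tp + fp),
        recall=_ratio(tp, tp + fn),
    )


# Hypothetical labels for five items.
print(score_predictions([True, True, False, False, True],
                        [True, False, False, True, True]))
# FP rate 0.5, FN rate 1/3, precision 2/3, recall 2/3
```

In Luca's terms, "flagged" would correspond to text in the lower half (or bottom 20%) of trust values, and "actual" to text deleted in the next revision; his 66% figure is a recall and his 33% figure is a precision.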
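Luca describes the reputation gained per kept edit as proportional to n, the number of words changed, and a rejected alternative proportional to n^a with a > 1. The hypothetical sketch below assumes a proportionality constant of 1 and ignores reverts, trust, and everything else the real system tracks. It compares Maury's two cases: one 1000-word contribution versus 1000 one-word edits. The function names are invented for illustration.

```python
from typing import Sequence


def reputation_increment(words_changed: int, a: float = 1.0) -> float:
    """Hypothetical reward for one kept edit, proportional to n**a (constant assumed to be 1)."""
    if words_changed < 0:
        raise ValueError("words_changed must be non-negative")
    return float(words_changed) ** a


def total_reward(edit_sizes: Sequence[int], a: float = 1.0) -> float:
    """Sum the hypothetical rewards over a history of kept edits, given as word counts."""
    return sum(reputation_increment(n, a) for n in edit_sizes)


one_big_edit = [1000]          # a single 1000-word contribution
many_small_edits = [1] * 1000  # 1000 one-word fixes, e.g. comma inserts

# Linear reward (a = 1): both histories earn the same total.
print(total_reward(one_big_edit), total_reward(many_small_edits))  # 1000.0 1000.0
# Super-linear reward (a = 1.2): the single large edit earns roughly 3981 versus 1000.0.
print(total_reward(one_big_edit, a=1.2), total_reward(many_small_edits, a=1.2))
```

With a > 1 the many small edits earn comparatively little, which is the effect Luca gives for dropping the non-linear scheme: it under-rewarded the editors who clean and polish articles.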
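The UMN interface colors words by age, Gregory argues that time is a simpler metric to get right, and Maury worries that whitespace-only changes confuse diff-based attribution. The following is a minimal sketch of word-age tracking, not the UMN or UCSC implementation (neither is shown in this thread). It assumes revisions are plain strings and uses Python's `difflib.SequenceMatcher` on whitespace-split tokens: a word keeps the index of the revision in which it first appeared for as long as the matcher aligns it with a word of the previous revision. Because tokens come from `str.split()`, changes to spacing or blank lines do not reset ages. The function name `word_ages` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def word_ages(revisions: List[str]) -> List[Tuple[str, int]]:
    """Return (word, birth_revision) pairs for the newest revision.

    Hypothetical sketch: unmatched words are treated as new in the revision
    where they appear; matched words inherit the birth index of their match.
    """
    prev_words: List[str] = []
    prev_birth: List[int] = []
    for rev_index, text in enumerate(revisions):
        words = text.split()              # whitespace-insensitive tokens
        birth = [rev_index] * len(words)  # default: introduced in this revision
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        for block in matcher.get_matching_blocks():
            # Copy birth indices across each run of matching words
            # (the final sentinel block has size 0, so it copies nothing).
            for k in range(block.size):
                birth[block.b + k] = prev_birth[block.a + k]
        prev_words, prev_birth = words, birth
    return list(zip(prev_words, prev_birth))


history = ["the cat sat", "the cat sat on the mat", "the dog sat on the mat"]
print(word_ages(history))
# [('the', 0), ('dog', 2), ('sat', 0), ('on', 1), ('the', 1), ('mat', 1)]
```

An age-based coloring could then shade each word by how many revisions it has survived (here, `len(history) - 1 - birth`). A plain sequence diff like this may treat text moved elsewhere in the article as newly written; Luca notes that WikiTrust's coloring is designed to reflect cut-and-paste.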

Re: [Wikipedia-l] Wikipedia-l Digest, Vol 64, Issue 3

by Jocla

subscribe ----- Original Message ----- From: <wikipedia-l-request(a)lists.wikimedia.org> To: <wikipedia-l(a)lists.wikimedia.org> Sent: Tuesday, November 25, 2008 1:35 AM Subject: Wikipedia-l Digest, Vol 64, Issue 3

by Jocla

Thanks for your e-mail, I would like to subscribe.

mo.wikipedia.org: when will you stop making a joke of us?

by Cetateanu Moldovanu

Hi, I'm a citizen of the Republic of Moldova and I want to inform you that in our country everyone writes the Moldovan language with Latin letters. When we were under Soviet occupation, they tried to Russify us and forced us to write our language in Cyrillic. In 1991, after gaining the freedom to choose, we chose to write our language with Latin letters, as we did before the Russians conquered us (without asking the people) and separated us from Romania (our motherland). Therefore, as a free Moldovan-speaking man, I'm asking you to remove mo.wikipedia.org (which is in Cyrillic and is very offensive to us) and respect our choice as an independent nation, or else to convert it to Latin letters. Thank you.

Wikipedia logo work in progress

by Cary Bass

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello all, We're working on an update to the Wikipedia logo, which can be used in 3-D, which will be correcting all the incorrect glyphs, and include many other scripts that are not presently in the logo. The project page is at <http://meta.wikimedia.org/wiki/Wikipedia/Logo> and we're still looking for community members to discuss, to help sort out characters, font styles and representations for the additional alphabets as well as continue discussing the current glyphs on the talk page at <http://meta.wikimedia.org/wiki/Talk:Wikipedia/Logo>. Your input is greatly appreciated! Cary -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJGI0DyQg4JSymDYkRArwXAJsEYbANI9RrU4HWR9PXjE7Qz4IqxgCgrapl 7IiOfWJ+7jRPaioYuoBXGHg= =nwDW -----END PGP SIGNATURE-----
