"Software Weighs Wikipedians' Trustworthiness"

List overview All Threads
Download

newer

older

Clarification about prevailing AFD...

Re: [WikiEN-l] Oh shit, Slashdot...

David Goodman

5 Aug 2007 5 Aug '07

8:04 a.m.

from Chronicle of Higher Education, Wired Campus blog. http://chronicle.com/wiredcampus/index.php?id=2278

"software that color-codes Wikipedia entries, identifying those portions deemed trustworthy and those that might be taken with a grain of salt.

To determine which passages make the grade, the researchers analyzed Wikipedia's editing history, tracking material that has remained on the site for a long time and edits that have been quickly overruled. A Wikipedian with a distinguished record of unchanged edits is declared trustworthy, and his or her contributions are left untouched on the Santa Cruz team's color-coded pages. But a contributor whose posts have frequently been changed or deleted is considered suspect, and his or her content is highlighted in orange. (The darker the orange, the more spurious the content is thought to be.)"

Examples at http://trust.cse.ucsc.edu/

Wikimania 2007 talk: http://trust.cse.ucsc.edu/UCSC_Wiki_Lab action=AttachFile&do=get&target=wikimania07.pdf

by Luca de Alfaro B. Thomas Adler Marco Faella Ian Pye Caitlin Sadowski

-- David Goodman, Ph.D, M.L.S.

Show replies by date

Charlotte Webb

5 Aug 5 Aug

8:24 a.m.

On 8/5/07, David Goodman dgoodmanny@gmail.com wrote:

...

To determine which passages make the grade, the researchers analyzed Wikipedia's editing history, tracking material that has remained on the site for a long time and edits that have been quickly overruled. A Wikipedian with a distinguished record of unchanged edits is declared trustworthy, and his or her contributions are left untouched on the Santa Cruz team's color-coded pages. But a contributor whose posts have frequently been changed or deleted is considered suspect, and his or her content is highlighted in orange. (The darker the orange, the more spurious the content is thought to be.)"

That's nice but for the information to be useful to us they would have to start naming names.

—C.W.

Oskar Sigvardsson

9:25 a.m.

On 8/5/07, Charlotte Webb charlottethewebb@gmail.com wrote:

...

That's nice but for the information to be useful to us they would have to start naming names.

Not necessarily. Go to

http://enwiki-trust.cse.ucsc.edu/index.php/Special:Random

to see a random page on wikipedia with background color-coded according to trust. It would be a great way to look for information that might not be accurate.

--Oskar

Jussi-Ville Heiskanen

11:40 a.m.

Having trawled throught their articles via the random page button, it very much seems to me like their idea is the classic case of something that appears cool on paper, but in practise doesn't pan out.

The articles which I perused, I couldn't (with my Mark I eyball) discern any useful difference between the text that was painted pink and the text that wasn't.

It is of course conceivable that I myself wouldn't know the difference between crap content and trustworthy content, but I seriously doubt it.

While this approach may have it's merits, I think the parameters need to be tweaked and certainly expanded substantially before any realistically significant results can be gleaned from such sifting.

Specifically I would note that a user who habitually tends after multiple commonly vandalized articles, would get a high "un-trustworthiness" rating... not ideal as a metric, so mechanically applied.

-- Jussi-Ville Heiskanen, ~ [[User:Cimon Avaro]]

Oskar Sigvardsson

2:14 p.m.

On 8/5/07, Jussi-Ville Heiskanen cimonavaro@gmail.com wrote:

...

Having trawled throught their articles via the random page button, it very much seems to me like their idea is the classic case of something that appears cool on paper, but in practise doesn't pan out.

The articles which I perused, I couldn't (with my Mark I eyball) discern any useful difference between the text that was painted pink and the text that wasn't.

It is of course conceivable that I myself wouldn't know the difference between crap content and trustworthy content, but I seriously doubt it.

While this approach may have it's merits, I think the parameters need to be tweaked and certainly expanded substantially before any realistically significant results can be gleaned from such sifting.

Specifically I would note that a user who habitually tends after multiple commonly vandalized articles, would get a high "un-trustworthiness" rating... not ideal as a metric, so mechanically applied.

I must say that I disagree. Look at the article "Chomsky normal form", for instance:

http://enwiki-trust.cse.ucsc.edu/index.php/Chomsky_normal_form

It's a short article consisting of an intro which is almost all white and a section called "Alternative Definition" which is almost all orange. This means (I'm assuming) that it was very recently added and that the person adding it does not have a history of adding stuff that gets kept very long. If I were fact-checking the article or inspecting the sources or whatnot (assuming I had the know-how) this is a very good indicator of where I should look.

I haven't looked hard enough at their methods or results to be able to judge how good this system actually is, but if they can get it to work (and I believe that it is certainly possible), then this could be a great tool.

--Oskar

Brock Weller

5:41 p.m.

Wait, no, it applies perfectly, reverting vandals wouldn't make the 'content' you add elsewhere more trustworthy, it doesnt show you know what you're talking about, it shows you care about the encyclopedia. This does not always equate with being able to recite the laws of thermodynamics.

On 8/5/07, Jussi-Ville Heiskanen cimonavaro@gmail.com wrote:

...

Having trawled throught their articles via the random page button, it very much seems to me like their idea is the classic case of something that appears cool on paper, but in practise doesn't pan out.

The articles which I perused, I couldn't (with my Mark I eyball) discern any useful difference between the text that was painted pink and the text that wasn't.

It is of course conceivable that I myself wouldn't know the difference between crap content and trustworthy content, but I seriously doubt it.

While this approach may have it's merits, I think the parameters need to be tweaked and certainly expanded substantially before any realistically significant results can be gleaned from such sifting.

Specifically I would note that a user who habitually tends after multiple commonly vandalized articles, would get a high "un-trustworthiness" rating... not ideal as a metric, so mechanically applied.

-- Jussi-Ville Heiskanen, ~ [[User:Cimon Avaro]]

WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l

-- -Brock

Jussi-Ville Heiskanen

7:34 p.m.

Having given the matter some additional thought; it may be that the method may have some use in identifying that part of the text that is likely entirely non-controversial...

-- Jussi-Ville Heiskanen, ~ [[User:Cimon Avaro]]

Phil Sandifer

3:15 p.m.

Well, I guess we got beaten to the punch on stable versions.

-Phil

On Aug 5, 2007, at 2:04 AM, David Goodman wrote:

...

from Chronicle of Higher Education, Wired Campus blog. http://chronicle.com/wiredcampus/index.php?id=2278

"software that color-codes Wikipedia entries, identifying those portions deemed trustworthy and those that might be taken with a grain of salt.

To determine which passages make the grade, the researchers analyzed Wikipedia's editing history, tracking material that has remained on the site for a long time and edits that have been quickly overruled. A Wikipedian with a distinguished record of unchanged edits is declared trustworthy, and his or her contributions are left untouched on the Santa Cruz team's color-coded pages. But a contributor whose posts have frequently been changed or deleted is considered suspect, and his or her content is highlighted in orange. (The darker the orange, the more spurious the content is thought to be.)"

Examples at http://trust.cse.ucsc.edu/

Wikimania 2007 talk: http://trust.cse.ucsc.edu/UCSC_Wiki_Lab action=AttachFile&do=get&target=wikimania07.pdf

by Luca de Alfaro B. Thomas Adler Marco Faella Ian Pye Caitlin Sadowski

-- David Goodman, Ph.D, M.L.S.

WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l

Steve Summit

4:16 p.m.

Phil Sandifer wrote:

...

Well, I guess we got beaten to the punch on stable versions.

Stable versions have always struck me as too binary. Here's an idea I had a while ago for a more fine-grained approach, for articles that "solidify" over time: [[User:Ummit/Wiki karma proposal]]

Tim Starling

6 Aug 6 Aug

7:47 a.m.

David Goodman wrote:

...

from Chronicle of Higher Education, Wired Campus blog. http://chronicle.com/wiredcampus/index.php?id=2278

"software that color-codes Wikipedia entries, identifying those portions deemed trustworthy and those that might be taken with a grain of salt.

I spoke with Luca de Alfaro at length about this feature at Wikimania. I think the technology is great, and the performance is probably good enough to include it on Wikipedia itself. He assures me he will release the source code under a free license, as soon as it's presentable. Some programming work still needs to be done to make it work incrementally rather than on the entire history of the article, but the theory for this is entirely in place.

There are two very important and separate elements to this:

1) A "blame map". Some might prefer to call it a "credit map" to be more polite. This is a data structure that lets you see who is responsible for what text. It can be updated on every edit. Having one stored in MediaWiki will enable all sorts of applications. Apparently it's old-hat, and not the subject of the present research, but it'll be great to have an implementation integrated with MediaWiki.

2) A reputation metric. This is a predictor of how long a given user's edits will stay in an article. It's novel, and it's the main topic of de Alfaro's research.

These two elements could be used independently in any way we choose.

The social implications of having a reputation metric are not lost on de Alfaro. He gives the following responses to the usual criticisms:

1. The reputation metric does not rank respected, established users -- it has a maximum value which will be routinely obtained by many people.

2. The value of the metric for individual users is obscured. The only access to it is via the reputation-coloured article text. The annotated article text has no usernames attached, and the metric is not displayed on user pages or the like.

3. It's content-based which makes it harder to game than voting-based metrics.

It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de Alfaro proposes. Brainstorm away.

-- Tim Starling

Jussi-Ville Heiskanen

9:41 a.m.

On 8/6/07, Tim Starling tstarling@wikimedia.org wrote:

...

A "blame map". Some might prefer to call it a "credit map" to be more

polite. This is a data structure that lets you see who is responsible for what text. It can be updated on every edit. Having one stored in MediaWiki will enable all sorts of applications. Apparently it's old-hat, and not the subject of the present research, but it'll be great to have an implementation integrated with MediaWiki.

A reputation metric. This is a predictor of how long a given user's

edits will stay in an article. It's novel, and it's the main topic of de Alfaro's research.

These two elements could be used independently in any way we choose.

...

The value of the metric for individual users is obscured. The only

access to it is via the reputation-coloured article text. The annotated article text has no usernames attached, and the metric is not displayed on user pages or the like.

Can you clarify this for me. What is there to prevent me from checking a particular snippet of text that has been edited by only one person, and seeing that it is a specific hue, and basing my evaluation of his or her trust metric on that hue?

To be obscured would it not be necessary to only colour text that has been edited by multiple editors, thus limiting the usefulness?

<snip>

...

It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de Alfaro proposes. Brainstorm away.

I have not yet changed back to thinking this may not be useful, but I think we need more information on the limitations and possible further improvements that can be made to it. Is there a fuller description of the underlying formula/algorithm somewhere?

-- Jussi-Ville Heiskanen, ~ [[User:Cimon Avaro]]

David Goodman

9:49 a.m.

Their Wikimania slides have an interesting example, where a really problematic edit becomes gradually accepted simply from being there and other people editing the article but not fixing it.

On 8/6/07, Jussi-Ville Heiskanen cimonavaro@gmail.com wrote:

...

On 8/6/07, Tim Starling tstarling@wikimedia.org wrote:

...

A "blame map". Some might prefer to call it a "credit map" to be more

polite. This is a data structure that lets you see who is responsible for what text. It can be updated on every edit. Having one stored in MediaWiki will enable all sorts of applications. Apparently it's old-hat, and not the subject of the present research, but it'll be great to have an implementation integrated with MediaWiki.

A reputation metric. This is a predictor of how long a given user's

edits will stay in an article. It's novel, and it's the main topic of de Alfaro's research.

These two elements could be used independently in any way we choose.

<snip for clarity>

...

The value of the metric for individual users is obscured. The only

access to it is via the reputation-coloured article text. The annotated article text has no usernames attached, and the metric is not displayed on user pages or the like.

Can you clarify this for me. What is there to prevent me from checking a particular snippet of text that has been edited by only one person, and seeing that it is a specific hue, and basing my evaluation of his or her trust metric on that hue?

To be obscured would it not be necessary to only colour text that has been edited by multiple editors, thus limiting the usefulness?

<snip>

...
It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de Alfaro proposes. Brainstorm away.

I have not yet changed back to thinking this may not be useful, but I think we need more information on the limitations and possible further improvements that can be made to it. Is there a fuller description of the underlying formula/algorithm somewhere?

-- Jussi-Ville Heiskanen, ~ [[User:Cimon Avaro]]

WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l

-- David Goodman, Ph.D, M.L.S.

John Vandenberg

3:36 p.m.

On 8/6/07, David Goodman dgoodmanny@gmail.com wrote:

...

Their Wikimania slides have an interesting example, where a really problematic edit becomes gradually accepted simply from being there and other people editing the article but not fixing it

Was the gradual change due to increasing level of trust in the text of an individual edit, or of the user in general?

To illustrate the problem with it being based on the user (U:a is a new anon and U:r is a regular fact checker):

1. U:a changes [[T1]], [[T2]] and [[T3]] 2. U:r reverts the changes to [[T1]] 3. U:a waits a while and then restores the added text to [[T1]]

The text in [[T1]] should stay orange indefinitely, while [[T2]] and [[T3]] could gradually become normal.

I have concerns about displaying this on the live version of wikipedia. Currently all text is black on white, and we are drumming home the message that it should not ever be implicitly trusted. If we introduced something like this, many readers (and contributors) will trust black on white text more than they trust red on orange; i.e. some parts of the text are become "more" trusted due a software change. Ultimately, in order for this to be appropriate on an encyclopedia, the algorithm needs to be either very accurate measure, or kept out of sight.

Also, the blame map for dab pages is often largely orange.

http://enwiki-trust.cse.ucsc.edu/index.php/CRM http://enwiki-trust.cse.ucsc.edu/index.php/Cortex http://enwiki-trust.cse.ucsc.edu/index.php/Cortez

-- John

Tim Starling

7 Aug 7 Aug

5:40 a.m.

Jussi-Ville Heiskanen wrote:

...

Can you clarify this for me. What is there to prevent me from checking a particular snippet of text that has been edited by only one person, and seeing that it is a specific hue, and basing my evaluation of his or her trust metric on that hue?

To be obscured would it not be necessary to only colour text that has been edited by multiple editors, thus limiting the usefulness?

It's obscured in the same way that edit counts are obscured, i.e. potentially unsuccessfully.

...

I have not yet changed back to thinking this may not be useful, but I think we need more information on the limitations and possible further improvements that can be made to it. Is there a fuller description of the underlying formula/algorithm somewhere?

http://www.soe.ucsc.edu/~luca/papers/07/wikiwww2007.pdf

-- Tim Starling

Magnus Manske

6 Aug 6 Aug

10:34 a.m.

On 8/6/07, Tim Starling tstarling@wikimedia.org wrote:

...

It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de Alfaro proposes. Brainstorm away.

Well, the obvious application would be to mark edits by users whose edits tend to be removed quickly on RecentChanges and similar pages. Page history as well. IP edits could be grouped together by some block metric, maybe (call it AOL flag!:-)

A new special page generating lists of pages with "suspicious" edits. Might be useful to find long-hidden vandalism. Or not, if the metric gets better with time...

I also have some questions: What happens if a page is blanked by someone and then restored by me? Am I awarded all the "points" for all future versions of that page? Is the metric then "cleared", meaning, reset to my metric? What about partial text removal/restoration? What if I move some text around within the page?

Also, heavily changed text is not neccessarily bad. For example, wat if I hadd realy badd tyops, but would write good (information-rich, NPOV, referenced) content? People will fix my typos and grammar, and I'll get a "bad" metric, right? (One can probably filter for the occasional typo not to influence the metric, though).

The same thing, other way around: A vandal changing dates ever so slightly, will he show up? Or will a minor-typo-filter hide him behind a threshold?

About the concern that was raised in another reply to your mail: One can probably figure out who wrote a highly flagged passage of a page by going through the history of that page. But, even if someone does that, what's the point? The metric will change over time, right? So, a newbie might be flagged like this in the beginning, when (s)he is not member of the cabal yet, but that will change once (s)he becomes a productive member of the happy wikipedia family, correct? Besides, I doubt that such retro-engineered metrics would ever be a socially accepted argument on the project. I don't see a real problem here.

Magnus

K P

6:53 p.m.

On 8/6/07, Magnus Manske magnusmanske@googlemail.com wrote:

...

On 8/6/07, Tim Starling tstarling@wikimedia.org wrote:

...
It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de Alfaro proposes. Brainstorm away.

....

...

Also, heavily changed text is not neccessarily bad. For example, wat if I hadd realy badd tyops, but would write good (information-rich, NPOV, referenced) content? People will fix my typos and grammar, and I'll get a "bad" metric, right? (One can probably filter for the occasional typo not to influence the metric, though).

Magnus

I don't do really bad typos, but I have a rather bloated writing style, so my text is often changed, however, the content of the text is not changed, just some minor copyediting. This would essentially make me a bad editor (not just the useless troll I was this past weekend doing edits while working), and I would have just quit Wikipedia.

Yesterday, while editing the page of Nobel Prize winning physicist I came across a number of important physics articles that were poorly written stubs, one that had been on Wikipedia for 3 years with little editing--this would get a high rating, because it has not been edited in ages--in spite of glaring syntax and spelling errors and misinformation.

Chasing off competent editors and giving poor articles high ratings just because they were on obscure difficult to research subjects isn't going to make Wikipedia better. This is the sort of thing, quality ratings, that machines aren't going to really take over for humans anytime soon.

I don't mind, in general, the idea of quality ratings for editors, if they take all things into account. Clearly this machine doesn't, and will and could artificially give "accurate" ratings to things that merely were obscure.

Brock Weller

8:07 p.m.

While I get your point KP, I think either you or I may have missed the working of this. I gathered that its not how long its been there, per se, but how many edits its survived more or less intact. So those physics articles, while long standing, are as you say rarely edited, and the software would not put trust into its content.

On 8/6/07, K P kpbotany@gmail.com wrote:

...

On 8/6/07, Magnus Manske magnusmanske@googlemail.com wrote:

...
On 8/6/07, Tim Starling tstarling@wikimedia.org wrote:

...
It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de

Alfaro

...
...
proposes. Brainstorm away.

....

...
Also, heavily changed text is not neccessarily bad. For example, wat if I hadd realy badd tyops, but would write good (information-rich, NPOV, referenced) content? People will fix my typos and grammar, and I'll get a "bad" metric, right? (One can probably filter for the occasional typo not to influence the metric, though).

Magnus

I don't do really bad typos, but I have a rather bloated writing style, so my text is often changed, however, the content of the text is not changed, just some minor copyediting. This would essentially make me a bad editor (not just the useless troll I was this past weekend doing edits while working), and I would have just quit Wikipedia.

Yesterday, while editing the page of Nobel Prize winning physicist I came across a number of important physics articles that were poorly written stubs, one that had been on Wikipedia for 3 years with little editing--this would get a high rating, because it has not been edited in ages--in spite of glaring syntax and spelling errors and misinformation.

Chasing off competent editors and giving poor articles high ratings just because they were on obscure difficult to research subjects isn't going to make Wikipedia better. This is the sort of thing, quality ratings, that machines aren't going to really take over for humans anytime soon.

I don't mind, in general, the idea of quality ratings for editors, if they take all things into account. Clearly this machine doesn't, and will and could artificially give "accurate" ratings to things that merely were obscure.

KP

WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l

-- -Brock

K P

9 p.m.

On 8/6/07, Brock Weller brock.weller@gmail.com wrote:

...

While I get your point KP, I think either you or I may have missed the working of this. I gathered that its not how long its been there, per se, but how many edits its survived more or less intact. So those physics articles, while long standing, are as you say rarely edited, and the software would not put trust into its content.

Oh, maybe. So, if it's never edited, it hasn't survived a lot of edits more or less intact, because there were no edits to survive.

Well, I came across a sad article looking at it.

I'll have to think on it some more, because I just don't think that's an indicator of accurate content.

David Goodman

7 Aug 7 Aug

2:03 a.m.

Personally, I did not call attention to this as anything more than an interesting experiment. In particular, the idea of actually adopting it on WP seems way premature, and I am sure the authors would agree. Nor would I ever refer to it in critiquing an article on a talk page or elsewhere in WP. At the very most, it's an experimental way of calling attention to material that might deserve a second look.

On 8/6/07, K P kpbotany@gmail.com wrote:

...

On 8/6/07, Brock Weller brock.weller@gmail.com wrote:

...
While I get your point KP, I think either you or I may have missed the working of this. I gathered that its not how long its been there, per se, but how many edits its survived more or less intact. So those physics articles, while long standing, are as you say rarely edited, and the software would not put trust into its content.

Oh, maybe. So, if it's never edited, it hasn't survived a lot of edits more or less intact, because there were no edits to survive.

Well, I came across a sad article looking at it.

I'll have to think on it some more, because I just don't think that's an indicator of accurate content.

KP

WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l

-- David Goodman, Ph.D, M.L.S.

Gwern Branwen

6 Aug 6 Aug

7:43 p.m.

On 0, Tim Starling tstarling@wikimedia.org scribbled:

...

David Goodman wrote:

...
from Chronicle of Higher Education, Wired Campus blog. http://chronicle.com/wiredcampus/index.php?id=2278

"software that color-codes Wikipedia entries, identifying those portions deemed trustworthy and those that might be taken with a grain of salt.

I spoke with Luca de Alfaro at length about this feature at Wikimania. I think the technology is great, and the performance is probably good enough to include it on Wikipedia itself. He assures me he will release the source code under a free license, as soon as it's presentable. Some programming work still needs to be done to make it work incrementally rather than on the entire history of the article, but the theory for this is entirely in place.

There are two very important and separate elements to this:

A "blame map". Some might prefer to call it a "credit map" to be more

polite. This is a data structure that lets you see who is responsible for what text. It can be updated on every edit. Having one stored in MediaWiki will enable all sorts of applications. Apparently it's old-hat, and not the subject of the present research, but it'll be great to have an implementation integrated with MediaWiki.

A reputation metric. This is a predictor of how long a given user's

edits will stay in an article. It's novel, and it's the main topic of de Alfaro's research.

These two elements could be used independently in any way we choose.

The social implications of having a reputation metric are not lost on de Alfaro. He gives the following responses to the usual criticisms:

The reputation metric does not rank respected, established users --

it has a maximum value which will be routinely obtained by many people.

The value of the metric for individual users is obscured. The only

access to it is via the reputation-coloured article text. The annotated article text has no usernames attached, and the metric is not displayed on user pages or the like.

It's content-based which makes it harder to game than voting-based

metrics.

It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de Alfaro proposes. Brainstorm away.

-- Tim Starling

The most exciting ones I can think of:

#We can scrap the 'newest 1%' part of semi-protection. Instead of waiting 4 days, write 4 articles! #We can scrap editcountitis - this reputation metric may still not be ideal, but I suspect the metric will reflect the value of one's contributions *a heckuva* lot better than # of edits. #Bots could probably benefit from this. An example: Pywikipedia's followlive.py script follows Newpages looking for dubious articles to display for the user to take action on. You could filter out all pages consisting of avg. reputation > n, or something. #People have long suggested that edits by anons and new users be buffered for a while or approved; this might be a way of doing it.

-- gwern MF Reflection NATOA Indigo AIEWS Weekly sorot Sex import Zen

cohesion

4:23 p.m.

I hope people don't want to include this on the default view of Wikipedia, surely. Having said that, I think it might be useful as a preference or something. I think we all need to accept though that the "reputation index" of all users will be findable, and published, probably in a userbox.

Does this only calculate your reputation only on article pages? It should. For example, people with huge debates on [[WP:TFD]], or AFD, Wikipedia:Talk, etc that get archived by a bot will be very bad editors indeed! :)

Judson [[:en:User:Cohesion]]

Casey Brown

5:22 p.m.

It should also keep in mind what the summary the user is providing... like if someone were to do something very radical per OTRS, or do an urgent blank. These times of things arenecessary but may give someone an unnecessary or bad rating.

On 8/6/07, cohesion cohesion@sleepyhead.org wrote:

...

I hope people don't want to include this on the default view of Wikipedia, surely. Having said that, I think it might be useful as a preference or something. I think we all need to accept though that the "reputation index" of all users will be findable, and published, probably in a userbox.

Does this only calculate your reputation only on article pages? It should. For example, people with huge debates on [[WP:TFD]], or AFD, Wikipedia:Talk, etc that get archived by a bot will be very bad editors indeed! :)

Judson [[:en:User:Cohesion]]

WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l

-- Casey Brown Cbrown1023 --- Note: This e-mail address is used for mailing lists. Personal emails sent to this address will probably get lost.

Gwern Branwen

8 Aug 8 Aug

7:12 p.m.

On 0, Casey Brown cbrown1023.ml@gmail.com scribbled:

...

It should also keep in mind what the summary the user is providing... like if someone were to do something very radical per OTRS, or do an urgent blank. These times of things are necessary but may give someone an unnecessary or bad rating.

Certainly that'd be a concern, but I'm reading through the linked paper, and they seem to have structured it to analyze text in two ways; one measure for the actual text itself, and another for the organization of the text.

But things seem to work out and avoid that problem:

"The fact that so few of the short-lived edits performed by high-reputation authors were judged to be of bad quality points to the fact that edits can be undone for reasons unrelated to quality. Many Wikipedia articles deal with current events; edits to those articles are undone regularly, even though they may be of good quality. Our algorithms do not treat in any special way current-events pages. Other Wikipedia edits are administrative in nature, tagging pages that need work or formatting; when these tags are removed, we classify it as text deletion..."

(My take is that the algorithm does penalize a user slightly for those listed actions, but nowhere near enough to really count, and that moving text between articles looks like simultaneous deletion and addition, so the plus and minuses would cancel out.)

-- gwern monarchist SGC 127 NAVELEXSYSSECENGCEN Z7 CACI POCSAG Ti cybercash Infrastructure

David Gerard

6 Aug 6 Aug

5:20 p.m.

On 06/08/07, Gwern Branwen gwern0@gmail.com wrote:

...

On 0, Tim Starling tstarling@wikimedia.org scribbled:

...

...
It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de Alfaro proposes. Brainstorm away.

...

The most exciting ones I can think of: #We can scrap the 'newest 1%' part of semi-protection. Instead of waiting 4 days, write 4 articles! #We can scrap editcountitis - this reputation metric may still not be ideal, but I suspect the metric will reflect the value of one's contributions *a heckuva* lot better than # of edits. #Bots could probably benefit from this. An example: Pywikipedia's followlive.py script follows Newpages looking for dubious articles to display for the user to take action on. You could filter out all pages consisting of avg. reputation > n, or something. #People have long suggested that edits by anons and new users be buffered for a while or approved; this might be a way of doing it.

I fear the only thing that comes to my mind is "what a fabulous new set of playground equipment for trolls!"

The point of internet trolling being, after all, to score points over people by working their social system against them for prank value.

At least those goldfarming for RFA might actually write something. That would be a nice change.

- d.

Andrew Gray

5:33 p.m.

On 06/08/07, Gwern Branwen gwern0@gmail.com wrote:

...

The most exciting ones I can think of:

#We can scrap the 'newest 1%' part of semi-protection. Instead of waiting 4 days, write 4 articles! #We can scrap editcountitis - this reputation metric may still not be ideal, but I suspect the metric will reflect the value of one's contributions *a heckuva* lot better than # of edits.

These two, and a few others, get into the problem that - as currently implemented - the value of the metric is concealed. In order to use a lot of these implementations we;d have to make a concious decision to publicise that value, which just gives something new to game.

The first could be done without publicising the value, but it does lead to two negative effects:

a) it's possible for someone to "go down" a grade in our trust system, which isn't currently possible and has interesting implications

b) people don't know where they stand, and we can't tell them where they stand or exactly how to improve. It's a complex model - as things stand now, we can just say "wait a few days", and even when it was the irregular newest-1% we could still say "oh, three or four days, should be okay". However, it's going to muddy the waters a lot if we have to say "make some substantive contributions and hope the computer likes the look of you"...

-- - Andrew Gray andrew.gray@dunelm.org.uk

Gwern Branwen

8 Aug 8 Aug

7:13 p.m.

On 0, Andrew Gray shimgray@gmail.com scribbled:

...

On 06/08/07, Gwern Branwen gwern0@gmail.com wrote:

...
The most exciting ones I can think of:

#We can scrap the 'newest 1%' part of semi-protection. Instead of waiting 4 days, write 4 articles! #We can scrap editcountitis - this reputation metric may still not be ideal, but I suspect the metric will reflect the value of one's contributions *a heckuva* lot better than # of edits.

These two, and a few others, get into the problem that - as currently implemented - the value of the metric is concealed. In order to use a lot of these implementations we;d have to make a conscious decision to publicise that value, which just gives something new to game.

As I think another has pointed out, if it is visible at all, the value is there. Leaving aside that it could all be done outside of WP by a bot author - spider a user's contrib page, grab each page's revision history, and run the algorithm on it. It'd be slower than if everything were being done on the WMF servers, and would probably rule out using it in interactive applications, but you could still do it - even if the only indication were some HTML tags indicating font colors, one could still extract the information. Perhaps blue = high reputation, or whatever.

...

The first could be done without publicising the value, but it does lead to two negative effects:

a) it's possible for someone to "go down" a grade in our trust system, which isn't currently possible and has interesting implications

It formalizes in a sense what we already do as a community, I think. Community-banned users are the lowest grade in our trust system, anonymous AOL and school IPs a step up, random residential anon IPs still another step, oft-blocked users a little higher (obviously there's overlap in these sets - if I said they were exclusive, no doubt someone would raise the example of that anon IP with multiple RfA offers!), and so on and so forth. Some of these levels are explicit, like with blocking and semi-protection, and others are not, like the increased scrutiny users with a red userpage link apparently are favored with.

...

b) people don't know where they stand, and we can't tell them where they stand or exactly how to improve. It's a complex model - as things stand now, we can just say "wait a few days", and even when it was the irregular newest-1% we could still say "oh, three or four days, should be okay". However, it's going to muddy the waters a lot if we have to say "make some substantive contributions and hope the computer likes the look of you"...

--

Andrew Gray

I think the harder to predict nature could be turned into an argument for it; I understand the FA people have trouble with the article of the day even when semi-protected because trolls register sleeper accounts and just wait for a boring day to go vandalize. It's a very low cognitive burden to simply create some random accounts and forget about them until you want to vandalize; if they are forced to invest in them a little by improving articles, then the burden rises. For good people, this isn't so much of an issue: presumably a good person registers an account precisely to do what this would encourage them to do.

-- gwern monarchist SGC 127 NAVELEXSYSSECENGCEN Z7 CACI POCSAG Ti cybercash Infrastructure

Bryan Derksen

6 Aug 6 Aug

7:57 p.m.

Gwern Branwen wrote:

...

The most exciting ones I can think of:

#We can scrap the 'newest 1%' part of semi-protection. Instead of waiting 4 days, write 4 articles!

On the one hand, as an inclusionist and apostate mergist, I would welcome anything codified that boosted the philosophy that "more articles is good". On the other hand, though, forcing people to run the Newpages Patrol gauntlet in order to edit other existing articles may not be optimal. It can be frustrating seeing one's work randomly blipping out of existence minutes after saving it, I wouldn't want that experience explicitly forced on new users. And some editors just don't _want_ to create new articles; they prefer editing existing ones. That's perfectly useful too.

...

#We can scrap editcountitis - this reputation metric may still not be ideal, but I suspect the metric will reflect the value of one's contributions *a heckuva* lot better than # of edits.

I _wish_ editcountitis counted for more, I'd be a sort of demigod. :)

I'm not sure why a reputation metric of any sort is necessary, though. The contributions to articles themselves should stand on their own; one of the main defenses I gave to the press during the Essjay controversy was that we don't usually consider the reputation or qualifications of the editors relevant when evaluating their work. And if an editor has a history of significant disruptiveness, inaccuracy, etc., then it'll be raised on their user talk page and perhaps ultimately proceed on to RfC and other such fora. We don't need robots and math formulae to do that.

...

#Bots could probably benefit from this. An example: Pywikipedia's followlive.py script follows Newpages looking for dubious articles to display for the user to take action on. You could filter out all pages consisting of avg. reputation > n, or something.

Could work, but there's no need to display the orange highlighting for this one.

...

#People have long suggested that edits by anons and new users be buffered for a while or approved; this might be a way of doing it.

Also might work, but version flagging is so close to being real now that I'd like to see how that goes first. Baby steps. :)

K P

8:02 p.m.

On 8/6/07, Bryan Derksen bryan.derksen@shaw.ca wrote:

...

Gwern Branwen wrote:

...
The most exciting ones I can think of:

#We can scrap the 'newest 1%' part of semi-protection. Instead of waiting 4 days, write 4 articles!

I like what one write put on the blog:

"It's hard to imagine a methodology less related to the verifiable accuracy of the articles."

Again, because it fails to serve its supposedly intended, or halfway intended, or pseudo intended purposes, it's just clutter.

Gwern Branwen

8 Aug 8 Aug

7:14 p.m.

On 0, Bryan Derksen bryan.derksen@shaw.ca scribbled:

...

Gwern Branwen wrote:

...
The most exciting ones I can think of:

#We can scrap the 'newest 1%' part of semi-protection. Instead of waiting 4 days, write 4 articles!

On the one hand, as an inclusionist and apostate mergist, I would welcome anything codified that boosted the philosophy that "more articles is good". On the other hand, though, forcing people to run the Newpages Patrol gauntlet in order to edit other existing articles may not be optimal. It can be frustrating seeing one's work randomly blipping out of existence minutes after saving it, I wouldn't want that experience explicitly forced on new users. And some editors just don't _want_ to create new articles; they prefer editing existing ones. That's perfectly useful too.

Well, I was being facetious there. For symmetry, I had to keep the '4' from semi-protection, but I didn't want to say edits because we all know how pointless and worthless any given edit can be; '4 articles' had a properly substantial sound to it. Plus, the way I say 'articles', the two clauses had the same syllable count, which pleases me.

...

...
#We can scrap editcountitis - this reputation metric may still not be ideal, but I suspect the metric will reflect the value of one's contributions *a heckuva* lot better than # of edits.

I _wish_ editcountitis counted for more, I'd be a sort of demigod. :)

I'm not sure why a reputation metric of any sort is necessary, though. The contributions to articles themselves should stand on their own; one of the main defenses I gave to the press during the Essjay controversy was that we don't usually consider the reputation or qualifications of the editors relevant when evaluating their work. And if an editor has a history of significant disruptiveness, inaccuracy, etc., then it'll be raised on their user talk page and perhaps ultimately proceed on to RfC and other such fora. We don't need robots and math formulae to do that.

No, we don't *need* to use such things, in the same sense that one could go around disambiguating links and taggin' FU images without the benefit of automation, and quite a few people have spent quite a bit of time doing just that. But darn it, if you want to get a couple thousand links disambiguated or images handled so as to put a good dent in the backlog, why shouldn't you? There's more then enough un-automate-able work on WP to keep a legion of editors busy for decades; no point in passing up any tool whose benefit outweighs its cost.

...

...
#Bots could probably benefit from this. An example: Pywikipedia's followlive.py script follows Newpages looking for dubious articles to display for the user to take action on. You could filter out all pages consisting of avg. reputation > n, or something.

Could work, but there's no need to display the orange highlighting for this one.

Yeah. The entire thing could be done locally if people are willing to code it up, but having it done on the servers (even if nothing is actually displayed but just made available in a Special: page, perhaps) is a lot more efficient in terms of coding, time, and bandwidth. (Reading through the paper, for a given section of text in an article, you need all past revisions of that page n, and all past revisions of all articles edited by all editors of that page n).

...

...
#People have long suggested that edits by anons and new users be buffered for a while or approved; this might be a way of doing it.

Also might work, but version flagging is so close to being real now that I'd like to see how that goes first. Baby steps. :)

Yes, let's put it on the docket. I'm sure stable versions and SUL will be done very soon, and then it'd be easy as pie to add this reputation stuff in. Next year in Jerusalem!

-- gwern monarchist SGC 127 NAVELEXSYSSECENGCEN Z7 CACI POCSAG Ti cybercash Infrastructure

Parker Conrad

6 Aug 6 Aug

8:12 p.m.

For those who are interested in similar systems being used to calculate "reputation", check out our implementation at www.wikinvest.com -- It's different than what these guys have proposed but it's similar in that it relies on the amount of content created & probability that an individual user's changes will be undone to "score" contributors. We use it to identify "top contributors" to articles, and also to give rankings to users across the site. Think of it this way -- if I make a change to a wiki, and then someone else edits on top of me but leaves my contributions in place, that person has implicitly "approved" of what I wrote, and "voted" for me as a contributor. It drastically reduces the need to patrol changes -- trusted contributors can be identified, and scrutiny reserved for those with little to no reputation.

/prc

On 8/5/07, Tim Starling tstarling@wikimedia.org wrote:

...

David Goodman wrote:

...
from Chronicle of Higher Education, Wired Campus blog. http://chronicle.com/wiredcampus/index.php?id=2278

"software that color-codes Wikipedia entries, identifying those portions deemed trustworthy and those that might be taken with a grain of salt.

I spoke with Luca de Alfaro at length about this feature at Wikimania. I think the technology is great, and the performance is probably good enough to include it on Wikipedia itself. He assures me he will release the source code under a free license, as soon as it's presentable. Some programming work still needs to be done to make it work incrementally rather than on the entire history of the article, but the theory for this is entirely in place.

There are two very important and separate elements to this:

A "blame map". Some might prefer to call it a "credit map" to be more

polite. This is a data structure that lets you see who is responsible for what text. It can be updated on every edit. Having one stored in MediaWiki will enable all sorts of applications. Apparently it's old-hat, and not the subject of the present research, but it'll be great to have an implementation integrated with MediaWiki.

A reputation metric. This is a predictor of how long a given user's

edits will stay in an article. It's novel, and it's the main topic of de Alfaro's research.

These two elements could be used independently in any way we choose.

The social implications of having a reputation metric are not lost on de Alfaro. He gives the following responses to the usual criticisms:

The reputation metric does not rank respected, established users --

it has a maximum value which will be routinely obtained by many people.

The value of the metric for individual users is obscured. The only

access to it is via the reputation-coloured article text. The annotated article text has no usernames attached, and the metric is not displayed on user pages or the like.

It's content-based which makes it harder to game than voting-based

metrics.

It's time for us to think about how we want to use this technology. There are lots of possibilities beyond the precise design that de Alfaro proposes. Brainstorm away.

-- Tim Starling

WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l

6319

Age (days ago)

6322

Last active (days ago)

wikien-l@lists.wikimedia.org

29 comments

18 participants

tags (0)

participants (18)

Andrew Gray
Brock Weller
Bryan Derksen
Casey Brown
Charlotte Webb
cohesion
David Gerard
David Goodman
Gwern Branwen
John Vandenberg
Jussi-Ville Heiskanen
K P
Magnus Manske
Oskar Sigvardsson
Parker Conrad
Phil Sandifer
Steve Summit
Tim Starling