Hi all
Most of you probably have heard of WikiTrust [1], a tool that colors parts of MediaWiki pages based upon a calculated trust value. The demo [2] is quite impressive. I think this would especially help us to spot "subtle" vandalism more easily.
But WikiTrust could also solve another problem that has come up time and again, and has been discussed again recently in the German community: how to determine the main authors of an article, and how to find out who put a specific statement into an article. Tracking and assessing authorship is something many people are interested in, and I think I can speak for a lot of people in saying that we would really love to have that on the German-language Wikipedia. It would be particularly helpful for print versions: the method currently used by PediaPress is more than doubtful, and is currently getting ripped apart on the Verein's mailing list.
WikiTrust is getting more and more mature, and Luca de Alfaro and his team have been working hard on making it a lot more efficient. The one thing that still worries me is the fact that it would require quite a bit of storage space. Anyway, Luca really wants to integrate it into Wikipedia and other WMF wikis -- and so do I. I think that, besides being a useful tool to the community, it could also boost our credibility in academia, because authorship becomes much more transparent. Compare what WikiGenes [3] does [4]. I want that for Wikipedia. Not for making authors more prominent, but making authorship more transparent.
So, what would it take? Where could we try it? What are the concerns?
-- Daniel
PS: I can try to supply some technical details if required, I hope Luca will save me from getting stuff wrong :)
[1] http://trust.cse.ucsc.edu/
[2] http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
[3] http://www.wikigenes.org/
[4] http://www.mememoir.org/
On Sat, Oct 18, 2008 at 2:57 PM, Daniel Kinzler daniel@brightbyte.de wrote: ...
The one thing that still worries me is the fact that it would require quite a bit of storage space.
Maybe you could automatically mark pages for credibility based on some criteria (is the page a stub? how old is it? how many edits has it had? etc.) before printing, and delete those pages.
Tracking and assessing authorship is something many people are interested in, and I think I can speak for a lot of people in saying that we would really love to have that on the German-language Wikipedia. It would be particularly helpful for print versions: the method currently used by PediaPress is more than doubtful, and is currently getting ripped apart on the Verein's mailing list.
I can't help you here *scratches head*. Anyway, Wikipedia is a wiki: it is designed to make anonymous edits easy so that everyone can edit. The other option is a different type of pedia, an expert-pedia where only accredited academic experts could add their opinions.
That was ... Nupedia? http://en.wikipedia.org/wiki/Nupedia http://nupedia.8media.org/ (oops ... 404)
Most of you probably have heard of WikiTrust [1], a tool that colors parts of MediaWiki pages based upon a calculated trust value. The demo [2] is quite impressive. I think this would especially help us to spot "subtle" vandalism more easily.
Nifty tools :-)
Tei wrote:
On Sat, Oct 18, 2008 at 2:57 PM, Daniel Kinzler daniel@brightbyte.de wrote: ...
The one thing that still worries me is the fact that it would require quite a bit of storage space.
Maybe you could automatically mark pages for credibility based on some criteria (is the page a stub? how old is it? how many edits has it had? etc.) before printing, and delete those pages.
This is not about printing. WikiTrust determines the trust level for every *word* of every page on the wiki. To do this, more storage space is required.
Tracking and assessing authorship is something many people are interested in, and I think I can speak for a lot of people in saying that we would really love to have that on the German-language Wikipedia. It would be particularly helpful for print versions: the method currently used by PediaPress is more than doubtful, and is currently getting ripped apart on the Verein's mailing list.
I can't help you here *scratches head*. Anyway, Wikipedia is a wiki: it is designed to make anonymous edits easy so that everyone can edit. The other option is a different type of pedia, an expert-pedia where only accredited academic experts could add their opinions.
Anonymous edits are not a problem. The problem is that the GFDL requires me to credit at least the 5 "main" authors, so the question is, how to determine them. Similarly, academic citing practices call for the 3 main authors. WikiTrust would allow us to easily determine who has contributed how much to a given version of a page. Which would be quite useful.
-- daniel
On Sat, Oct 18, 2008 at 10:28 PM, Daniel Kinzler daniel@brightbyte.de wrote:
Tei wrote:
...
Anonymous edits are not a problem. The problem is that the GFDL requires me to credit at least the 5 "main" authors, so the question is, how to determine them. Similarly, academic citing practices call for the 3 main authors. WikiTrust would allow us to easily determine who has contributed how much to a given version of a page. Which would be quite useful.
It seems tools already exist to list the contributors to a page.
Creators of Dios: http://toolserver.org/~escaladix/cgi-bin/auteurs.tcl?title=Dios&lang=es http://toolserver.org/~daniel/WikiSense/Contributors.php?wikifam=.wikipedia....
It is normal for a wiki page to have 80 authors (that's a fact). If you want to choose only 3, you have to ignore some authors based on some subjective criterion, like strlen(concat(modifications)), COUNT(edits), ... or use WikiGenes (I guess WikiGenes works almost like the "blame" feature of a CVS system). I feel like you would be lying to satisfy some external limitation :/ Who is the author of http://en.wikipedia.org/wiki/One_Thousand_and_One_Nights ? Maybe you should ask for an exception to the GFDL, or have the authors of the GFDL make a new version of the license that supports how a wiki works, to avoid reporting 3 authors for a text that (in fact) has 80 authors.
Hi, The GFDL is intended for the documentation of software. The WMF is finding a route towards a more appropriate license. Getting an exception to make our life easier is not really realistic, I would say. The current practice is that people refer to the Wikipedia article, and this is where you find all the authors: a really pragmatic approach to something that would otherwise be unwieldy and hinder the freedom of using this material. Thanks, GerardM
On Sun, Oct 19, 2008 at 10:04 AM, Tei oscar.vives@gmail.com wrote:
On Sat, Oct 18, 2008 at 10:28 PM, Daniel Kinzler daniel@brightbyte.de wrote:
Tei wrote:
...
Anonymous edits are not a problem. The problem is that the GFDL requires me to credit at least the 5 "main" authors, so the question is, how to determine them. Similarly, academic citing practices call for the 3 main authors. WikiTrust would allow us to easily determine who has contributed how much to a given version of a page. Which would be quite useful.
It seems tools already exist to list the contributors to a page.
Creators of Dios: http://toolserver.org/~escaladix/cgi-bin/auteurs.tcl?title=Dios&lang=es
http://toolserver.org/~daniel/WikiSense/Contributors.php?wikifam=.wikipedia.org&wikilang=es&page=Dios&max=200&grouped=on&order=first_edit
It is normal for a wiki page to have 80 authors (that's a fact). If you want to choose only 3, you have to ignore some authors based on some subjective criterion, like strlen(concat(modifications)), COUNT(edits), ... or use WikiGenes (I guess WikiGenes works almost like the "blame" feature of a CVS system). I feel like you would be lying to satisfy some external limitation :/ Who is the author of http://en.wikipedia.org/wiki/One_Thousand_and_One_Nights ? Maybe you should ask for an exception to the GFDL, or have the authors of the GFDL make a new version of the license that supports how a wiki works, to avoid reporting 3 authors for a text that (in fact) has 80 authors.
Gerard Meijssen wrote:
Hi, The GFDL is intended for the documentation of software. The WMF is finding a route towards a more appropriate license. Getting an exception to make our life easier is not really realistic, I would say. The current practice is that people refer to the Wikipedia article, and this is where you find all the authors: a really pragmatic approach to something that would otherwise be unwieldy and hinder the freedom of using this material. Thanks, GerardM
For online re-use, I'd say that is OK. Not in print, however. And Wikipedians appear to feel the same way. There's a hell of a brouhaha about the way Bertelsmann handled attribution in their "best of Wikipedia" book.
-- daniel
Tei wrote:
It seems tools already exist to list the contributors to a page.
Creators of Dios: http://toolserver.org/~escaladix/cgi-bin/auteurs.tcl?title=Dios&lang=es http://toolserver.org/~daniel/WikiSense/Contributors.php?wikifam=.wikipedia....
I know, I wrote the second one.
It is normal for a wiki page to have 80 authors (that's a fact). If you want to choose only 3, you have to ignore some authors based on some subjective criterion, like strlen(concat(modifications)), COUNT(edits), ... or use WikiGenes (I guess WikiGenes works almost like the "blame" feature of a CVS system). I feel like you would be lying to satisfy some external limitation :/
Metrics like number of edits, or difference in size, etc, are trivial and useless.
The metric that makes most sense to me is "number of words contributed to the current version". To get that number, you have to track text contributed by each edit across all following edits, considering reverts, moving paragraphs, etc -- like blame, but a bit more advanced even. This is a complex task -- and it's exactly what WikiTrust does. Which is why I'm writing about it.
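To make the "words contributed to the current version" metric concrete, here is a minimal Python sketch of a blame-style pass over a page history. It is an illustration only and is not WikiTrust's algorithm: it compares each revision only with its direct predecessor, so text restored by a revert or moved to another place is credited to whoever re-inserted it. The function names and the (author, text) input format are assumptions made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher


def attribute_words(revisions):
    """Return (words, owners) for the newest revision.

    `revisions` is a list of (author, text) pairs, oldest first
    (a hypothetical input format chosen for this sketch).
    owners[i] is the author credited with words[i].
    """
    words, owners = [], []
    for author, text in revisions:
        new_words = text.split()
        # By default every word belongs to the author of this revision...
        new_owners = [author] * len(new_words)
        matcher = SequenceMatcher(a=words, b=new_words, autojunk=False)
        # ...unless it survives unchanged from the previous revision,
        # in which case it keeps its earlier owner.
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                new_owners[block.b + k] = owners[block.a + k]
        words, owners = new_words, new_owners
    return words, owners


def words_per_author(revisions):
    """Count how many words of the newest revision each author owns."""
    _, owners = attribute_words(revisions)
    return Counter(owners)
```

With such counts in hand, something like `words_per_author(history).most_common(5)` would give a ranked "top 5" list; handling reverts and moved paragraphs properly is the hard part that this sketch leaves out.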
Who is the author of http://en.wikipedia.org/wiki/One_Thousand_and_One_Nights ? Maybe you should ask for an exception to the GFDL, or have the authors of the GFDL make a new version of the license that supports how a wiki works, to avoid reporting 3 authors for a text that (in fact) has 80 authors.
This is not an option for PediaPress, which is a service that lets users pick a set of pages from Wikibooks (and soon also Wikipedia) and make a print version from that. You can of course always list all authors, but even then, you may want to rank them by the amount they contributed. And if you are able to do that, the GFDL allows you to only name the top 5, which makes things a bit less confusing, especially in print.
Anyway, you seem to miss the point. I'm not looking for ways to track authorship, I already know the solution. I want to discuss the technical aspects of implementing it on Wikimedia servers.
-- daniel
Daniel Kinzler wrote:
Hi all
Most of you probably have heard of WikiTrust [1], a tool that colors parts of MediaWiki pages based upon a calculated trust value. The demo [2] is quite impressive. I think this would especially help us to spot "subtle" vandalism more easily.
But WikiTrust could also solve another problem that has come up time and again, and has been discussed again recently in the German community: how to determine the main authors of an article, and how to find out who put a specific statement into an article.
De Alfaro deliberately left that feature out of the demo that he showed me in 2007; I don't know if it's been added since. I'd rather see an annotation feature showing author names than reputation colouring. The reputation metric is the novel part of de Alfaro's work, hence his emphasis on it. But I think author annotation is a more serious and useful application for the software.
Someone might have to write a user interface for it.
-- Tim Starling
Tim Starling wrote:
Daniel Kinzler wrote:
...
But WikiTrust could also solve another problem that has come up time and again, and has been discussed again recently in the German community: how to determine the main authors of an article, and how to find out who put a specific statement into an article.
De Alfaro deliberately left that feature out of the demo that he showed me in 2007; I don't know if it's been added since. I'd rather see an annotation feature showing author names than reputation colouring. The reputation metric is the novel part of de Alfaro's work, hence his emphasis on it. But I think author annotation is a more serious and useful application for the software.
Someone might have to write a user interface for it.
I'm in close contact with de Alfaro (met him at WikiSym), and told him that the authorship aspect is the feature most wanted by Wikipedias. He promised to fast-track implementation, and said he'd be working on it himself this weekend. He *really* wants to get this out there. So I'm confident :)
So, again: what would have to be done to get this live? Do you think it would be best to first run it on a not-so-big Wikipedia (maybe we should ask NL)? How soon could we try it on a test or lab wiki? What's the procedure, who needs to approve?
-- daniel
Daniel Kinzler wrote:
So, again: what would have to be done to get this live? Do you think it would be best to first run it on a not-so-big Wikipedia (maybe we should ask NL)? How soon could we try it on a test or lab wiki? What's the procedure, who needs to approve?
I chatted a little with Luca a while ago about deployment requirements; basically we need to make sure the software architecture can be set up and run relatively hands-off, and in a way that won't impact primary operations much.
Hopefully we'll continue working out such details and start getting some demos up!
-- brion
Brion Vibber wrote:
I chatted a little with Luca a while ago about deployment requirements; basically we need to make sure the software architecture can be set up and run relatively hands-off, and in a way that won't impact primary operations much.
Hopefully we'll continue working out such details and start getting some demos up!
Indeed :) I'm glad you are also interested in getting this out. I'm trying to keep this project in people's minds. I hope we will have a demo that includes authorship highlighting -- I want to showcase that to the dewp people, and hopefully get some pressure behind the project.
-- daniel
On Saturday 18 October 2008 14:57:59 Daniel Kinzler wrote:
So, what would it take? Where could we try it? What are the concerns?
FWIW, copying my email to M. Schneider: IIRC, at Wikimania you talked about the problem of how to identify primary authors of articles, so I wanted to share my thoughts on this.
The obvious first step is to go through all the revisions and get MD5 of each; then, use MD5s to isolate and disregard edits that have been reverted.
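A minimal Python sketch of this revert-filtering step follows, assuming the revision texts are already available as a list, oldest first. The function name `reverted_indices` and the rule for what counts as "reverted" (everything after the most recent identical revision, including the revert itself, since it adds no new text) are assumptions made for illustration.

```python
import hashlib


def reverted_indices(texts):
    """Return the set of revision indices to disregard.

    `texts` holds the full text of each revision, oldest first.
    When a revision has the same MD5 as an earlier one, it is treated
    as a revert: the revisions in between were undone, and the revert
    itself contributes nothing new, so all of them are discarded.
    """
    last_seen = {}   # MD5 digest -> index of the most recent revision with it
    discarded = set()
    for i, text in enumerate(texts):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in last_seen:
            discarded.update(range(last_seen[digest] + 1, i + 1))
        last_seen[digest] = i
    return discarded
```

For a history such as good, better, vandalised, better, best, this marks the vandalised revision and the revert that restored "better" as disregarded, and keeps the other three.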
To measure the difference between two edits, I mentioned to you that wdiff ( http://www.gnu.org/software/wdiff/ ) could be used: simply count the number of changed words in the article. Wdiff could give false positives (an author who merely switches two paragraphs will appear to be a major author), but could not give false negatives (an author who changes a single word really did just change a single word; of course, such a change may be very important, but isn't major, or, IMO, copyrightable).
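As a rough illustration of counting changed words, here is a sketch that uses Python's standard difflib on whitespace-separated words in place of wdiff itself. It does not call or reproduce wdiff, and the function name `count_changed_words` is an assumption for this example.

```python
from difflib import SequenceMatcher


def count_changed_words(old_text, new_text):
    """Return (added, removed) word counts between two revisions.

    Words are whitespace-separated tokens; a replaced word counts
    once as removed and once as added.
    """
    old_words, new_words = old_text.split(), new_text.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1   # words dropped from the old revision
            added += j2 - j1     # words introduced by the new revision
    return added, removed
```

The false positive mentioned above shows up here too: swapping the two halves of "a b c d" to get "c d a b" is reported as two words added and two removed, although no new text was written.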
More sophisticated diffs could also be introduced. For example, it would be relatively simple to make a program that tries to find if an author has switched two (or more) paragraphs, then apply a diff program as if they haven't been switched.
Finally, disregard bots, as they can claim no copyright :) (More realistically, this should be checked on a per-bot basis.)
On Tue, Oct 21, 2008 at 12:33 AM, Nikola Smolenski smolensk@eunet.yu wrote:
On Saturday 18 October 2008 14:57:59 Daniel Kinzler wrote:
So, what would it take? Where could we try it? What are the concerns?
FWIW, copying my email to M. Schneider:
IIRC, at Wikimania you talked about the problem of how to identify primary authors of articles, so I wanted to share my thoughts on this.
The obvious first step is to go through all the revisions and get MD5 of each; then, use MD5s to isolate and disregard edits that have been reverted.
To measure the difference between two edits, I mentioned to you that wdiff ( http://www.gnu.org/software/wdiff/ ) could be used: simply count the number of changed words in the article. Wdiff could give false positives (an author who merely switches two paragraphs will appear to be a major author), but could not give false negatives (an author who changes a single word really did just change a single word; of course, such a change may be very important, but isn't major, or, IMO, copyrightable).
More sophisticated diffs could also be introduced. For example, it would be relatively simple to make a program that tries to find if an author has switched two (or more) paragraphs, then apply a diff program as if they haven't been switched.
or totally disregard order:

    cat article | sed -E 's/( |\t)/\n/g' | sort
On Tuesday 21 October 2008 08:59:06 Tei wrote:
On Tue, Oct 21, 2008 at 12:33 AM, Nikola Smolenski smolensk@eunet.yu
wrote:
On Saturday 18 October 2008 14:57:59 Daniel Kinzler wrote:
So, what would it take? Where could we try it? What are the concerns?
To measure the difference between two edits, I mentioned to you that wdiff ( http://www.gnu.org/software/wdiff/ ) could be used: simply count the number of changed words in the article. Wdiff could give false positives (an author who merely switches two paragraphs will appear to be a major author), but could not give false negatives (an author who changes a single word really did just change a single word; of course, such a change may be very important, but isn't major, or, IMO, copyrightable).
More sophisticated diffs could also be introduced. For example, it would be relatively simple to make a program that tries to find if an author has switched two (or more) paragraphs, then apply a diff program as if they haven't been switched.
or totally disregard order:

    cat article | sed -E 's/( |\t)/\n/g' | sort
That's an excellent idea! It loses some things, but for measuring size of a change it's simple and it works.
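The order-insensitive comparison can be sketched in Python by treating each revision as a multiset of words, which is roughly what sorting the words and then diffing the sorted lists achieves. The function name `bag_of_words_change` is an assumption made for this example.

```python
from collections import Counter


def bag_of_words_change(old_text, new_text):
    """Return (added, removed) word counts, ignoring word order.

    Each revision is reduced to a multiset of whitespace-separated
    words, so moving text around never counts as a change.
    """
    old_bag = Counter(old_text.split())
    new_bag = Counter(new_text.split())
    added = sum((new_bag - old_bag).values())     # words only in the new revision
    removed = sum((old_bag - new_bag).values())   # words only in the old revision
    return added, removed
```

Swapping paragraphs now yields (0, 0). What it loses is also visible: an edit that reuses existing words in a new sentence is undercounted, because those words already occur elsewhere in the article.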