Dear All,
we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole English Wikipedia, as of its February 6, 2007 snapshot, colored according to text trust. This is the first time that even we can look at how the "trust coloring" looks on the whole of the Wikipedia! We would be very interested in feedback (the wikiquality-l@lists.wikimedia.org mailing list is the best place).
If you find bugs, you can report them at http://groups.google.com/group/wiki-trust
Happy Holidays!
Luca
PS: yes, we know, some images look off. It is currently fairly difficult for a site outside of the Wikipedia to fetch Wikipedia images correctly.
PPS: there are going to be a few planned power outages on our campus in the next few days, so if the demo is down, try again later.
On Dec 19, 2007 4:36 PM, Luca de Alfaro luca@soe.ucsc.edu wrote:
we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole English Wikipedia, as of its February 6, 2007 snapshot, colored according to text trust.
Sadly it doesn't appear to contain the complete article histories. There are a number of old, manually detected cases of long-undetected vandalism that I had recorded and wanted to use to gauge its performance.
That's true. We had to truncate histories to make everything fit into a server. We are gaining experience in how to deal with Wikipedia information (terabytes of it), and we may be able to give a better demo in a while, with full histories, but.... we need to buy some storage first! :-)
Luca
On Dec 19, 2007 3:05 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
Sadly it doesn't appear to contain the complete article histories.
On Dec 20, 2007 9:10 AM, Luca de Alfaro luca@soe.ucsc.edu wrote:
That's true. We had to truncate histories to make everything fit into a server.
Could you possibly take a random sample of 2% of articles and examine the full histories of those? 40,000 articles is more than enough for a demo, and we can rig the sample to include some articles of interest if needed.
Akash
Oh yes! In fact, if you tell me which article titles you are interested in, I can run those through and load them into a secondary demo we have. I may not get around to posting the results until early January, though, as the break is fast approaching. Luca
I can provide a list of the top 40,000 articles rated by quality according to the Wikipedia editorial team. A random sample is unlikely to be interesting, as more than 70% of articles are stubs.
http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Index
Great, but I also welcome suggestions of articles where you know interesting things have happened (I can then include both).
Luca
PS: When doing a random sample, I can select articles with > 200 revisions, and that gets rid of the stub problem.
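For the curious, a minimal Python sketch of how such a sample could be drawn; the function, inputs, and numbers are illustrative assumptions, not anything from Luca's actual toolchain: take a 2% random sample restricted to articles with more than 200 revisions, and force-include a hand-picked list of interesting titles.

```python
import random

def pick_sample(articles, revision_counts, must_include, fraction=0.02, min_revs=200):
    """articles: list of titles; revision_counts: dict of title -> number of revisions."""
    # Keep only articles with enough history to be interesting (filters out stubs).
    eligible = [t for t in articles if revision_counts.get(t, 0) > min_revs]
    k = max(1, int(len(eligible) * fraction))
    sample = set(random.sample(eligible, min(k, len(eligible))))
    # "Rig" the sample with articles where we already know interesting things happened.
    sample.update(must_include)
    return sorted(sample)

articles = ["Moon", "Some stub", "Physics", "Another stub"]
revs = {"Moon": 5000, "Some stub": 3, "Physics": 1200, "Another stub": 7}
print(pick_sample(articles, revs, must_include=["Moon"]))
```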
On Dec 20, 2007 10:07 AM, Brian Brian.Mingus@colorado.edu wrote:
I can provide a list of the top 40,000 articles rated by quality according to the Wikipedia editorial team.
Well, we don't really want just the top articles; a broad range is important to see how the system behaves at different levels of quality. But we could certainly take the top 10,000 and put in another 10,000 at random. At the end of the day, this is just a demo, and even 100 articles will do -- nobody on this list is going to read through the whole 40K. If anyone
@Gregory: Could you post some of those cases to the list so that they can be imported manually whenever Luca is available?
Well, here are the id<TAB>title pairs for the top 2,000 articles. I'll let you deal with the random sample :)
On Dec 19, 2007 5:11 PM, draicone@gmail.com draicone@gmail.com wrote:
Well, we don't really want just the top articles; a broad range is important to see how the system behaves at different levels of quality.
On Wednesday, 19 December 2007 at 22:36:37, Luca de Alfaro wrote:
we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole English Wikipedia, as of its February 6, 2007 snapshot, colored according to text trust.
I looked at the demo at http://wiki-trust.cse.ucsc.edu:80/index.php/Moon. Most remarkable in this example is a whole section with a private original-research theory on "Binary planet systems". So sadly (or luckily? ;-) the latest version in your snapshot contains a bad edit; compare it also to the relevant edit in Wikipedia: http://en.wikipedia.org/w/index.php?title=Moon&diff=prev&oldid=10709...
So your algorithm highlighted the wrong content. The problematic part is a bad summary of an older origin-of-the-Moon theory, which is described here in an overly simplified way with some ad-hoc-isms and is thus made even more wrong by the author of these lines (OT: I probably know the author of these lines from de.wikipedia: he tried to post this and other private theories in several astronomy articles).
Ok. How did Wikipedia do in that case? It took a little more than an hour to revert this, so Wikipedia was able to resolve this problem with the current tools rather quickly. :-)
This doesn't mean we don't need your stuff. Quite the contrary. It brings me to some very promising and interesting (and maybe non-obvious) use cases:
1) The (German) Wikipedia DVD. The basis of the Wikipedia DVD is a database dump. The first Wikipedia CD and DVD contained an "as is" snapshot transformed to the Digibib reader format of Directmedia Publishing GmbH (http://en.wikipedia.org/wiki/Directmedia_Publishing). However, these snapshots had the above problem with short-lived nonsense content that happened to be in the snapshot. For the DVDs up to now, different trust metrics were used in order to find the "latest acceptable article version" in a given snapshot. One metric was the "latest version of a trusted user". The current DVD from November 2007 uses a "user karma system" in order to find the latest acceptable version (see http://en.wikipedia.org/wiki/Directmedia_Publishing if you can read German; however, the karma system doesn't get described there). So I think that "offline Wikipedias" such as the Wikipedia DVD and read-only Wikipedia mirrors would benefit a lot from your effort, in order to know which most recent version of a given article is the one they should provide to their readers.
2) A combination with reviewed article versions. Several people have pointed out that they fear reviewed article versions will need a lot of checking, depending on whether the latest flagged version or the current version is shown by default. Furthermore, there are different opinions on which of the two modes is best. How about this third, "middle ground" mode: if the karma of a given article version (according to your algorithm) falls below a certain karma threshold, the latest version above this threshold is shown by default to anonymous readers, unless there is a newer version flagged as reviewed. That way anonymous people usually see the most recent article version, and we can always overrule the algorithm, which is a good thing (TM), as you should never blindly trust algorithms (otherwise people will try to trick the algorithm; see Google PageRank).
The drop below a certain karma threshold could be highlighted via a simple, automatically added "veto" flag, which can be undone by people who can set quality flags.
That way we would have three flags (in my favourite system): "veto", "sighted" and "reviewed". The veto flag makes little sense for manual application, because a human can and should (!) simply revert, but it would be very useful for automated processes (automatic reverts are evil).
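To make the proposed "middle ground" mode concrete, here is a minimal Python sketch of the selection rule described above. The data structures and threshold value are illustrative assumptions, not an existing MediaWiki feature: anonymous readers get the current version while its trust is above the threshold; once it drops below (the automatic "veto"), they get a newer human-reviewed version if one exists, and otherwise the newest above-threshold version.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    rev_id: int
    trust: float     # article-level "karma" from the trust-coloring algorithm
    reviewed: bool   # human-set "reviewed"/"sighted" flag

def version_for_anon(history, threshold=0.6):
    """history: revisions ordered oldest -> newest; threshold is illustrative."""
    latest = history[-1]
    if latest.trust >= threshold:
        return latest                       # normal case: show the current version
    # Current version fell below the threshold: this is where an automatic
    # "veto" flag would be set. Default candidate: newest above-threshold revision.
    candidate = next((r for r in reversed(history) if r.trust >= threshold), latest)
    # A newer human-reviewed revision always overrules the algorithm.
    for rev in reversed(history):
        if rev.rev_id <= candidate.rev_id:
            break
        if rev.reviewed:
            return rev
    return candidate

history = [Revision(1, 0.9, True), Revision(2, 0.8, False), Revision(3, 0.2, False)]
print(version_for_anon(history).rev_id)     # -> 2: newest version above the threshold
```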
Cheers, Arnomane
Short correction of myself...
So your algorithm highlighted the wrong content.
I naturally meant: "So your algorithm correctly highlighted dubious content."
The current DVD from November 2007 uses a "user karma system" in order to find the latest acceptable version (see http://en.wikipedia.org/wiki/Directmedia_Publishing if you can read German, however the karma system doesn't get described there).
Wrong copy & paste of the same URL. The right URL is: http://blog.zeno.org/?p=87
Arnomane
Dear Daniel,
I believe you are saying that:
1. The trust coloring rightly colored orange (low-trust) some unreliable content,
2. and the Wikipedia people were quick in reverting it.
Right?
About 1, I am delighted our methods worked in this case. Note that we also highlight text by anonymous contributors as low-trust; that text then gains trust as it is revised. Also, we color the whole article history, so if you want to see how things evolve, you can look at that.
About 2, I am very glad that bad edits are quickly reverted; this is the whole reason Wikipedia has worked up to now. Still, it might be easier for editors to find content to check via the coloring, rather than by staring at diffs. Other uses, as you point out, are:
- Burning the content on DVDs / Flash memory (wikisticks?)
- Making feeds of high-quality revisions for elementary schools, etc.
- Generally giving readers (who unlike editors do not do diffs) that warm fuzzy feeling that "the text has been around awhile" (can this help answer those critics who mumble that the Wikipedia is "unreliable"?)
- Finding when flagged revisions are out of date (there may be a new high-trust version later)
BTW, as the method is language-independent, we look forward to doing the same for wikipedias in other languages.
Luca
On Dec 19, 2007 3:32 PM, Daniel Arnold arnomane@gmx.de wrote:
I looked at the demo at http://wiki-trust.cse.ucsc.edu:80/index.php/Moon.
Hello Luca,
- The trust coloring rightly colored orange (low-trust) some unreliable content,
Yes I was lost in translation. ;-)
- and the Wikipedia people were quick in reverting it.
Yes.
Note that we also highlight as low trust text that is by anonymous contributors. The text will then gain trust as it is revised.
One possible weakness came to my mind after I also read your paper. Your algorithm is perhaps a bit vulnerable to "sock puppets". Imagine person A with one account and person B with two accounts. Both have a medium reputation value for their accounts. User A edits an article with his account 4 times. All 4 subsequent edits are taken together, and the article gets the maximum trust value possible for that user's reputation. User B also makes 4 edits to an article but switches between his two accounts and thus "reviews" his own edits. If I understand your algorithm correctly, the sock-puppeted article is trusted more than the other one.
Quite some time ago I thought about how to avoid incentives for sock puppets in karma systems without even knowing which accounts are sock puppets: http://meta.wikimedia.org/wiki/Meritokratischer_Review (sadly in German ;-). The system described there differs from your approach, but the idea of how to avoid incentives for sock puppets without even knowing who a sock puppet is could perhaps be adapted to your system.
The basic idea for a sock-puppet-proof metric is that a person has only a limited amount of time for editing (I don't consider bots, because they are easily detectable by humans). A single person needs the same time for, e.g., 4 edits (in the following I assume each edit has the same length in bytes) regardless of how many accounts are used, but two different people with 2 edits each only need half of the (imaginary) time (you don't need to measure any time units at all).
So the maximum possible reliability person B can apply to the article with his two accounts (let us say each account has 2 edits, 4 edits in total) has to be the same as the one possible with person A's single account (4 edits). In general, two accounts with X edits each should never be able to add more trust to an article than one person with 2*X edits (note: the edit count is only for illustration; you can take another appropriate contribution unit).
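As a purely illustrative sketch of that constraint (the numbers and update rule below are assumptions made for the example, not WikiTrust's actual reputation mechanism): the trust an article can accumulate depends only on the amount of review work done, not on how many accounts the work is spread across.

```python
def trust_gain(reviewer_reputation, reviews_so_far):
    """Gain from one reviewing edit, with diminishing returns keyed on the
    total amount of review the text has already received -- not on which
    account performed it."""
    return 0.1 * reviewer_reputation / (1 + reviews_so_far)

def article_trust(edits):
    """edits: chronological list of (account, reputation) pairs."""
    trust, reviews_seen = 0.0, 0
    for _account, reputation in edits:
        trust += trust_gain(reputation, reviews_seen)
        reviews_seen += 1                  # count work done, not accounts used
    return trust

# One person editing 4 times from a single account...
single = article_trust([("A", 0.5)] * 4)
# ...versus one person alternating between two sock accounts, 4 edits in total.
socks = article_trust([("B1", 0.5), ("B2", 0.5), ("B1", 0.5), ("B2", 0.5)])
assert abs(single - socks) < 1e-9          # same total work => same trust ceiling
print(single, socks)
```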
About 2, I am very glad that bad edits are quickly reverted; this is the whole reason Wikipedia has worked up to now. Still, it might be easier for editors to find content to check via the coloring, rather than by staring at diffs.
That's certainly true for articles not on your watchlist (or bad edits that were forgotten and are still the latest version).
- Finding when flagged revisions are out of date (there may be a new high-trust version later)
Well, as I said, I'd love to see flagged revisions and your system combined (in the way described in my previous mail). An automated system probably always has some weaknesses that clever people can abuse, but it is very fast, while a hand-crafted system depends on the speed of individual persons but is much harder to fool.
BTW, as the method is language-independent, we look forward to doing the same for wikipedias in other languages.
Good to know. :-)
Arnomane
Sockpuppets? Surely this can't be more than .00000000000001% of the user base?
On Dec 20, 2007 1:07 PM, Brian Brian.Mingus@colorado.edu wrote:
Sockpuppets? Surely this can't be more than .00000000000001% of the user base?
Are you suggesting we have 0.000000006 sockpuppets on the English Wikipedia? ;)
That's a bit optimistic. I'd aim for more like 0.001%, which gives us, e.g., just under 100 to deal with on enwiki. The problem with socks is that their impact on quality is potentially many times that of a typical user, and hence they deserve our attention.
Daniel is making some very good points.
Our current algorithm is vulnerable to two kinds of attacks:
- Sock puppets
- People who split an edit into many smaller ones, done with sock puppets or not, in order to raise the trust of text.
We think we know how to fix, or at least mitigate, both problems. This is why I say that a "real-time" system that colors revisions as they are made is a couple of months (I hope) away. The challenge is not so much reorganizing the code to work on real-time edits rather than Wikipedia dumps. The challenge for us is to analyze, implement, and quantify the performance of versions of the algorithms that are resistant to attack. Those of you who have checked our papers will have seen that we not only propose algorithms, but also do extensive performance studies of how good the algorithms are. We will want to do the same for the algorithms for fighting sock puppets.
About the proposal by Daniel: time alone does not cover our full set of concerns. Every day I could use identity A to erase some good text, and identity B to put it back in. Then the reputation of B would grow a bit every day, even though B did not put in much effort. We are thinking of some other solutions... but please forgive us for keeping this to ourselves a little bit longer... we would like to have a chance to do a full study before shooting our mouths off...
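The delete-and-restore attack described above can be illustrated with a toy simulation; the update rule and numbers are assumptions made for the example, not the actual reputation algorithm. If reputation grows whenever an account's restored text survives the next revision, B gets rewarded daily for doing no real work.

```python
GOOD_TEXT = "a stable, well-sourced paragraph"

def run_attack(days, boost=0.01):
    """Toy model: an account gains `boost` reputation whenever text it added
    (or restored) is still present in the following revision."""
    reputation_b = 0.0
    article = GOOD_TEXT
    for _day in range(days):
        article = ""                 # identity A blanks the good text
        article = GOOD_TEXT          # identity B "restores" it
        reputation_b += boost        # B's restored text survives, so B is rewarded
    return reputation_b

print(run_attack(days=30))           # B's reputation climbs a little every day
```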
Luca
Having looked at several pages there, socks don't seem like much of a problem. It would take either several very old "high-trust" socks or many, many new socks editing to get an article to slight-orange/white levels. It takes a good 5-7 edits by mainly average users to get out of deep orange into near white.
What does bother me just a little is that all kinds of minor grammar/spelling, tagging, and categorizing edits, etc., seem to be made to articles. Actually, I noticed this a while ago when I wrote a JS regexp-based history stats tool. Some random IP/new user adds a chunk of text or starts an article, then some users and bots make 5-10 really minor edits (style/category/tagging stuff), and then occasionally some actual content edits are made. The Article Trust code keeps bumping up the trust for pages. Looking at sample pages there, it doesn't appear to be enough to get vandalism to near-white, which is good. My only worry is that a trollish user who adds POV vandalism and subtle vandalism (switching dates) will have its trust rise too much because a bunch of users make maintenance edits afterwards. Much of our main user base just does maintenance, so they tend to have high "reputation" trust and make many such edits.
I think what would help, besides excluding bots (which is really a no-brainer in my opinion), is to add some heuristics that devalue the trust increase when someone just makes very small edits to the upper or lower extremities of a page. This would catch a lot of tag/category/interwiki-link maintenance and stop it from bumping the trust too much in the page's middle.
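A rough Python sketch of that heuristic; the thresholds and discount factor are arbitrary illustrative choices, not anything implemented in the trust coloring. Small edits that only touch the very top or bottom of the wikitext, where categories, tags, and interwiki links usually live, get their trust boost scaled down.

```python
def maintenance_discount(old_text, new_text, edge_fraction=0.1, small_bytes=200):
    """Return a multiplier in [0, 1] applied to the usual trust increase of an edit."""
    delta = abs(len(new_text) - len(old_text))
    if delta > small_bytes:
        return 1.0                               # substantial edit: full trust effect
    edge = max(1, int(len(old_text) * edge_fraction))
    head_unchanged = old_text[:edge] == new_text[:edge]
    tail_unchanged = old_text[-edge:] == new_text[-edge:]
    # Only the head or only the tail changed: treat it as maintenance.
    if head_unchanged != tail_unchanged:
        return 0.1                               # arbitrary illustrative discount
    return 1.0

old = "Article body goes here.\n[[Category:Moons]]"
new = "Article body goes here.\n[[Category:Moons]]\n[[de:Mond]]"
print(maintenance_discount(old, new))            # small tail-only change -> 0.1
```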
Still, it is pretty good at picking up on the garbage, so it looks promising. I'd be interested in knowing what would happen if the newest revision with all text at trust 7+ were marked as the stable version for each page. Would that be good enough to be pretty much vandal-free?
-Aaron Schulz
A very effective way to stop the trust from being bumped is to raise the trust metric only when actual content is added. When I experimented with similar code I found that Shannon entropy could be used as a measure, but then the entropy of weird-looking interwiki links would bump the metric. To adjust for that I had to introduce some logic, and then I got some asymmetry and was once more susceptible to sock puppetry. This should be avoidable.
One possible solution is to use Shannon entropy only for words that exist in a vocabulary, and to give every other word a flat rating. This seems to work well.
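Here is a rough Python sketch of that idea; the vocabulary, weights, and scoring rule are assumptions for illustration, not John's actual code. Words an edit adds are weighted by their surprisal under a word-frequency vocabulary, and out-of-vocabulary tokens get a flat rating, so odd-looking interwiki links and markup cannot inflate the score.

```python
import math
from collections import Counter

def build_vocabulary(corpus_texts):
    """Relative word frequencies from a reference corpus."""
    counts = Counter(w.lower() for text in corpus_texts for w in text.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def content_score(added_text, vocab, flat_weight=1.0):
    score = 0.0
    for word in added_text.split():
        p = vocab.get(word.lower())
        if p is None:
            score += flat_weight             # unknown token: flat rating
        else:
            score += -math.log2(p)           # surprisal (Shannon information) of a known word
    return score

vocab = build_vocabulary(["the moon orbits the earth", "the earth is a planet"])
print(content_score("the moon is a binary planet", vocab))   # real prose scores high
print(content_score("[[de:Mond]] [[fr:Lune]]", vocab))        # markup stays modest
```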
John E
On Thursday, 20 December 2007 at 04:44:14, Luca de Alfaro wrote:
For those of you who have checked our papers, you would have seen that not only we propose algorithms, but we do extensive performance studies on how good the algorithms are. We will want to do the same for the algorithms for fighting sock puppets.
I liked it very much that you put a lot of thought into the robustness of your algorithm (I think this is one of its key advantages over so many other naive karma systems out there) and that at this stage it is already resistant to many attacks. So I am confident that the sock-puppet and minor-edit attacks (Aaron's maintenance-edit analysis is part of the latter) can be solved by you as well. :-)
About the proposal by Daniel: time alone does not cover our full set of concerns. I can every day use identity A to erase some good text, and identity B to put it back in. Then, the reputation of B would grow a bit every day, even though B did not do much effort.
That's true. The reason is that it takes less effort (= personal work time) to remove X bytes than to add them. Perhaps some weighting factor for different kinds of edits could avoid this.
Arnomane
This kind of problem will arise in any system where an asymmetry is introduced. One party can then fight the other and win out, due to the ordering of the fight. The problem is rather difficult to solve, as the system must be symmetrical not only between two consecutive edits, but also across edits split over several postings, merged, intermixed with other edits, and so on.
John E
On 19/12/2007, Luca de Alfaro luca@soe.ucsc.edu wrote:
we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole English Wikipedia, as of its February 6, 2007 snapshot, colored according to text trust.
Is this something suitable to announce to wikien-l as well?
(Is it something that could survive a Slashdotting?)
- d.
On Dec 20, 2007 11:06 PM, David Gerard dgerard@gmail.com wrote:
Is this something suitable to announce to wikien-l as well?
(Is it something that could survive a Slashdotting?)
Not sure of Luca's timezone, but I would certainly stay away from announcing to wikien-l at the moment -- I doubt the server can handle a slashdotting, and something like this would be dugg and reddited as well. Before we announce it to wikien-l we can probably arrange for a dedicated server and some extra caching measures.
Thanks for your concern!... We were slashdotted in August, and our server went down then (it was a 1-CPU machine). Now the server uses memcached and Squid, and it is an 8-CPU machine. Also, the campus is taking us down via power cuts (they are replacing some feeds somewhere), so in a sense, why not also go down due to Slashdotting, which is more honorable and more fun? :-)
So perhaps I should announce on wikien-l ?
Luca
On Dec 21, 2007 1:47 AM, Luca de Alfaro luca@dealfaro.org wrote:
So perhaps I should announce on wikien-l?
If you've got 8 CPUs and you're running memcached, nothing we procure (besides an entire server farm) could be more fun to observe crashing. Go for it, and keep some traffic logs :)
For added fun, contact some of the top Digg members directly and ask them to go all out in getting it popular on Digg.com. Instant masses of traffic :)
--- Akash
On 20/12/2007, draicone@gmail.com draicone@gmail.com wrote:
Go for it, and keep some traffic logs :)
It's in the Slashdot firehose:
http://slashdot.org/firehose.pl?op=view&id=433426
So shall we vote it up or down? ;-)
- d.
On Dec 21, 2007 2:09 AM, David Gerard dgerard@gmail.com wrote:
It's in the Slashdot firehose <snip>
So shall we vote it up or down? ;-)
Is that a rhetorical question? :)
--- Akash
On Thursday, 20 December 2007 at 17:20:08, draicone@gmail.com wrote:
On Dec 21, 2007 2:09 AM, David Gerard dgerard@gmail.com wrote:
It's in the Slashdot firehose <snip>
So shall we vote it up or down? ;-)
Is that a rhetorical question? :)
I personally don't like press coverage on things that are not ready yet and months away from any practical impact to daily Wikipedia life. We have had *too* many news articles on "soon-to-come" stable versions/Single Login/whatever...
Arnomane
On Dec 21, 2007 2:40 AM, Daniel Arnold arnomane@gmx.de wrote:
I personally don't like press coverage on things that are not ready yet and months away from any practical impact to daily Wikipedia life. We have had *too* many news articles on "soon-to-come" stable versions/Single Login/whatever...
If the press wishes to misrepresent and misreport the activities of the WMF projects, there is little we can do about it. This, however, is simply an attempt to get the word out to the community -- not the press -- that these systems are on their way.
In the past we have not had effective demonstrations of our stable versioning / SSO plans; this demonstration does not fall short in this manner. At any rate, news of the trust system should go some way to restoring general public faith in Wikipedia after the thoroughly exaggerated controversy over secret mailing lists, conspiracies and Durova.
On Thursday, 20 December 2007 17:46:25, draicone@gmail.com wrote:
This, however, is simply an attempt to get the word out to the community -- not the press -- that these systems are on their way.
There are mechanisms inside the community for informing each other (like the aforementioned wikien-l). I don't want to reach the community via external news, but I admit it is very difficult to strike the right balance if you want a project to keep up its development pace.
In the past we have not had effective demonstrations of our stable versioning / SSO plans; this demonstration does not have that shortcoming.
By the way, have a look at http://test.wikipedia.org/wiki/Meow. Stable versions have at least reached the official test wiki (found via http://de.wikipedia.org/wiki/Wikipedia:Projektneuheiten#20._Dezember).
At any rate, news of the trust system should go some way to restoring general public faith in Wikipedia after the thoroughly exaggerated controversy over secret mailing lists, conspiracies and Durova.
These specific topics are mainly a problem of en.wikipedia, not *.wikipedia. de.wikipedia has other troubles, and one of them is that German-language news agencies often say "they promised $technical-novelties a long time ago, but have failed to keep that promise up to now". And this is at least partly our fault, because we quite often told curious reporters what we are dreaming about rather than about novelties that have actually arrived (for example, the Gadget extension from Duesentrieb has had quite some impact on editors, though not on readers, but it is probably too techie to be of interest to news people ;-).
Arnomane
On 20/12/2007, Daniel Arnold arnomane@gmx.de wrote:
These specific topics are mainly a problem of en.wikipedia, not *.wikipedia. de.wikipedia has other troubles, and one of them is that German-language news agencies often say "they promised $technical-novelties a long time ago, but have failed to keep that promise up to now". And this is at least partly our fault, because we quite often told curious reporters what we are dreaming about rather than about novelties that have actually arrived (for example, the Gadget extension from Duesentrieb has had quite some impact on editors, though not on readers, but it is probably too techie to be of interest to news people ;-).
With the delays, a lot of the problem is that the Foundation's two technical employees, Brion and Tim, are waaaay busy with the fundraiser and with moving (St Petersburg to San Francisco). So when people ask, I've been saying "two technical employees, fundraiser duties, sorry for the delay, give us money and it'll be faster ;-p"
- d.
I've got the software running on testwikipedia now. It also got some rapid input from Tim, as well as some fixes.
-Aaron Schulz
I did some checks on a few articles that had been reviewed by highly reputable authors, and it seems the system isn't able to credit very rare edits from such editors as it should. In fact it has marked the _correct_ versions as dubious, while later versions with slight corrections from admins who are completely unfamiliar with the subject are marked as high quality.
Not very trustworthy (oh shit, is that me??): http://wiki-trust.cse.ucsc.edu/index.php?title=Stave_church&oldid=255577...
Lots of copyedits: http://wiki-trust.cse.ucsc.edu/index.php?title=Stave_church&diff=3664979...
A trustworthy version (!): http://wiki-trust.cse.ucsc.edu/index.php/Stave_church
This is a central problem with this kind of trust metric: people are rated according to what they do at some point in time, without taking into account who they relate to and what their previous history is in other contexts. In the history of the Stave church article there is an archaeologist; is anyone able to locate that person? I think it should be possible to identify a person as an expert within a limited field of expertise, but it isn't easy to figure out how this should be done.
There is also the problem of a person's history. If someone's edits increase their own rating, how should that be handled? I think this should have implications for their previous edits.
John E
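To make the failure mode John describes concrete, here is a small self-contained sketch in Python of a purely survival-based reputation update. This is a simplified stand-in, not the actual WikiTrust algorithm: authors gain reputation whenever their text survives later revisions, so a prolific copyeditor can end up looking more "reputable" than a domain expert who edits rarely, and the expert's correct text is then colored as low trust.

    from collections import defaultdict

    # Hypothetical, simplified model: +1 reputation each time an author's
    # revision survives a later editor. Chosen only to illustrate the
    # frequency bias John points out, not to reproduce the demo's metric.
    reputation = defaultdict(float)

    def record_surviving_edit(author):
        reputation[author] += 1.0

    def text_trust(author):
        # Trust of freshly inserted text grows with the author's reputation.
        return reputation[author] / (reputation[author] + 10.0)

    record_surviving_edit('archaeologist')      # one expert edit
    for _ in range(50):
        record_surviving_edit('copyeditor')     # fifty surviving copyedits

    print(text_trust('archaeologist'))  # ~0.09 -- correct content, low trust
    print(text_trust('copyeditor'))     # ~0.83 -- spelling fixes, high trust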
On Dec 21, 2007 11:35 PM, John Erling Blad john.erling.blad@jeb.no wrote:
This is a central problem with this kind of trust metric: people are rated according to what they do at some point in time, without taking into account who they relate to and what their previous history is in other contexts. In the history of the Stave church article there is an archaeologist; is anyone able to locate that person? I think it should be possible to identify a person as an expert within a limited field of expertise, but it isn't easy to figure out how this should be done.
This is what we started the Citizendium project for. Wikipedia maintains a degree of anonymity that forms the basis for one of the many cultures of the WMF projects. If we start verifying qualifications, we throw this all out.
The middle ground is a peer-based "reputation" system, as found on many forums, where for various actions peers can allocate reputation to a user. This type of ad-hoc, informal peer-review is the only way to achieve verification of authority without taking the Citizendium approach.
--- Akash
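As one possible shape of the "middle ground" Akash describes, here is a hedged sketch in Python of a peer-allocated reputation ledger: users grant small amounts of reputation to one another for specific actions, and a per-grantor cap limits how much any single admirer can inflate a score. The cap, the names and the numbers are assumptions for illustration, not a concrete proposal from this thread.

    from collections import defaultdict

    # grants[grantor][recipient] = points that grantor has given recipient.
    grants = defaultdict(lambda: defaultdict(float))
    MAX_PER_GRANTOR = 5.0  # assumed cap so one enthusiastic peer cannot dominate

    def grant_reputation(grantor, recipient, points):
        if grantor == recipient:
            raise ValueError('cannot grant reputation to yourself')
        current = grants[grantor][recipient]
        grants[grantor][recipient] = min(MAX_PER_GRANTOR, current + points)

    def reputation(user):
        # A user's reputation is the sum of what all peers have granted them.
        return sum(per_recipient[user] for per_recipient in grants.values())

    grant_reputation('editor_a', 'archaeologist', 3.0)
    grant_reputation('editor_b', 'archaeologist', 4.0)
    print(reputation('archaeologist'))  # 7.0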
I am familiar with the Citizendium project, and if you think I am trying to argue for something similar, you are way off. The interesting thing is that in the history of that article there are people with absolutely no clue about what they are writing about, and there are people who know what they write about. Inspecting the trust coloring, it is apparent that the people without any clue at all have much higher trust metrics than the rest, bumping the trust upwards. To me this does not seem to be what we want. We want a system that is much better at distinguishing the actual experts from those just fussing around fixing spelling errors. An article with perfect spelling can be completely wrong on the main topic, and if the system does not address that problem, it isn't a fix at all.
John E
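One crude way to act on John's point about separating substantive expertise from copyediting would be to classify each edit by how much genuinely new text it introduces rather than by how often its author edits. The heuristic below is only an illustration using the Python standard library; the ten-word threshold and the word-level diff are assumptions, not anything proposed in this thread.

    import difflib

    def edit_profile(old_text, new_text):
        # Word-level diff: count how many words the edit inserts or replaces.
        old_words = old_text.split()
        new_words = new_text.split()
        matcher = difflib.SequenceMatcher(None, old_words, new_words)
        added = sum(j2 - j1 for op, i1, i2, j1, j2 in matcher.get_opcodes()
                    if op in ('insert', 'replace'))
        # Assumed threshold: edits adding more than ten new words count as content.
        return 'content' if added > 10 else 'copyedit'

    before = 'Stave churches are medieval wooden churches.'
    spelling_fix = 'Stave churches are mediaeval wooden churches.'
    expansion = (before + ' The load-bearing posts, or staves, give the building '
                 'type its name, and most surviving examples stand in Norway.')

    print(edit_profile(before, spelling_fix))  # copyedit
    print(edit_profile(before, expansion))     # content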