Re: [Wiki-research-l] Wikipedia colored according to trust

20 Dec 2007

It seems that what Ward and others are getting at is that it would be useful
to have precision and recall measures for Luca's trust metric. Of course,
the metric cannot possibly know when a brand-new user contributes unusually
high-quality text to the encyclopedia. Nonetheless, a tool such as Amazon's
Mechanical Turk could let us measure fairly easily, by random sampling, how
often false positives and false negatives occur.
Although your hammer was not designed for their nail, I imagine it would do
quite well.
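To make the precision/recall idea concrete, here is a minimal sketch of how a randomly sampled, human-judged set of text spans could be scored. The `Judgment` record, the `sample_spans` helper, and the notion of a span being "flagged" (colored as low trust) are hypothetical names for illustration, not part of any existing tool or of the trust coloring itself.

```python
import random
from dataclasses import dataclass


@dataclass
class Judgment:
    flagged: bool       # hypothetical: did the trust coloring mark this span as low trust?
    questionable: bool  # hypothetical: did a human judge consider the span questionable?


def sample_spans(spans, k, seed=0):
    """Draw k spans uniformly at random, without replacement, for human judging."""
    return random.Random(seed).sample(spans, k)


def precision_recall(sample):
    """Precision and recall of 'flagged' as a predictor of 'questionable'."""
    tp = sum(1 for j in sample if j.flagged and j.questionable)       # true positives
    fp = sum(1 for j in sample if j.flagged and not j.questionable)   # false positives
    fn = sum(1 for j in sample if not j.flagged and j.questionable)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    judged = [Judgment(True, True), Judgment(True, False),
              Judgment(False, True), Judgment(False, False)]
    print(precision_recall(judged))  # (0.5, 0.5)
```

False positives and false negatives fall out of the same counts, so a single judged sample yields both measures.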
On Dec 20, 2007 9:04 AM, Luca de Alfaro <luca@soe.ucsc.edu> wrote:
...
You are evaluating the coloring against a performance criterion that is
not the one we designed it for.
Our coloring colors orange the new information that has been added by
low-reputation authors. New information by high-reputation authors is light
orange. As the information is revised, it gains trust.
Thus, our coloring answers the question, intuitively: has this information
been revised already?  Have reputable authors looked at it?
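The intuition that text starts at a trust level tied to its author's reputation and gains trust as reputable authors revise around it can be illustrated with a toy model. This is a hypothetical sketch only: the constants `alpha` and `beta`, the [0, 1] reputation scale, and the color thresholds are assumptions made for illustration and are not the actual trust algorithm described in the paper.

```python
def initial_trust(author_reputation: float, alpha: float = 0.5) -> float:
    """Hypothetical: new text starts at a fraction alpha of its author's reputation (0..1)."""
    return alpha * author_reputation


def after_revision(trust: float, reviser_reputation: float, beta: float = 0.3) -> float:
    """Hypothetical: text left in place by a reviser moves a fraction beta of the way
    toward the reviser's reputation; a low-reputation reviser never lowers trust here."""
    if reviser_reputation <= trust:
        return trust
    return trust + beta * (reviser_reputation - trust)


def color(trust: float) -> str:
    """Hypothetical thresholds mapping trust to a background color."""
    if trust < 0.33:
        return "orange"
    if trust < 0.66:
        return "light orange"
    return "white"


if __name__ == "__main__":
    print(color(initial_trust(0.1)))   # text by a low-reputation author: orange
    t = initial_trust(0.9)             # text by a high-reputation author
    print(color(t))                    # light orange
    for _ in range(2):                 # two revisions by a reputable author
        t = after_revision(t, 0.9)
    print(round(t, 4), color(t))       # 0.6795 white
```

In this toy version, the color answers "have reputable authors looked at it?" rather than "is it correct?", which is the distinction drawn in the message.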
You are asking the question: how much information colored orange is
questionable?
This is a different question, and one on which we will never be able to do
well, for the simple reason that a lot of the correct factual information on
Wikipedia is well known to come from occasional contributors, including
anonymous authors, and those occasional and anonymous contributors will have
low reputation in most conceivable reputation systems.
We do not plan to do any large-scale human study. For one, we don't have
the resources. For another, in the very limited tests we did, the notion of
"questionable" was so subjective that our data contained a HUGE amount of
noise. We rated edits as -1 (bad), 0 (neutral), or +1 (good), and the
probability that two of us agreed was somewhere below 60%. We decided this
was not a good way to go.
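Agreement between two raters on a -1/0/+1 scale is commonly summarized both as raw percent agreement and as Cohen's kappa, which discounts the agreement expected by chance. The sketch below uses made-up ratings purely to show the computation; it does not reproduce the data from the tests mentioned above.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which the two raters gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both raters independently pick the same label.
    p_e = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    rater1 = [1, 1, 0, -1, 0]   # illustrative ratings only
    rater2 = [1, 0, 0, -1, 1]
    print(percent_agreement(rater1, rater2))  # 0.6
    print(cohens_kappa(rater1, rater2))       # 0.375
```

With three labels, 60% raw agreement can correspond to a fairly low kappa, which is one way to quantify how noisy such judgments are.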
The results of our data-driven evaluation on a random sample of 1000
articles with at least 200 revisions each showed that (quoting from our
paper):

Recall of deletions. We consider the recall of low trust as a predictor
   for deletions. We show that text in the lowest 50% of trust values
   constitutes only 3.4% of the text of articles, yet corresponds to 66% of
   the text that is deleted from one revision to the next.

Precision of deletions. We consider the precision of low trust as a
   predictor for deletions. We show that text in the bottom half of trust
   values has a 33% probability of being deleted in the very next revision,
   in contrast with the 1.9% probability for general text. The deletion
   probability rises to 62% for text in the bottom 20% of trust values.

Trust of average vs. deleted text. We consider the trust distribution of
   all text, compared to the trust distribution of the text that is
   deleted. We show that 90% of the text overall had trust of at least 76%,
   while the average trust for deleted text was 33%.

Trust as a predictor of lifespan. We select words uniformly at random, and
   we consider the statistical correlation between the trust of each word at
   the moment of sampling and the future lifespan of the word. We show that
   words with the highest trust have an expected future lifespan that is 4.5
   times longer than that of words with no trust. We remark that this is a
   proper test, since the trust at the time of sampling depends only on the
   history of the word prior to sampling.
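The recall and precision measures quoted above can be computed from per-word records of (trust, deleted in the next revision). The sketch below assumes trust normalized to [0, 1] and reads "bottom half of trust values" as trust below 0.5; both the data layout and that reading are assumptions, and the toy data is invented, not taken from the paper's sample.

```python
def deletion_stats(words, threshold=0.5):
    """words: list of (trust, deleted_next_revision) pairs, trust assumed in [0, 1].
    Treats trust < threshold as 'low trust' (assumed reading of 'bottom half')."""
    n = len(words)
    low = [deleted for trust, deleted in words if trust < threshold]
    deleted_total = sum(1 for _, deleted in words if deleted)
    deleted_low = sum(1 for deleted in low if deleted)
    return {
        "low_share_of_text": len(low) / n,            # cf. the 3.4% figure
        "recall": deleted_low / deleted_total if deleted_total else 0.0,  # cf. 66%
        "precision": deleted_low / len(low) if low else 0.0,              # cf. 33%
        "base_deletion_rate": deleted_total / n,      # cf. 1.9% for general text
    }


if __name__ == "__main__":
    toy = [(0.1, True), (0.2, False), (0.3, True), (0.4, False),
           (0.6, False), (0.7, False), (0.8, False), (0.9, False), (0.95, True)]
    for name, value in deletion_stats(toy).items():
        print(f"{name}: {value:.3f}")
```

Comparing `precision` with `base_deletion_rate` is what makes low trust informative: it says how much more likely low-trust text is to be deleted than text in general.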
Luca
On Dec 20, 2007 6:12 AM, Desilets, Alain <Alain.Desilets@nrc-cnrc.gc.ca> wrote:
...
Here is my feedback based on looking at a few pages on topics that I
know very well.
Agile Software Development
·  http://wiki-trust.cse.ucsc.edu/index.php/Agile_software_development
·  Not bad. I counted 13 highlighted items, 5 of which I would say are
   questionable.
Usability
·  http://wiki-trust.cse.ucsc.edu/index.php/Usability
·  Not as good. 14 highlighted items, 3 of which I would say are
   questionable.
Open Source Software
·  http://wiki-trust.cse.ucsc.edu/index.php/Open_source_software
·  Not so good either. 23 highlighted items, 3 of which I would say are
   questionable.
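Treating "questionable" as what a highlight is supposed to catch, these counts translate directly into per-page precision. The page keys below are taken from the URLs above; the function name is just for illustration.

```python
# (highlighted items, items judged questionable), as counted in the message
COUNTS = {
    "Agile_software_development": (13, 5),
    "Usability": (14, 3),
    "Open_source_software": (23, 3),
}


def precision_by_page(counts):
    """Precision = questionable highlights / all highlights, per page and pooled."""
    result = {page: q / h for page, (h, q) in counts.items()}
    total_h = sum(h for h, _ in counts.values())
    total_q = sum(q for _, q in counts.values())
    result["pooled"] = total_q / total_h
    return result


if __name__ == "__main__":
    for page, p in precision_by_page(COUNTS).items():
        print(f"{page}: {p:.0%}")
    # Agile_software_development: 38%
    # Usability: 21%
    # Open_source_software: 13%
    # pooled: 22%
```

On this small sample the pooled precision is 11/50, which is why a random-guessing baseline matters for interpreting it.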
This is a very small sample, but it's all I have time to do. It will be
interesting to see how other people rate the precision of the highlightings
on a wider set of topics. Based on these three examples, it's not entirely
clear to me that this system would help me identify questionable items in
topics that I am not so familiar with.
Are you planning to do a larger-scale evaluation with human judges? An
issue in that kind of study is avoiding favourable or unfavourable bias on
the part of the judges. Also, you have to make sure that your algorithm is
doing better than random guessing (in other words, there may be so many
questionable phrases in a wiki page that random guessing would be bound to
guess right once out of every, say, 5 times). One way to avoid these issues
would be to produce pages where half of the highlightings are produced by
your system, and the other half highlight a randomly selected contiguous
contribution by a single author.
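The proposed blind design can be sketched as follows: mix the system's highlights with an equal number of randomly chosen single-author contributions, shuffle them so judges cannot tell the two apart, and score precision separately per source. The `Contribution` type, both function names, and the use of a fixed seed are hypothetical choices for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    author: str  # hypothetical: the single author of this contiguous span
    text: str


def build_blind_page(system_flags, all_contributions, seed=0):
    """Pair the system's highlights with as many random control contributions,
    then shuffle. Judges see only the Contribution; the source label stays hidden."""
    rng = random.Random(seed)
    candidates = [c for c in all_contributions if c not in system_flags]
    controls = rng.sample(candidates, k=min(len(system_flags), len(candidates)))
    items = [(c, "system") for c in system_flags] + [(c, "random") for c in controls]
    rng.shuffle(items)
    return items


def precision_by_source(items, judged_questionable):
    """Fraction of highlights from each source that judges marked questionable."""
    stats = {"system": [0, 0], "random": [0, 0]}  # [hits, total] per source
    for c, source in items:
        stats[source][1] += 1
        stats[source][0] += c in judged_questionable
    return {s: (hits / n if n else 0.0) for s, (hits, n) in stats.items()}
```

If the system's precision is not clearly above that of the random controls, the highlighting is doing no better than chance on that page; shuffling also keeps judges from favouring or disfavouring the system's own highlights.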
I think this is really interesting work worth doing, btw. I just don't
know how useful it is in its current state.
Cheers,
Alain Désilets

Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l
