Here is my feedback based on looking at a few pages on topics
that I know very well.
Agile Software Development
http://wiki-trust.cse.ucsc.edu/index.php/Agile_software_development
Not bad. I counted 13 highlighted items, 5 of which I would say
are questionable.
Usability
http://wiki-trust.cse.ucsc.edu/index.php/Usability
Not as good. 14 highlighted items, 3 of which I would say are
questionable.
Open Source Software
http://wiki-trust.cse.ucsc.edu/index.php/Open_source_software
Not so good either. 23 highlighted items, 3 of which I would say
are questionable.
This is a very small sample, but it’s all I have time to
do. It will be interesting to see how other people rate the precision of the
highlightings on a wider set of topics. Based on these three examples, it’s
not entirely clear to me that this system would help me identify questionable
items in topics that I am not so familiar with.
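For what it's worth, here is a rough sketch of the arithmetic behind those three counts, assuming "questionable" means that I agree the highlighted text really is questionable, so that precision is simply questionable items divided by highlighted items. The variable and function names are mine, for illustration only.

```python
# Counts from the three pages above: (highlighted items, items I judged questionable).
samples = {
    "Agile software development": (13, 5),
    "Usability": (14, 3),
    "Open source software": (23, 3),
}


def precision(highlighted, questionable):
    """Fraction of highlighted items that the judge agreed were questionable."""
    return questionable / highlighted


# Precision for each page, and pooled over all highlighted items.
per_page = {page: precision(h, q) for page, (h, q) in samples.items()}
total_highlighted = sum(h for h, _ in samples.values())
total_questionable = sum(q for _, q in samples.values())
pooled = precision(total_highlighted, total_questionable)

for page, p in per_page.items():
    print(f"{page}: {p:.2f}")
print(f"Pooled: {pooled:.2f}")
```

With only 50 highlighted items and a single judge, these figures are indicative at best.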
Are you planning to do a larger scale evaluation with human
judges? One issue in that kind of study is avoiding favourable or unfavourable
bias on the part of the judges. Also, you have to make sure that your algorithm
is doing better than random guessing (in other words, there may be so many questionable
phrases in a wiki page that random guessing would be bound to guess right once
out of every, say, 5 times). One way to avoid these issues would be to produce
pages where half of the highlightings are produced by your system, and the
other half highlight a randomly selected contiguous contribution by a
single author.
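To make the suggestion concrete, here is a minimal sketch of such a blinded comparison. It assumes that the system's highlighted spans and a pool of randomly selected single-author spans are already available as lists; how they are extracted from a page is not covered. The function names are hypothetical and not part of your system.

```python
import random
from collections import defaultdict


def build_blinded_items(system_spans, random_spans, seed=0):
    """Mix equal numbers of system and random highlights in shuffled order.

    Judges should be shown only the span text; the source label is kept
    aside for scoring so that it cannot bias their ratings.
    """
    rng = random.Random(seed)
    n = min(len(system_spans), len(random_spans))
    items = [(span, "system") for span in rng.sample(system_spans, n)]
    items += [(span, "random") for span in rng.sample(random_spans, n)]
    rng.shuffle(items)
    return items


def precision_by_source(items, judgments):
    """Fraction of spans judged questionable, separately for each source.

    judgments[i] is True if the judge found items[i] questionable.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for (_span, source), questionable in zip(items, judgments):
        total[source] += 1
        flagged[source] += bool(questionable)
    return {source: flagged[source] / total[source] for source in total}
```

If the "system" precision is not clearly higher than the "random" precision, the highlighting is not beating random guessing. A real study would also need several judges and a significance test, which this sketch leaves out.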
I think this is really interesting work worth doing, btw. I just
don’t know how useful it is in its current state.
Cheers,
Alain Désilets