Hello everybody,
I've been doing quite a bit of work on article quality in Wikipedia,
and many of the relevant heuristics have already been mentioned here.
In my opinion, a universal set of quality indicators that works for
all of Wikipedia does not exist. This is mainly because the
perception of quality differs so much across WikiProjects and subject
areas within a single Wikipedia, and even more so across the
different Wikipedia language versions.
On a theoretical level, some universals can be identified. But as
soon as concrete heuristics are derived, they will always be biased
towards the articles that were used to derive them. That aside, an
abstract quality score that tells you how good an article is
according to your heuristics is of little use in most cases.
I much prefer the approach of identifying concrete quality problems,
which also gives you an idea of an article's overall quality.
I have done some work on this [1][2], and there was a recent
dissertation on the same topic [3].
I'm currently writing my dissertation on language technology methods
for assisting quality management in collaborative environments like
Wikipedia. There, I start from a theoretical model, but as soon as
the concrete heuristics come into play, the model has to be grounded
in the concrete quality standards that have been established in a
particular sub-community of Wikipedia. I'm still wrapping up my work,
but if anybody wants to talk, I'll be happy to.
Regards,
Oliver
[1] The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia.
Oliver Ferschke, Iryna Gurevych, and Marc Rittberger.
In: Proceedings of the 51st Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), pp. 721-730,
August 2013, Sofia, Bulgaria.
[2] FlawFinder: A Modular System for Predicting Quality Flaws in
Wikipedia - Notebook for PAN at CLEF 2012.
Oliver Ferschke, Iryna Gurevych, and Marc Rittberger.
In: CLEF 2012 Labs and Workshop, Notebook Papers, September 2012,
Rome, Italy.
[3] Analyzing and Predicting Quality Flaws in User-generated Content:
The Case of Wikipedia.
Maik Anderka.
Dissertation, Bauhaus-Universität Weimar, June 2013.
--
-------------------------------------------------------------------
Oliver Ferschke, M.A.
Doctoral Researcher
Ubiquitous Knowledge Processing Lab (UKP-TU DA)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room
S2/02/B111
ferschke@cs.tu-darmstadt.de
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC)
www.werc.tu-darmstadt.de
-------------------------------------------------------------------
Re: Laura's comment.
I don't dispute that there are plenty of high-quality articles that
have had only one or two contributors. However, my assumption and
experience is that, in general, the more editors the better the
quality, and I'd love to see that assumption tested by research.
There may be some maximum above which quality does not rise, and
there are clearly a number of gifted members of the community whose
work is as good as our best crowdsourced work, especially when the
crowdsourcing element is limited to addressing the minor
imperfections that come from their own blind spots. It would be well
worthwhile to learn whether Women's football is an exception to this,
or indeed whether my own confidence in crowdsourcing is mistaken.
I should also add that while I wouldn't filter out minor edits, you
might as well filter out reverted edits and their reversions; a rough
sketch of how that might be done follows below. Some of our articles
are notorious vandal targets, and their quality is usually unaffected
by a hundred vandalisms and reversions of vandalism per annum;
Beaver, before it was semi-protected in autumn 2011, is a case in
point. This also feeds into Kerry's point that many assessments are
outdated. An article that has been a vandalism target might have been
edited a hundred times since it was assessed, and yet it is likely to
have changed less than one with only half a dozen edits, all of which
added content.
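
For anyone who wants to try that filter, here is a minimal sketch of
one way it might work, assuming revisions arrive in chronological
order and each carries a content hash (the MediaWiki revisions API
can return one when 'sha1' is requested in rvprop). A revision whose
hash matches an earlier kept revision is treated as an identity
revert, and both it and the revisions it undoes are dropped. The
field names here are illustrative assumptions, not a fixed format.

def filter_reverts(revisions):
    """Return revisions with identity-revert cycles removed.

    `revisions` is a chronologically ordered list of dicts, each with
    at least a 'sha1' key holding the revision's content hash. A
    revision whose hash matches an earlier kept revision is taken to
    be a revert; it and the revisions it undoes are discarded.
    """
    kept = []   # revisions retained so far, in chronological order
    seen = {}   # content hash -> index of that revision in `kept`
    for rev in revisions:
        h = rev["sha1"]
        if h in seen:
            # Revert detected: cut back to the earlier identical state,
            # dropping the reverted edits, and skip the revert itself.
            kept = kept[: seen[h] + 1]
            seen = {r["sha1"]: i for i, r in enumerate(kept)}
        else:
            seen[h] = len(kept)
            kept.append(rev)
    return kept

Note that this only catches exact reverts, where the page content is
restored byte for byte; partial reverts, or reverts bundled with
other changes, would need a more involved comparison.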
Jonathan