Re: [Wiki-research-l] Existitng Research on Article Quality Heuristics?

15 Dec 2013

Max,

With regards to quality assessment features, I recommend reading through
our paper from WikiSym this year:
http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf

The related work section covers much of the previous research on
predicting article quality, so there should be plenty of useful reading.
As James points out, content and number of footnote references are a good
start.
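
To make those two starting features concrete, here is a minimal sketch of
how they might be pulled out of raw wikitext. This is not the feature
extraction used in our paper; the function name, the regular expression,
and the sample text are all made up for illustration, and counting <ref>
tags is only a rough proxy for footnote references.

```python
import re

# Matches opening <ref> tags and self-closing reuse tags such as
# <ref name="x" />. It does not match </ref> or <references />.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)


def basic_quality_features(wikitext):
    """Return two simple features: content length and footnote count."""
    return {
        "length_chars": len(wikitext),
        "num_refs": len(REF_TAG.findall(wikitext)),
    }


# Hypothetical sample text, for illustration only.
sample = (
    "Uganda is a country in East Africa.<ref>Source A</ref> "
    'Its capital is Kampala.<ref name="b">Source B</ref><ref name="b" />'
)
features = basic_quality_features(sample)
print(features)
```

Note that raw character length includes markup, so in practice you would
probably want to strip templates and tags before measuring content length.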

There are a lot of dependencies when it comes to predicting article
quality.  If you're trying to predict high quality vs. everything else, the
task isn't overly difficult.  Otherwise it could be more challenging.  For
instance, there is quite a bit of difference between the FAs and GAs on
English Wikipedia, and in your case you'll probably find that the A-class
articles mess things up, because their length tends to be somewhere between
the other two while they're still of high quality.  I'm currently of the
opinion that an A-class article is simply an FAC that hasn't been submitted
for FA review yet.

You might of course run into problems with different citation traditions if
you're working across language editions.  English uses footnotes heavily;
others might instead use bibliography sections and not really cite specific
claims in the article text.  (We mention this issue in our paper; we ran
into it when we tried to get our model to work on Norwegian (bokmål) and
Swedish Wikipedia.)

My $.02.  If you'd like to discuss this more, feel free to get in touch.

Cheers,
Morten

On 15 December 2013 07:15, Klein,Max <kleinm(a)oclc.org> wrote:

...
   Wiki Research Junkies,

 I am investigating the comparative quality of articles about Cote
 d'Ivoire and Uganda versus other countries. I wanted to answer the question
 of what makes a high-quality article. Can anyone point me to any existing
 research on heuristics of article quality? That is, determining an
 article's quality from its wikitext properties, without human rating? I
 would also consider using data from the Article Feedback Tools, if there
 were dumps available for each article in the English, French, and Swahili
 Wikipedias. This is all the raw data I can seem to find:
 http://toolserver.org/~dartar/aft5/dumps/

 The heuristic technique that I am currently using is training a naive
 Bayesian filter based on:

    - Per section:
       - Text length in each section
       - Infoboxes in each section
          - Filled parameters in each infobox
       - Images in each section
    - Good Article, Featured Article?
    - Then normalize on page views per population / speakers of native
      language
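
The per-section features in the list above could be computed along the
following lines. This is only a rough sketch under stated assumptions, not
the code actually used: the function name, the regular expressions, and the
sample text are hypothetical, and the "filled parameters" count is a crude
approximation that counts any non-empty `| name = value` line in a section
rather than parsing infobox templates properly.

```python
import re

# == Heading == lines at any depth (==, ===, ...).
HEADING = re.compile(r"^==+[ \t]*(.*?)[ \t]*==+[ \t]*$", re.MULTILINE)
IMAGE = re.compile(r"\[\[(?:File|Image):", re.IGNORECASE)
INFOBOX = re.compile(r"\{\{\s*Infobox", re.IGNORECASE)
# Lines of the form "| name = value" where the value is non-empty.
FILLED_PARAM = re.compile(r"^[ \t]*\|[ \t]*[^=|\n]+=[ \t]*\S", re.MULTILINE)


def section_features(wikitext):
    """Split wikitext on headings and compute simple counts per section."""
    # With one capture group, split() returns
    # [lead, title1, body1, title2, body2, ...].
    parts = HEADING.split(wikitext)
    titles = ["(lead)"] + parts[1::2]
    bodies = [parts[0]] + parts[2::2]
    return [
        {
            "section": title,
            "text_length": len(body.strip()),
            "infoboxes": len(INFOBOX.findall(body)),
            "filled_params": len(FILLED_PARAM.findall(body)),
            "images": len(IMAGE.findall(body)),
        }
        for title, body in zip(titles, bodies)
    ]


# Hypothetical sample article, for illustration only.
sample = """{{Infobox country
| name = Uganda
| capital = Kampala
| motto =
}}
Uganda is a country in East Africa.

== History ==
[[File:Example.jpg|thumb|A caption]]
Some history text.
"""
rows = section_features(sample)
for row in rows:
    print(row)
```

Each row could then serve as one feature vector (or be aggregated per
article) for whatever classifier is being trained.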

 Can you also think of any other dimensions or heuristics to rate
 programmatically?

  Best,
   Maximilian Klein
 Wikipedian in Residence, OCLC
 +17074787023

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
