Wiki Research Junkies,
I am investigating the quality of articles about Cote d'Ivoire and Uganda compared with articles about other countries. I want to answer the question: what makes a high-quality article? Can anyone point me to any existing research on heuristics for article quality?
That is, determining an article's quality from its wikitext properties, without human rating? I would also consider using data from the Article Feedback Tools, if there were dumps available for each article in the English, French, and Swahili Wikipedias. This is all the raw data I can seem to find: http://toolserver.org/~dartar/aft5/dumps/
The heuristic technique that I am currently using is training a naive Bayesian filter based on:
- Per section.
- Good Article, Featured Article?
- Then normalize on page views per population / speakers of the native language.
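
To make the setup above concrete, here is a minimal sketch of a Gaussian naive Bayes classifier in plain Python. It is only an illustration under stated assumptions, not the exact filter I am running: the feature names (section count, references per section), the function names, and every number in the toy data are hypothetical. The labels stand in for Good/Featured Article status, and the third feature is page views divided by the number of speakers.

```python
import math
from collections import defaultdict

def views_per_speaker(page_views, speakers):
    """Normalize raw page views by the number of speakers of the wiki's language."""
    return page_views / speakers

def train(rows, labels):
    """Fit a class prior and a per-feature mean/variance for each label."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (prior, stats)
    return model

def predict(model, row):
    """Return the label with the highest log prior plus Gaussian log-likelihood."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: [section count, references per section, views per speaker].
rows = [
    [12, 8.0, views_per_speaker(90_000, 1_000_000)],
    [15, 6.5, views_per_speaker(120_000, 1_000_000)],
    [3, 0.5, views_per_speaker(10_000, 1_000_000)],
    [4, 1.0, views_per_speaker(20_000, 1_000_000)],
]
# "rated" stands in for Good/Featured Article status (an assumption for this sketch).
labels = ["rated", "rated", "unrated", "unrated"]

model = train(rows, labels)
print(predict(model, [14, 7.0, 0.1]))
```

With real data, the rows would come from parsed wikitext and page view counts, and the Gaussian assumption for each feature would need checking.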
Can you also think of any other dimensions or heuristics that could be used to rate articles programmatically?
Best,
Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023