Wiki Research Junkies,
I am investigating the quality of articles about Cote d'Ivoire and Uganda compared with articles about other countries. I want to answer the question: what makes a high-quality article? Can anyone point me to existing research on heuristics for article quality, that is, determining an article's quality from properties of its wikitext, without human rating? I would also consider using data from the Article Feedback Tools, if dumps were available for each article in the English, French, and Swahili Wikipedias. This is all the raw data I can find: http://toolserver.org/~dartar/aft5/dumps/
The heuristic technique that I am currently using is to train a naive Bayesian filter based on:
Per section:
Text length in each section
Infoboxes in each section
Filled parameters in each infobox
Images in each section
Good Article, Featured Article?
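To make the per-section counts above concrete, here is a rough Python sketch of how they might be pulled out of raw wikitext. The function names and regular expressions are my own illustrative assumptions, not an existing tool, and the regexes are deliberately crude: text length is measured on raw markup, and "filled parameters" counts any template parameter line with a non-empty value, not only those inside infoboxes.

```python
import re

# Hypothetical patterns, written for illustration only.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)
IMAGE_RE = re.compile(r"\[\[(?:File|Image):", re.IGNORECASE)
INFOBOX_RE = re.compile(r"\{\{\s*Infobox", re.IGNORECASE)
# A line such as "| capital = Kampala"; "| anthem =" (empty value) is skipped.
FILLED_PARAM_RE = re.compile(r"^[ \t]*\|[ \t]*[^=|\n]+=[ \t]*\S", re.MULTILINE)


def split_sections(wikitext):
    """Return (title, body) pairs; text before the first heading is 'Lead'."""
    sections = []
    title, start = "Lead", 0
    for match in HEADING_RE.finditer(wikitext):
        sections.append((title, wikitext[start:match.start()]))
        title, start = match.group(2), match.end()
    sections.append((title, wikitext[start:]))
    return sections


def section_features(body):
    """Count the crude per-section features for one section body."""
    return {
        "text_length": len(body.strip()),  # raw wikitext, markup included
        "infoboxes": len(INFOBOX_RE.findall(body)),
        "filled_params": len(FILLED_PARAM_RE.findall(body)),
        "images": len(IMAGE_RE.findall(body)),
    }


def article_features(wikitext):
    """Return a list of (section title, feature dict) in document order."""
    return [(title, section_features(body))
            for title, body in split_sections(wikitext)]
```

Each feature dict could then be flattened into a vector for the classifier, with the Good Article / Featured Article status serving as the label. A real parser for wikitext would handle nested templates far better than these regexes do.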
Then normalize on page views per capita (population, or number of speakers of the native language).
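One possible reading of that normalization step, sketched in Python. The names views_per_speaker and normalized_score are hypothetical, and dividing a raw quality score by the per-speaker view rate is only an assumption about how the two numbers might be combined; other combinations are equally plausible.

```python
def views_per_speaker(page_views, speakers):
    """Page views divided by population (or number of native speakers)."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    return page_views / speakers


def normalized_score(raw_score, page_views, speakers):
    """Scale a raw quality score by the per-speaker view rate.

    Assumption: the score is divided by the view rate. Returns None when
    the article has no page views, since the ratio is then undefined.
    """
    rate = views_per_speaker(page_views, speakers)
    if rate == 0:
        return None
    return raw_score / rate
```

The point of the denominator is to keep articles in widely spoken languages from looking better or worse simply because more people are available to read them.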
Can you also think of any other dimensions or heuristics that could be rated programmatically?
Best,
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l