Wiki Research Junkies,
I am investigating the comparative quality of articles about Côte d'Ivoire and Uganda versus other countries. I want to answer the question: what makes a high-quality article? Can anyone point me to existing research on heuristics for article quality, that is, determining an article's quality from its wikitext properties, without human rating? I would also consider using data from the Article Feedback Tools, if dumps were available for each article in the English, French, and Swahili Wikipedias. This is all the raw data I can find: http://toolserver.org/~dartar/aft5/dumps/
The heuristic technique that I am currently using is training a naive Bayesian filter based on:
* Per Section.
* Text length in each section
* Infoboxes in each section.
* Filled parameters in each infobox
* Images in each section
* Good Article or Featured Article status?
* Then normalize by page views per population / speakers of the native language
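
To make the per-section features listed above concrete, here is a rough sketch of how they might be pulled out of raw wikitext. It is only an illustration under my own assumptions, not the code I am actually running: the function names, the regular expressions, and the choice to split on level-2 headings only are all hypothetical, and real wikitext has many edge cases (nested templates in values, localized "File:" prefixes on French and Swahili wikis) that this ignores. The normalization step and the classifier itself are left out.

```python
import re

# Assumption: split on level-2 headings only ("== Title =="); deeper
# headings ("=== Sub ===") stay inside their parent section.
HEADING = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)
# Assumption: English namespace prefixes only.
IMAGE = re.compile(r"\[\[(?:File|Image):", re.IGNORECASE)
INFOBOX = re.compile(r"\{\{\s*Infobox", re.IGNORECASE)
# A "| key = value" line whose value is non-empty on that same line.
FILLED_PARAM = re.compile(r"^[ \t]*\|[ \t]*[^=|\n]+=[ \t]*\S", re.MULTILINE)


def infobox_bodies(text):
    """Yield the full text of each {{Infobox ...}} template in text,
    tracking nested {{ }} pairs to find the closing braces."""
    for match in INFOBOX.finditer(text):
        depth, i = 0, match.start()
        while i < len(text):
            if text.startswith("{{", i):
                depth += 1
                i += 2
            elif text.startswith("}}", i):
                depth -= 1
                i += 2
                if depth == 0:
                    break
            else:
                i += 1
        yield text[match.start():i]


def section_features(wikitext):
    """Return one feature dict per section (the lead counts as a section)."""
    # re.split with a capturing group returns [lead, title1, body1, ...].
    parts = HEADING.split(wikitext)
    titles = ["(lead)"] + [t.strip() for t in parts[1::2]]
    bodies = parts[0::2]
    features = []
    for title, body in zip(titles, bodies):
        boxes = list(infobox_bodies(body))
        features.append({
            "section": title,
            "text_length": len(body.strip()),
            "infoboxes": len(boxes),
            "filled_params": sum(len(FILLED_PARAM.findall(b)) for b in boxes),
            "images": len(IMAGE.findall(body)),
        })
    return features


# Made-up sample page, for illustration only.
SAMPLE = """{{Infobox country
| name = Uganda
| capital = Kampala
| motto =
}}
Uganda is a country in East Africa.
== History ==
[[File:Map.png|thumb]]
Some history text.
=== Early period ===
More text.
== Economy ==
Text about the economy.
"""

if __name__ == "__main__":
    for row in section_features(SAMPLE):
        print(row)
```

Each dict could then be turned into a feature vector for the Bayesian filter, with Good/Featured Article status as the label.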
Can you also think of any other dimensions or heuristics to rate programmatically?
Best,
Maximilian Klein Wikipedian in Residence, OCLC +17074787023