Having followed this thread, I am somewhat confused about what is meant by the term "article quality", even in a single language, let alone across multiple languages.
Sticking just to a single language for the moment ...
Do we mean that the facts presented are correct? That the kings and queens were born and died on the dates stated?
Do we mean that spelling and grammar are correct? Do we mean some kind of logical structure? Do we mean some kind of narrative flow that "tells the story" of the topic in a natural and engaging way?
Do we mean the use of citations? Do we mean whether the citation used actually contains information that supports what is said by the text in the article with which it is associated?
Do we mean some kind of "completeness" of an article? That is, it has "all" the information. If so, what do we do if the topic is split across a number of articles ({{main|...}})? Do we assess the group of articles? And what do we mean by "all" anyway?
Do we mean it meets all the WP policies? Notability? Appropriate use of external links? That the Manual of Style has been carefully followed?
Or do we mean whether it has been assessed as a stub/start/.../good article by some review process?
Whenever I find myself in a discussion about "quality" (on any subject, not just Wikipedia), it pretty much always boils down to "fitness for purpose as perceived by the user". This is why surveying users is often used to measure quality. "How well did we serve you today?" If anyone has been through Singapore Airport recently, you will have encountered the touch screens asking you to rate, on a 1-5 scale, just about everything you could imagine: every toilet block, every immigration queue, etc. And it does have the cleanest toilets and the fastest immigration queues, so maybe there's something to be said for the approach.
I think we need to have some common understanding of what we mean by quality, before we try to compare it across languages. And when we do compare across languages, then we have to observe that the set of users changes and presumably their needs change too.
It is interesting to note that en.WP page views have dropped consistently since Google Knowledge (which generally displays the first para from the en.WP article) was introduced. What this tells us is that a certain percentage of readers of an article simply want the most basic facts, which would be delivered even by a stub article. "Suriname is a country on the northeastern Atlantic coast of South America" certainly met my information needs adequately (I heard it mentioned on the TV news in connection with a hurricane). After finding out where it was in the world, I could have gone on to read about its colonial history, its demographic sexuality, and its biodiversity, but I didn't because I didn't have a need to know at that moment. My point here is that while we would not generally regard a stub as "quality", a percentage of the readers of a stub are probably completely satisfied.
Of course, doing surveys of articles with real users is somewhat difficult for a research project. But it might be useful to see how user perceptions of quality compare with other metrics (particularly those which can be more easily generated for a research project). Starting with other metrics, without knowing that they are a good proxy for user perception, is probably a waste of time.
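To make that comparison concrete, here is a minimal Python sketch of one way to check whether a cheap metric tracks user perception: a Spearman rank correlation between survey ratings and the metric. Everything in it is an assumption for illustration only: the function names, the choice of Spearman, and the numbers (which are invented, not measured).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: mean reader rating (1-5 survey) per article, and two
# candidate proxy metrics for the same five articles. Invented numbers.
reader_rating = [4.5, 3.0, 4.0, 2.0, 3.5]
article_bytes = [52000, 8000, 30000, 1500, 9000]
citation_count = [10, 40, 5, 30, 2]

print("rating vs. length:   ", spearman(reader_rating, article_bytes))
print("rating vs. citations:", spearman(reader_rating, citation_count))
```

A metric whose correlation with the survey ratings is near zero (or negative) would be a poor stand-in for what readers perceive as quality, however easy it is to compute.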
Kerry