h, 08/07/2014 13:49:
This should also help sociolinguists to identify which languages [...] that are more developed than others in the Wikipedia sphere, and seeks explanations for their relative success/failure by contrasting the Wikipedia sphere and offline/online sphere.
Agreed on the importance of this (though I wouldn't restrict it to Wikipedia), not only for researchers but also for editors who want to self-assess. For many years our main tool has been sorting by the "Editors (5+) per million speakers" column in http://stats.wikimedia.org/EN/Sitemap.htm , which however has two main issues: 1) an absurdly high number of editors in some editions adds some noise, though nothing tragic (classic example: Volapük; funny, but it doesn't really do any harm); 2) the unrealistic baseline of "speakers in millions" (which is not so closely related to what happens on the wiki) means the ranking mostly shows how well those languages are doing on the internet, e.g. the classic dominance of Scandinavia and Israel and the classic disuse of Tagalog/Filipino (with some surprises like Northern Sami, which clearly has some strong supporters out there).
Realistic baselines would let me answer simple questions like whether it.wiki is really doing better than de.wiki (35 vs. 33?!). Given the similarity of conditions, if it is not, I may conclude there is a large uncultivated land out there just waiting for some seeds (outreach to people who don't know Wikimedia projects well enough); if it is, I may conclude we've probably exhausted our natural resources and need to focus on using them more efficiently.
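To make the baseline problem concrete, here is a rough sketch of how the same editor counts can rank differently depending on the denominator. Everything in it is an assumption for illustration: the edition names (xx.wiki, yy.wiki), the figures, and the "internet_users" baseline are made up, and this is not how stats.wikimedia.org computes its column.

```python
def editors_per_million(active_editors, baseline_population):
    """Active editors (5+ edits per month) per million people in the baseline."""
    if baseline_population <= 0:
        raise ValueError("baseline population must be positive")
    return active_editors / (baseline_population / 1_000_000)


def rank_editions(editions, baseline_key):
    """Return (edition, rate) pairs sorted from highest to lowest rate."""
    rates = {
        name: editors_per_million(data["active_editors"], data[baseline_key])
        for name, data in editions.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical editions with invented numbers, NOT real statistics.
editions = {
    "xx.wiki": {
        "active_editors": 700,
        "speakers": 20_000_000,
        "internet_users": 18_000_000,  # assumed alternative baseline
    },
    "yy.wiki": {
        "active_editors": 3000,
        "speakers": 100_000_000,
        "internet_users": 60_000_000,  # assumed alternative baseline
    },
}

for key in ("speakers", "internet_users"):
    print(key, rank_editions(editions, key))
```

With these invented numbers xx.wiki leads when dividing by speakers, but yy.wiki leads when dividing by the assumed count of speakers who are online, which is the kind of reversal a more realistic baseline could reveal or rule out.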
Nemo