Hi Kai,
You should start with Ethnologue's country data; this website provides the most comprehensive data. But be aware that the data may not be up to date, so compare it with the Endangered Languages Project data (https://endangeredlanguages.com/) and UNESCO's World Atlas of Languages (https://en.wal.unesco.org/).
In the case of my country, Indonesia, the power dynamics between the national language, Indonesian, and Indigenous (local) languages lead to language shift toward Indonesian or toward the major lingua franca of each region, such as Makassar Malay in greater South Sulawesi. It is also hard to calculate the current figures exactly, since the latest official population census lacks awareness of language diversity as well.
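If it helps, here is a rough sketch of how such a cross-check between sources could look in Python. It assumes you have already collected per-language speaker estimates from each source by whatever means are available to you; the source keys, language keys, numbers, and the `compare_sources` helper are all placeholders I made up for illustration, not real figures or an existing tool.

```python
from statistics import median

# Placeholder speaker estimates per source. Every key and number here is
# invented for illustration; none of it is real data.
estimates = {
    "ethnologue": {"lang_a": 1000, "lang_b": 50000, "lang_c": 200},
    "elp": {"lang_a": 400, "lang_c": 210},
    "unesco_wal": {"lang_a": 900, "lang_b": 52000},
}


def compare_sources(estimates, tolerance=0.25):
    """Summarise each language's estimates and flag disagreement.

    A language is flagged when the gap between its highest and lowest
    estimate exceeds `tolerance` times the median, or when fewer than
    two sources cover it.
    """
    codes = sorted({code for table in estimates.values() for code in table})
    report = {}
    for code in codes:
        # Collect the estimates from every source that lists this language.
        values = [table[code] for table in estimates.values() if code in table]
        mid = median(values)
        spread = (max(values) - min(values)) / mid if mid else 0.0
        report[code] = {
            "median": mid,
            "sources": len(values),
            "flag": len(values) < 2 or spread > tolerance,
        }
    return report


report = compare_sources(estimates)
for code, row in report.items():
    print(code, row)
```

The flagged languages are the ones worth checking by hand, for example against local surveys. The 25% tolerance is an arbitrary starting point, not a recommended threshold.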
Hope this helps.
Best, Biyanto
On Fri, Jun 7, 2024 at 7:30 AM Kai Zhu <kaizhublcu@gmail.com> wrote:
Dear all,
I am currently undertaking a research project that explores the choice of language when reading Wikipedia across different countries. One of the tasks of my study involves mapping Wikipedia languages to the countries where these languages are predominantly spoken. I recognize the complexity of this task and understand that a perfect mapping might not be possible. However, I would appreciate any recommendations on the best methodologies, practices, or data sources for accomplishing this.
Additionally, I have a related question: What are good data sources for information regarding the proportion of a country's population that speaks various languages?
Thank you for your help and insights.
Best regards,
Kai Zhu
Assistant Professor
Bocconi University
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-leave@lists.wikimedia.org