If you just need ballpark numbers, the proposed approach might work. If
you want to produce something concretely usable, it's going to be much
In particular, 100k is a ridiculous number and restricting yourself to
Wikipedia means for many languages you'll lose the most important
content people are looking for, e.g. on Wiktionary and Wikisource
(dictionaries, original literature and official documents in that language).
Abdel Samad, Rawia, 21/01/2015 09:47:
·We are currently using the article count by language from the Wikimedia
Foundation's public link. Source:
Is this a reliable source for article count, and does it include stubs?
0) You'd better use its source, http://wikistats.wmflabs.org/
1) which is as reliable as Special:Statistics is, i.e. not so much;
2) and uses the official
3) calling "good" and "stub" what is now called "countable"
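To read the numbers at their source instead of through a summary table, the per-wiki counters behind Special:Statistics can also be requested from the MediaWiki action API (`action=query&meta=siteinfo&siprop=statistics`). They are the same counters, so they are exactly as reliable as Special:Statistics. The following is a minimal sketch that assumes the standard `https://<lang>.wikipedia.org/w/api.php` endpoint. The helper names `stats_url` and `parse_stats` are hypothetical.

```python
import json
from urllib.parse import urlencode

def stats_url(lang):
    """Build an API URL requesting the site statistics shown on Special:Statistics."""
    params = {"action": "query", "meta": "siteinfo",
              "siprop": "statistics", "format": "json"}
    # Assumption: the wiki lives at the usual <lang>.wikipedia.org address.
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)

def parse_stats(body):
    """Return (articles, pages) from an API response body.

    'articles' is the "countable" article count; 'pages' counts every page
    in every namespace (talk pages, redirects, user pages, ...).
    """
    stats = json.loads(body)["query"]["statistics"]
    return stats["articles"], stats["pages"]
```

To fetch live data, pass the body of `urllib.request.urlopen(stats_url("fr")).read()` to `parse_stats`. The functions are kept free of network access so that they can be checked offline.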
·Is it possible to get historical data for article count? It would be
great to monitor the evolution of the metric we have defined over time.
has such data.
·What are the biggest drivers you’ve seen for step changes in the number
of articles (e.g., number of active admins, machine translation, etc.)?
Bot imports, clearly. The number of articles is an extremely poor metric
for measuring "coverage".
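Suppose a periodic (e.g. monthly) series of article counts is available for a wiki. Bot imports then tend to appear as single periods whose growth dwarfs the usual rate. The sketch below flags such jumps by comparing each period's change with the median change. The input format of (period, count) pairs, the function name `find_step_changes` and the default `factor=10` are all assumptions chosen for illustration, not an established method.

```python
from statistics import median

def find_step_changes(series, factor=10.0):
    """Return (period, change) pairs whose change exceeds factor * median change.

    series: list of (period_label, article_count), ordered by time.
    """
    # Change in article count from each period to the next.
    diffs = [(series[i][0], series[i][1] - series[i - 1][1])
             for i in range(1, len(series))]
    if not diffs:
        return []
    typical = median([abs(d) for _, d in diffs])
    # If typical growth is zero, treat any nonzero change as a step.
    threshold = factor * typical if typical > 0 else 0
    # abs() so that mass deletions are flagged as well as mass imports.
    return [(p, d) for p, d in diffs if abs(d) > threshold]
```

For example, a wiki that grows by about 10 articles a month and then gains 4,000 in one month would have that single month flagged. A flag like this says nothing about whether the new pages improve coverage.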
·We had to map Wikipedia language codes to the ISO 639-3 language codes in
Ethnologue (the source we are using for primary language data). The
two-letter language code for a Wikipedia language in the “List of
Wikipedias” sometimes, but not always, matches the ISO 639-1 code. Is there
an easy way to do the mapping?
We try to document them at
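A mapping of this kind usually needs three layers:

1. Explicit overrides for Wikipedia codes that are not ISO codes, or that clash with a different ISO 639-3 code.
2. An ISO 639-1 to ISO 639-3 table for two-letter codes.
3. A pass-through for three-letter codes.

The sketch below follows that shape. Both tables are hypothetical, deliberately incomplete excerpts, and each entry should be checked against the documentation referred to above. Ethnologue may also need a further macrolanguage-to-individual-language step (for example, ISO 639-1 "ar" corresponds to the macrolanguage code "ara").

```python
# Hypothetical, incomplete overrides for Wikipedia codes that are not plain
# ISO codes, or whose 3-letter form means a different language in ISO 639-3.
WIKI_OVERRIDES = {
    "als": "gsw",           # Alemannic Wikipedia; ISO "als" is Tosk Albanian
    "simple": "eng",        # Simple English Wikipedia
    "be-x-old": "bel",      # Belarusian (Taraškievica orthography)
    "zh-yue": "yue",        # Cantonese
    "zh-min-nan": "nan",    # Min Nan
    "zh-classical": "lzh",  # Classical Chinese
    "bat-smg": "sgs",       # Samogitian
    "fiu-vro": "vro",       # Võro
    "roa-rup": "rup",       # Aromanian
}

# Tiny excerpt of an ISO 639-1 -> ISO 639-3 table (the full list has ~180 codes).
ISO1_TO_ISO3 = {"en": "eng", "fr": "fra", "de": "deu", "es": "spa", "ar": "ara"}

def wiki_to_iso639_3(code):
    """Map a Wikipedia language code to ISO 639-3, or None if unknown."""
    code = code.lower()
    if code in WIKI_OVERRIDES:
        return WIKI_OVERRIDES[code]
    if len(code) == 2:
        return ISO1_TO_ISO3.get(code)
    if len(code) == 3 and code.isalpha():
        # Assumption: remaining 3-letter codes are already ISO 639-3.
        return code
    return None  # e.g. other hyphenated codes: needs manual review
```

Returning `None` rather than guessing keeps unmapped wikis visible, so they can be reviewed by hand instead of silently joined to the wrong Ethnologue entry.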
Andrew Gray, 21/01/2015 10:18:
This uses a definition of "article count" which is a little more
generous, and counts all pages in the main namespace.
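The two definitions can be compared on a handful of pages. The sketch below assumes that pages are already available as dicts with hypothetical keys `ns`, `redirect` and `text` (for instance, extracted from a dump). It also assumes MediaWiki's default "link" counting method, under which a countable article is in a content namespace, is not a redirect and contains at least one wikilink. The regex is only a rough proxy for that last condition. MediaWiki checks the parsed page's links, so links added by templates count there but are missed here.

```python
import re

# Rough proxy for "contains a wikilink": "[[" followed by a link target.
WIKILINK = re.compile(r"\[\[[^\[\]|]+")

def is_countable(page, content_namespaces=frozenset({0})):
    """Approximate the 'countable' rule: content namespace, not a redirect, has a wikilink."""
    return (page["ns"] in content_namespaces
            and not page["redirect"]
            and WIKILINK.search(page["text"]) is not None)

def is_main_namespace(page):
    """The more generous definition: every page in the main namespace (ns 0)."""
    return page["ns"] == 0

pages = [
    {"ns": 0, "redirect": False, "text": "Paris is the capital of [[France]]."},
    {"ns": 0, "redirect": False, "text": "A stub with no links."},
    {"ns": 0, "redirect": True, "text": "#REDIRECT [[Paris]]"},
    {"ns": 1, "redirect": False, "text": "Talk about [[Paris]]"},
]
countable = sum(is_countable(p) for p in pages)
main_namespace = sum(is_main_namespace(p) for p in pages)
```

On this sample, only the first page is countable, while the three ns 0 pages (including the link-less stub and the redirect) all count under the generous definition. Whether that definition also excludes redirects is not stated in the message. This sketch takes "all pages" literally.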