From: "John Vandenberg" jayvdb@gmail.com
To: "discussion list for Wikisource, the free library" wikisource-l@lists.wikimedia.org
Subject: Re: [Wikisource-l] Changing the Wikisource main page
Date: Sun, 14 Sep 2008 06:03:36 +1000
A Chinese "word" carries more meaning than a Spanish "word". I don't have the numbers, but the word "word" does not mean the same thing in every language, which makes word counts a very complex statistic.
-- John
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
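John's point can be illustrated with a minimal sketch of naive, whitespace-based counting. Chinese is written without spaces between words, so splitting on whitespace treats a whole phrase as a single token. The sample sentences (both meaning "I like the library") and the helper `count_units` are hypothetical illustrations, not taken from the thread, and say nothing about how the statistics page actually counts words.

```python
def count_units(text):
    """Return (whitespace tokens, non-space characters) for `text`.

    Hypothetical helper: str.split() with no arguments splits on runs of
    whitespace only, so unspaced Chinese text yields a single token.
    """
    tokens = text.split()
    characters = len(text.replace(" ", ""))
    return len(tokens), characters


samples = {
    "Spanish": "me gusta la biblioteca",  # "I like the library"
    "Chinese": "我喜欢图书馆",             # "I like the library", no spaces
}

for language, text in samples.items():
    tokens, characters = count_units(text)
    print(f"{language}: {tokens} whitespace token(s), {characters} characters")
```

The same sentence gives 4 tokens in Spanish but only 1 in Chinese, while the character counts (19 vs 6) move in the opposite direction, so neither unit transfers cleanly between the two languages.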
I may have found a very simple solution: if we agree that a Chinese character (sign) counts as a word in our sense of "word", then we only have to find how many characters there are. I ran a test and found that a Chinese character takes 3 octets (bytes). The same statistics page tells us that the average size of an article on the Chinese Wikisource is 1957 octets, so an average article holds 1957 / 3 ≈ 652.3 words. The statistics count 29,084 articles for the Chinese Wikisource (as of May 31, 2008), and 652.3 × 29,084 ≈ 18.97M words in total.
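The estimate above can be reproduced with a short sketch. It assumes, as the reply does, that text is stored as UTF-8, that every octet of an article belongs to a 3-byte Chinese character, and that one character counts as one word. In practice ASCII characters such as wiki markup take only 1 byte each in UTF-8, so this is a rough approximation. The helper names `utf8_bytes_per_char` and `estimate_total_words` are hypothetical.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in `text`."""
    return len(text.encode("utf-8")) / len(text)


def estimate_total_words(avg_article_bytes: float, article_count: int,
                         bytes_per_word: float = 3.0) -> float:
    """Estimate the total number of 'words'.

    Assumes every byte belongs to a 3-byte Chinese character and that one
    character counts as one word (the assumption made in the reply).
    """
    words_per_article = avg_article_bytes / bytes_per_word
    return words_per_article * article_count


# Common CJK characters (Basic Multilingual Plane) take 3 bytes in UTF-8.
print(utf8_bytes_per_char("图书馆"))  # 3.0

# Figures quoted for the Chinese Wikisource (as of May 31, 2008).
total = estimate_total_words(1957, 29084)
print(f"{total / 1e6:.2f}M words")  # 18.97M words
```

Changing `bytes_per_word` is the obvious knob if a different assumption about the byte share of markup turns out to be more realistic.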
The only remaining question is: why does the statistics page give 29.3M as the number of words for the Chinese Wikisource? Is that the number of "groups of letters"?
Anyway, if we accept these figures, the ranking would be:
1. English: 211M words
2. French: 125M
3. Spanish: 41.8M
4. Russian: 22.2M
5. Chinese: 18.9M
6. Polish: 18.2M
7. Portuguese: 15.5M
8. German: 14.4M
9. Italian: 12.0M
10. Arabic: 10.6M