From: "John Vandenberg" <jayvdb(a)gmail.com>
To: "discussion list for Wikisource, the free library"
<wikisource-l(a)lists.wikimedia.org>
Subject: Re: [Wikisource-l] Changing the Wikisource main page
Date: Sun, 14 Sep 2008 06:03:36 +1000
A Chinese "word" has more meaning than a Spanish "word". I don't have
the numbers, but the word "word" is not the same in all languages.
This makes words a very complex statistic.
--
John
_______________________________________________
Wikisource-l mailing list
Wikisource-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
I may have found a very simple solution: if we agree that a Chinese character is a word as we
understand "word", then we have to find out how many characters there are. I ran a test
and found that a Chinese character takes 3 octets. The same statistics tell us that the
average size of an article on the Chinese Wikisource is 1957 octets. So there are
1957/3 = 652.3 words per article. The statistics count (as of May 31, 2008) 29084 articles
for the Chinese Wikisource, and 652.3*29084 gives 18.9M words in total.
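
The arithmetic above can be checked with a short Python sketch. It assumes the article text is stored as UTF-8 (which would explain the 3 octets per character), and it treats every octet as part of a three-octet character, so ASCII markup and whitespace are ignored. The variable names are only illustrative; the figures are the ones quoted above.

```python
# Assumption: the wiki text is UTF-8 encoded, where a common Chinese
# character such as the sample below takes 3 octets.
sample_character = "中"
octets_per_character = len(sample_character.encode("utf-8"))

# Figures quoted from the statistics page (as of May 31, 2008).
average_octets_per_article = 1957
article_count = 29084

# One character is counted as one "word", as proposed above.
words_per_article = average_octets_per_article / octets_per_character
total_words = words_per_article * article_count

print(f"{octets_per_character} octets per character")
print(f"{words_per_article:.1f} words per article")
print(f"{total_words / 1e6:.2f}M words in total")
```

Computed without rounding the intermediate result, the total comes out slightly under 19M, in line with the estimate above.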
The only remaining question is: why does the statistics page give 29.3M as the number of
words for the Chinese Wikisource? Is that the number of "groups of letters"?
Anyway, if we accept the figures, we would have:
1. English: 211M words
2. French: 125M
3. Spanish: 41.8M
4. Russian: 22.2M
5. Chinese: 18.9M
6. Polish: 18.2M
7. Portuguese: 15.5M
8. German: 14.4M
9. Italian: 12.0M
10. Arabic: 10.6M