Jimbo wrote:
I'm not yet convinced that there's any need for it, but we really need some wordcount statistics to have a grasp of what would be needed.
As of Feb 26, 2004 there were 66.7 million words in the English Wikipedia (217,000 articles; 2214 characters/article). Source: http://www.wikipedia.org/wikistats/EN/Sitemap.htm
Encyclopædia Britannica's 2002 edition (a full general encyclopedia) has 55 million words (85,000 articles; 3882 characters/article). Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
Columbia Encyclopedia, Sixth Edition (a concise encyclopedia) has a word count of 6.5 million words (51,000 articles; 765 characters/article). Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
-- Daniel Mayer (aka mav)
Daniel Mayer wrote:
As of Feb 26, 2004 there were 66.7 million words in the English Wikipedia (217,000 articles; 2214 characters/article). Source: http://www.wikipedia.org/wikistats/EN/Sitemap.htm
Encyclopædia Britannica's 2002 edition (a full general encyclopedia) has 55 million words (85,000 articles; 3882 characters/article). Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
Columbia Encyclopedia, Sixth Edition (a concise encyclopedia) has a word count of 6.5 million words (51,000 articles; 765 characters/article). Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
Something seems slightly odd with your numbers, because this gives an average word length for Britannica and Columbia of 5.999 and 6.002, but an average word length for Wikipedia of 7.200.
I suspect that the oddity here is that I'm multiplying characters/article by articles to get total number of characters, but probably the 217,000 is a count by some other means than "raw"?
Anyhow, no matter, as the discrepancy is minor and can surely be resolved with a little tweak. (Or, perhaps it's true that we use bigger words on average?)
[WAIT: I see the trouble. I just read the source article, and I see that the reason for the discrepancy is that the characters/word figure was estimated for Britannica and Columbia.]
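For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation described above: multiply characters/article by the article count, then divide by the word count. The figures are the ones quoted in this thread; the names FIGURES and chars_per_word are just illustrative. It should give about 6.0 for both print encyclopedias and about 7.2 for Wikipedia.

```python
# Figures as quoted in the thread: (words, articles, characters per article).
FIGURES = {
    "Wikipedia (EN)": (66_700_000, 217_000, 2214),
    "Britannica 2002": (55_000_000, 85_000, 3882),
    "Columbia 6th ed.": (6_500_000, 51_000, 765),
}


def chars_per_word(words, articles, chars_per_article):
    """Estimate average characters per word from the three quoted figures."""
    total_chars = articles * chars_per_article
    return total_chars / words


if __name__ == "__main__":
    for name, (words, articles, chars) in FIGURES.items():
        ratio = chars_per_word(words, articles, chars)
        print(f"{name}: {ratio:.3f} characters/word")
```

Note that this only reproduces the division; it says nothing about how each source counted articles or characters, which is where the discrepancy comes from.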
Here's what I wonder...
The essential question for us is whether we can slim down to Columbia's size primarily by leaving out articles, rather than by editing existing articles down to size. If we can, that's huge, because the process then becomes *primarily* a matter of selection, and not a matter of huge rewrites.
This keeps the "forkage" down to an absolute minimum, imposes the least burden on the community, and might make for a very interesting result.
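As a rough back-of-the-envelope sketch of what "selection only" would mean, the snippet below estimates how many Wikipedia articles would fit in a Columbia-sized word budget. It assumes, purely for illustration, that the selected articles have the same average length as Wikipedia as a whole; in practice the articles chosen would probably be longer than average, so the real count would be lower. The function name articles_within_budget is hypothetical.

```python
# Figures as quoted earlier in the thread.
WIKIPEDIA_WORDS = 66_700_000
WIKIPEDIA_ARTICLES = 217_000
COLUMBIA_WORDS = 6_500_000


def articles_within_budget(word_budget, total_words, total_articles):
    """Number of average-length articles that fit in word_budget.

    Assumption: every selected article has the corpus-wide average length.
    """
    avg_words_per_article = total_words / total_articles
    return int(word_budget // avg_words_per_article)


if __name__ == "__main__":
    n = articles_within_budget(COLUMBIA_WORDS, WIKIPEDIA_WORDS, WIKIPEDIA_ARTICLES)
    share = n / WIKIPEDIA_ARTICLES
    print(f"about {n} articles, or {share:.1%} of the current total")
```

Under that assumption the budget covers roughly 21,000 articles, a bit under a tenth of the 217,000 counted above.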
--Jimbo