Daniel Mayer wrote:
As of Feb 26, 2004 there were 66.7 million words in the English
Wikipedia (217,000 articles; 2214 characters/article). Source:
http://www.wikipedia.org/wikistats/EN/Sitemap.htm
Encyclopædia Britannica's 2002 edition (a full general encyclopedia)
has 55 million words (85,000 articles; 3882 characters/article).
Source:
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
Columbia Encyclopedia, Sixth Edition (a concise encyclopedia) has a
word count of 6.5 million words (51,000 articles; 765
characters/article). Source:
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
Something seems slightly odd with your numbers, because they give
an average word length of 5.999 for Britannica and 6.002 for Columbia,
but an average word length of 7.203 for Wikipedia.
I suspect that the oddity here is that I'm multiplying
characters/article by articles to get the total number of characters,
but perhaps the 217,000 is a count by some means other than "raw"?
Anyhow, no matter, as the discrepancy is minor and can surely be
resolved with a little tweak. (Or, perhaps it's true that we use
bigger words on average?)
[WAIT: I see the trouble. I just read the source article, and I see
that the reason for the discrepancy is that the characters/word figure
was estimated for Britannica and Columbia.]
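For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation described above: characters/article times articles, divided by total words. The function name and the dictionary layout are just illustrative choices; the figures are the ones quoted at the top of this message.

```python
def avg_word_length(articles, chars_per_article, total_words):
    """Average characters per word, derived from the quoted totals."""
    total_chars = articles * chars_per_article
    return total_chars / total_words


# (articles, characters/article, total words), as quoted above
figures = {
    "Wikipedia (EN)": (217_000, 2214, 66_700_000),
    "Britannica 2002": (85_000, 3882, 55_000_000),
    "Columbia 6th ed.": (51_000, 765, 6_500_000),
}

for name, (articles, chars, words) in figures.items():
    print(f"{name}: {avg_word_length(articles, chars, words):.3f}")
```

This prints 7.203 for Wikipedia, 5.999 for Britannica, and 6.002 for Columbia. The two values so close to 6 fit the explanation that the characters/word figure was estimated for those two works.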
Here's what I wonder...
The essential question for us is whether we can slim down to
Columbia's size primarily by leaving out articles, rather than by
editing existing articles down to size. If we can, that's huge,
because the process then becomes *primarily* a matter of selection,
and not a matter of huge rewrites.
This keeps the "forkage" down to an absolute minimum, imposes the
least burden on the community, and might make for a very interesting
result.
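As a rough feel for what "selection only" would mean, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the selected articles have the same average length as Wikipedia articles overall, which a real selection need not match. The function name is hypothetical; the figures are the ones quoted above.

```python
def articles_that_fit(budget_words, total_words, total_articles):
    """Number of average-length articles that fit into a word budget.

    Assumes every selected article has the overall average length
    (an illustrative simplification, not a claim about real selections).
    """
    # Integer arithmetic: budget / (total_words / total_articles), rounded down
    return budget_words * total_articles // total_words


WIKIPEDIA_WORDS = 66_700_000
WIKIPEDIA_ARTICLES = 217_000
COLUMBIA_WORDS = 6_500_000

n = articles_that_fit(COLUMBIA_WORDS, WIKIPEDIA_WORDS, WIKIPEDIA_ARTICLES)
share = n / WIKIPEDIA_ARTICLES
print(f"{n} articles, about {share:.1%} of the current total")
```

Under that assumption, a Columbia-sized word budget holds roughly 21,000 unedited articles, or about a tenth of what we have now.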
--Jimbo