On Mon, Nov 29, 2010 at 9:16 PM, Carl (CBM) <cbm.wikipedia(a)gmail.com> wrote:
> I think it's safe to say that the majority of our articles are "short"
> and a significant minority are "very short".
Is it possible to have a breakdown of the high end of that? That is,
the number of articles from 10,000 bytes upwards, in steps of 5,000
bytes? (I forget what the size of the largest article is.) Also, have
you looked at the byte size and word count of some actual articles, to
see how accurate your "4.5-bytes-per-word" estimate is?
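
To make the request concrete, here is a rough Python sketch of the two checks I have in mind. It assumes the article sizes are already available as a plain list of byte counts; the function names and parameters are my own placeholders and not anything from your script.

```python
from collections import Counter


def size_histogram(sizes, start=10_000, step=5_000):
    """Count articles per bucket of `step` bytes, from `start` bytes upward.

    `sizes` is assumed to be an iterable of article sizes in bytes.
    Each key in the result is the lower bound of a bucket.
    """
    buckets = Counter()
    for size in sizes:
        if size >= start:
            lower = start + ((size - start) // step) * step
            buckets[lower] += 1
    return dict(sorted(buckets.items()))


def bytes_per_word(text):
    """Ratio of UTF-8 byte length to whitespace-separated word count.

    Splitting on whitespace is only a crude word count (an assumption);
    wiki markup and templates would inflate the ratio for real articles.
    """
    words = text.split()
    if not words:
        return 0.0
    return len(text.encode("utf-8")) / len(words)
```

Running bytes_per_word over the raw text of a handful of real articles and comparing the results against 4.5 would show how far off the estimate is.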
Carcharoth