"Steve Bennett" wrote:
> Hmm. 5 letters per word is one thing, but 5 bytes per word? Seems like it will be skewed by wikitext syntax (tables and templates in particular)...
Yes - OTOH this is just shading it down a bit. Within 10% would be good enough, really. And some articles use looong words.
So is it possible to get the distribution of articles (without pages marked as disambiguations), rendered as "plain text", by total number of words, say by something like quartiles?
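
To make the request concrete, here is a rough sketch of the kind of calculation I mean. Everything in it is an assumption for illustration: the function names are made up, the regex stripping is only a crude stand-in for a real "plain text" rendering (it ignores nested templates, references, HTML and so on), and the "{{disambig" check is a hypothetical marker, not necessarily how disambiguation pages are actually tagged.

```python
import re
import statistics


def strip_wikitext(text: str) -> str:
    """Crude plain-text approximation (assumption: not a real wikitext parser)."""
    # Drop tables: {| ... |}
    text = re.sub(r"\{\|.*?\|\}", " ", text, flags=re.DOTALL)
    # Drop non-nested templates: {{ ... }}
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove bold/italic quote markup
    text = re.sub(r"'{2,}", "", text)
    return text


def word_count(text: str) -> int:
    """Count whitespace-separated words in the stripped text."""
    return len(strip_wikitext(text).split())


def bytes_estimate(text: str, bytes_per_word: float = 5.0) -> float:
    """The 'N bytes per word' estimate, applied to the raw wikitext."""
    return len(text.encode("utf-8")) / bytes_per_word


def is_disambiguation(text: str) -> bool:
    """Hypothetical check: treats any '{{disambig...' template as the marker."""
    return "{{disambig" in text.lower()


def word_count_quartiles(articles: list[str]) -> list[float]:
    """Return the three quartile cut points of words per article.

    Needs at least two non-disambiguation articles.
    """
    counts = [word_count(a) for a in articles if not is_disambiguation(a)]
    return statistics.quantiles(counts, n=4)
```

Comparing bytes_estimate() with word_count() on a sample of pages would also show how far the bytes-per-word figure is skewed by markup, and whether it stays within the 10% mentioned above.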
Charles