[WikiEN-l] Median article size

charles.r.matthews at ntlworld.com
Sun Feb 10 16:08:41 UTC 2008


"Steve Bennett" wrote:

> Hmm. 5 letters per word is one thing, but 5 bytes per word? Seems like
> it will be skewed by wikitext syntax (tables and templates in
> particular)...

Yes - OTOH this is just shading it down a bit. Within 10% would be good enough, really. And some articles use looong words. 

So is it possible to get the distribution of articles (excluding pages marked as disambiguations), rendered as "plain text", by total number of words, broken down by something like quartiles?
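
To make the request concrete, here is a rough sketch of the calculation I have in mind. It assumes the articles have already been rendered to plain text and flagged as disambiguation pages or not; the (text, is_disambiguation) pair shape, the function names and the sample data are all made up for illustration and do not correspond to any real dump format or tool. It needs Python 3.8 or later for statistics.quantiles.

```python
import statistics


def word_count(plain_text):
    """Count whitespace-separated words in already-rendered plain text."""
    return len(plain_text.split())


def word_count_quartiles(articles):
    """Return (Q1, median, Q3) of word counts, skipping disambiguation pages.

    `articles` is an iterable of (plain_text, is_disambiguation) pairs.
    This shape is an assumption made for the sketch, not a real dump format.
    """
    counts = [word_count(text) for text, is_disambig in articles if not is_disambig]
    # n=4 yields the three cut points that split the data into quartiles.
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return q1, median, q3


# Invented sample data, only to show the call.
sample = [
    ("one two three", False),
    ("a b c d e", False),
    ("Foo may refer to", True),  # disambiguation page, excluded
    ("just one more short stub article here", False),
    ("x " * 11, False),
]

if __name__ == "__main__":
    print(word_count_quartiles(sample))
```

Splitting on whitespace is a crude definition of a "word", but it should be well within the 10% tolerance mentioned above.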

Charles




