[Wikipedia-l] Median article size - more stats

Tim Marklew tmarklew at hotmail.com
Sun Aug 25 20:24:35 UTC 2002


Karl Juhnke wrote:
>I got so curious about my own suggested statistic of median article
>size that I have been tracking it by hand.  So far the data is:
>
>Date  Median article size in bytes
>----  ----------------------------
>8/15  1001
>...
>8/22   991
>My hypothesis on the basis of one week of observation is that,
>although our front page article count is rocketing up, this indicates
>many new small articles, and thus a decrease in the expected information
>contained in a random article.

To take Karl's idea further, I've calculated the median article size at the 
end of February, when Wikipedia was moved onto the Phase II software.  
According to my figures, the median article size has fallen from 1035 
characters in February to 997 now (August 25).

This is a fairly small drop - less than 4% over 6 months.  I suspect that it 
mostly reflects the fact that we are importing fewer big articles now from 
public domain sources.  For example, hundreds of large pages from the CIA 
factbook were imported before February, and we haven't had so many big 
additions since in proportion to the (larger) size of the database.
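
As a quick check of the arithmetic behind the "less than 4%" figure, here is a minimal sketch using the two medians quoted above (1035 in February, 997 on August 25). The function name `percent_drop` is made up for illustration only.

```python
def percent_drop(old, new):
    """Relative decrease from old to new, as a percentage of old."""
    return (old - new) / old * 100.0

feb_median = 1035  # median article size at the end of February (from the post)
aug_median = 997   # median article size on August 25 (from the post)

drop = percent_drop(feb_median, aug_median)
print(f"{drop:.2f}% drop")  # prints "3.67% drop"
```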

So we can draw the reassuring conclusion that the headline count of 
articles is a reasonably reliable measure of how quickly Wikipedia is 
growing.  Articles aren't getting much smaller or bigger over the long 
term.

Tim (Enchanter)

N.B.  The February figures were calculated as the median size of pages added 
by 'Conversion script', the dummy user which was used to transfer all the 
articles onto the Phase II database.
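
The calculation described in the note above could be sketched roughly as follows. This is only an illustration, not the actual procedure used: the `(title, editor, size)` tuple layout, the function name `median_size_by_user`, and the sample rows are all assumptions invented for the example, not real Wikipedia data.

```python
from statistics import median

def median_size_by_user(pages, user):
    """Median size of the pages whose recorded editor is `user`.

    `pages` is assumed to be an iterable of (title, editor, size) tuples;
    this layout is hypothetical, not the real Phase II database schema.
    """
    sizes = [size for (_title, editor, size) in pages if editor == user]
    if not sizes:
        raise ValueError(f"no pages found for user {user!r}")
    return median(sizes)

# Made-up sample rows, purely to show the call.
sample_pages = [
    ("Page A", "Conversion script", 400),
    ("Page B", "Conversion script", 1035),
    ("Page C", "Conversion script", 5200),
    ("Page D", "SomeUser", 90),
]

result = median_size_by_user(sample_pages, "Conversion script")
print(result)  # prints 1035
```

The median is used rather than the mean so that a handful of very large imported pages does not dominate the figure.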





