[Wikipedia-l] Median article size - more stats
tmarklew at hotmail.com
Sun Aug 25 20:24:35 UTC 2002
Karl Juhnke wrote:
>I got so curious about my own suggested statistic of median article
>size that I have been tracking it by hand. So far the data is:
>Date Median article size in bytes
>My hypothesis on the basis of one week of observation is that,
>although our front page article count is rocketing up, this indicates
>many new small articles, and thus a decrease in the expected information
>contained in a random article.
To take Karl's idea further, I've calculated the median article size at the
end of February, when Wikipedia was moved onto the Phase II software.
According to my figures, the median article size has fallen from 1035
characters in February to 997 now (August 25).
This is a fairly small drop - less than 4% over 6 months. I suspect that it
mostly reflects the fact that we are importing fewer big articles now from
public domain sources. For example, hundreds of large pages from the CIA
factbook were imported before February, and since then we haven't had as
many big additions relative to the (now larger) size of the database.
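
For anyone who wants to check the "less than 4%" figure, here is a minimal sketch of the arithmetic in Python. The two medians are the ones quoted above; the helper name percent_change is just an illustrative choice.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage (negative means a drop)."""
    return (new - old) / old * 100


feb_median = 1035  # characters, end of February (figure from this post)
aug_median = 997   # characters, August 25 (figure from this post)

change = percent_change(feb_median, aug_median)
print(f"{change:.2f}%")  # prints -3.67%
```

That is a drop of 38 characters on a base of 1035, or roughly 3.7%.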
So we can draw the reassuring conclusion that the headline count of
articles is a reasonably reliable measure of how quickly Wikipedia is
growing. Articles aren't getting all that much smaller or bigger over time.
N.B. The February figures were calculated as the median size of pages added
by 'Conversion script', the dummy user which was used to transfer all the
articles onto the Phase II database.
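
A rough sketch of that calculation in Python, for illustration only. It assumes the pages are available as (author, text) pairs, which is a hypothetical layout and not the actual Phase II schema, and the sample data below is made up rather than taken from the database.

```python
from statistics import median


def median_size_by_user(pages, user):
    """Median text length, in characters, over pages whose recorded author is `user`."""
    sizes = [len(text) for author, text in pages if author == user]
    if not sizes:
        raise ValueError(f"no pages found for user {user!r}")
    return median(sizes)


# Hypothetical sample data: (author, page text) pairs, not real Wikipedia pages.
pages = [
    ("Conversion script", "a" * 1200),
    ("Conversion script", "b" * 800),
    ("Conversion script", "c" * 1035),
    ("SomeOtherUser", "d" * 50),
]

print(median_size_by_user(pages, "Conversion script"))  # prints 1035
```

Note that len() counts characters, which matches the figures above. To compare with a median measured in bytes, the encoded length would be needed instead.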