[WikiEN-l] What proportion of articles are stubs?

Carl (CBM) cbm.wikipedia at gmail.com
Mon Nov 29 21:16:38 UTC 2010


On Mon, Nov 29, 2010 at 12:33 PM, Charles Matthews
<charles.r.matthews at ntlworld.com> wrote:
> Stubs and how to handle them seem to be controversial still (or again),
> which is rather surprising given that we have been going nearly a decade
> now. I'd like to ask how many articles still are stubs, by some sensible
> standard?

The following data comes from the live toolserver database as of just
now. Page size in bytes is not a very detailed standard for counting
stubs, but at least it's objective.

There are 3,517,730 non-redirect pages in the main namespace. Of
these, 3,144,982 are less than 10,000 bytes; 2,596,291 are less than
5,000 bytes; 1,422,480 are less than 2,000 bytes; 547,342 are less
than 1,000 bytes; and 185,932 are less than 500 bytes. There are about
186,000 pages in [[Category:All disambiguation pages]], which are
included in the above numbers. Redirects are *not* included.
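
To make the proportions explicit, here is a minimal sketch that divides
each cumulative count above by the 3,517,730 total. The names TOTAL,
BELOW and share_below are illustrative only; they are not part of the
toolserver query that produced the numbers.

```python
# Counts quoted above: non-redirect main-namespace pages, 29 Nov 2010.
TOTAL = 3_517_730

# Cumulative counts: pages smaller than each size threshold (in bytes).
BELOW = {
    10_000: 3_144_982,
    5_000: 2_596_291,
    2_000: 1_422_480,
    1_000: 547_342,
    500: 185_932,
}


def share_below(threshold):
    """Fraction of all counted pages smaller than `threshold` bytes."""
    return BELOW[threshold] / TOTAL


for threshold, count in BELOW.items():
    print(f"< {threshold:>6,} bytes: {count:>9,} pages "
          f"({share_below(threshold):.1%})")
```

By this count, roughly 89% of pages are under 10,000 bytes, 74% under
5,000, 40% under 2,000, 16% under 1,000 and 5% under 500.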

If we estimate 4.5 bytes per word plus another byte for a space (5.5
bytes per word in total), a 1,000-byte article would have about 182
words (ignoring templates and categories), and a 5,000-byte article
would have about 910 words.
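
The same back-of-the-envelope estimate can be written as a short
sketch. The 4.5 bytes per word and the 1 byte per space are the
assumptions stated above; the function name estimated_words is
hypothetical, and the result ignores templates, categories and
multi-byte characters.

```python
# Assumed average: 4.5 bytes per word plus 1 byte for the following space.
BYTES_PER_WORD = 4.5 + 1.0


def estimated_words(size_bytes):
    """Rough word count for a page of `size_bytes` bytes of plain text."""
    return size_bytes / BYTES_PER_WORD


for size in (500, 1_000, 2_000, 5_000, 10_000):
    print(f"{size:>6,} bytes ~ {estimated_words(size):.0f} words")
```

For 1,000 bytes this gives about 182 words, and for 5,000 bytes about
909, which matches the rounded figures above.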

I think it's safe to say that the majority of our articles are "short"
and a significant minority are "very short".

- Carl


