[WikiEN-l] What proportion of articles are stubs?

Carl (CBM) cbm.wikipedia at gmail.com
Mon Nov 29 21:37:45 UTC 2010


On Mon, Nov 29, 2010 at 4:22 PM, Carcharoth <carcharothwp at googlemail.com> wrote:
> Is it possible to have a breakdown of the high-end of that? i.e.
> Number of articles from 10,000 bytes upwards in steps of 5,000 bytes?

Sure, I'll put a table below. The number shown under "len" is the
bottom end of each 5,000-byte length range.

> Also, have you
> looked at the byte size and word count of some actual articles, to see
> how accurate your "4.5-bytes-per-word" estimate is?

No, that was just a napkin calculation, based on a Google search. Take
it with a grain of salt.
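
One way to check the estimate would be to measure the ratio directly on
some sample text. The sketch below is only an illustration: the function
names and the sample sentence are made up here, words are taken to be
whitespace-separated tokens, and the 4.5 default is the unverified figure
quoted above. Real article sizes include wiki markup, so measured ratios
on actual articles would likely differ.

```python
def bytes_per_word(text: str) -> float:
    """Return the UTF-8 byte length of text divided by its word count.

    Words are whitespace-separated tokens (an assumption, not a standard).
    """
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(text.encode("utf-8")) / len(words)


def estimated_words(byte_len: int, ratio: float = 4.5) -> int:
    """Rough word count for a given byte length, using an assumed ratio."""
    return round(byte_len / ratio)


# Hypothetical sample text: 43 bytes, 9 words.
sample = "The quick brown fox jumps over the lazy dog"
ratio = bytes_per_word(sample)
words_in_10k = estimated_words(10_000)
```

Running bytes_per_word over the text of a handful of real articles and
averaging the results would show how far the napkin figure is off.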

- Carl

+--------+----------------+
| len    | count          |
+--------+----------------+
|  10000 |         167362 |
|  15000 |          73821 |
|  20000 |          40156 |
|  25000 |          25163 |
|  30000 |          16405 |
|  35000 |          11474 |
|  40000 |           8383 |
|  45000 |           6169 |
|  50000 |           4754 |
|  55000 |           3672 |
|  60000 |           2895 |
|  65000 |           2223 |
|  70000 |           1759 |
|  75000 |           1508 |
|  80000 |           1235 |
|  85000 |            960 |
|  90000 |            809 |
|  95000 |            669 |
| 100000 |            531 |
| 105000 |            450 |
| 110000 |            345 |
| 115000 |            268 |
| 120000 |            270 |
| 125000 |            211 |
| 130000 |            210 |
| 135000 |            143 |
| 140000 |            141 |
+--------+----------------+

There are 765 articles longer than 140,000 bytes, almost all of which
seem to be lists.
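
For anyone who wants to build a similar breakdown from their own list of
article lengths, the binning used in the table above can be sketched as
follows. This is not the query that produced the table; the function name,
its parameters, and the sample lengths are invented for illustration. Each
length is rounded down to the bottom end of its 5,000-byte range, and
anything shorter than the starting length is skipped.

```python
from collections import Counter


def bin_lengths(lengths, start=10_000, step=5_000):
    """Count lengths per bin; each key is the bottom end of its range.

    Lengths below start are ignored. There is no upper cutoff here, so
    very long articles simply land in higher bins.
    """
    counts = Counter()
    for n in lengths:
        if n >= start:
            # Round down to the nearest multiple of step above start.
            counts[start + (n - start) // step * step] += 1
    return dict(sorted(counts.items()))


# Made-up article lengths in bytes, not real data.
sample_lengths = [9_999, 10_000, 14_999, 15_000, 27_500]
table = bin_lengths(sample_lengths)
for low, count in table.items():
    print(f"| {low:6d} | {count:14d} |")
```

The printed rows follow the same column widths as the table above, so
the output can be compared against it directly.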


