[WikiEN-l] Looking for thoughts on statistics

Ian Woollard ian.woollard at gmail.com
Sun Mar 28 22:16:16 UTC 2010


On 28/03/2010, Carcharoth <carcharothwp at googlemail.com> wrote:
> On Sun, Mar 28, 2010 at 5:02 PM, Ian Woollard <ian.woollard at gmail.com>
>> I think a lot of people get involved to write new articles. It looks
>> like 2007 was 'peak oil' for new articles; after that it was getting
>> harder to find new articles to write; about half of the articles that
>> were realistically likely to be covered, were already covered.
>
> Does it make sense to say this when *thousands* of articles are being
> created every day?

We're currently looking at a net increase of about 1200 articles
per day, and that seems to be falling.

> Where does the idea even come from that "about half
> of the articles that were realistically likely to be covered, were
> already covered"? The question that needs to be asked is whether the
> "New articles per day" statistic is a measure of the articles being
> created, or the articles that are still there as having been created
> on that day, a set period (e.g. a year) after being created? i.e. Is
> the rate of article deletion included or excluded from those figures?

The idea comes from a mixture of looking at the peak in the statistics
and looking at the articles that are still needed. Nearly all of the
low-hanging fruit is clearly gone now. Most of the mid-hanging fruit
is also now gone. We're getting towards the top of the tree, things
are getting more obscure. This is a *good* thing, not having so many
holes in the Wikipedia!

> My view is that the rate of article creation and the number of
> "missing" articles depends *heavily* on the topic area. Some topic
> areas are very well covered, others are not so well covered. In the
> former areas, you will indeed struggle to find new articles to create,
> but there are some areas (history in particular) where there are
> thousands (probably tens of thousands) of articles still needed.

I'm sure you're correct. So if there are twenty or thirty other similar
areas, then we're looking at under a million articles left to write.
We're currently at 3.2 million. I think we'll exceed 4 million within
a few years.
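
To put rough numbers on that, here is a minimal Python sketch of the
arithmetic. The 3.2 million, 4 million and 1200-per-day figures are the
ones quoted in this thread; the function names and the 10% yearly
decline are assumptions made up purely for illustration, since all we
know is that the rate "seems to be falling".

```python
import math

CURRENT_ARTICLES = 3_200_000  # article count quoted in this thread
TARGET_ARTICLES = 4_000_000   # milestone mentioned in this thread
NET_PER_DAY = 1200            # net daily increase quoted in this thread


def days_to_target(current, target, per_day):
    """Days needed to reach target at a constant net daily rate."""
    return math.ceil((target - current) / per_day)


def days_to_target_decaying(current, target, per_day, yearly_decline):
    """Days to reach target if the daily rate shrinks by a fixed
    fraction each year (an assumed model, not a measured one).
    Returns None if growth fades out before the target is reached."""
    daily_factor = (1.0 - yearly_decline) ** (1.0 / 365.0)
    total, rate, days = float(current), float(per_day), 0
    while total < target:
        if rate < 1e-6:  # growth has effectively stopped
            return None
        total += rate
        rate *= daily_factor
        days += 1
    return days


constant = days_to_target(CURRENT_ARTICLES, TARGET_ARTICLES, NET_PER_DAY)
print(constant, "days at a constant rate")  # 667 days, a bit under 2 years

# Hypothetical 10% yearly decline in the daily rate.
print(days_to_target_decaying(CURRENT_ARTICLES, TARGET_ARTICLES,
                              NET_PER_DAY, 0.10), "days with a 10% decline")
```

At a constant 1200 per day, 4 million is a bit under two years away. Any
decline in the rate pushes that out, and a steep enough decline means
the target is never reached, which is why "within a few years" is about
as precise as the estimate can be.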

> I could easily make lists hundreds of items long of things that an
> article could be written on (this is limited mainly by the time I have
> to compile such lists), mostly on historical subjects, but also a fair
> amount of contemporary stuff as well. Seriously. Pick any topic and I
> can guarantee that a list of ten new articles for that topic area
> would be easy to compile.
>
> Just as an example, I was taking part in the Military History World
> War I contest recently, and there were at least 43 new articles
> created (or expanded) for DYK. I'm currently trying to work out how
> many articles were actually created (as opposed to expanded).

That's not very many compared to 3.2 million articles, but I don't
mean to knock it in any way, just trying to put things into
perspective.

> A better approach would be to look at samples of article creation and
> see what articles are being created and that will give you an idea of
> where the gaps are being filled in and hence how big the gaps are.

This IS the point though; we're now looking for the gaps. That's
exactly what I'm saying. The Wikipedia should more or less run out of
gaps in about 3 years (ish; it's never going to run out completely,
but growth from existing knowledge will get progressively slower).
OTOH the circle of knowledge is still growing, at a somewhat slower
rate.

> Carcharoth

-- 
-Ian Woollard


