[WikiEN-l] Garbage in, garbage out (was Re: Go go deletionism!)

Tony Sidaway tonysidaway at gmail.com
Sat Jun 30 06:43:40 UTC 2007


The AfD process could not possibly be responsible for that change in
the graph.  Running flat out it could only process about 200 articles
per day.  More usually it runs between 100 and 150 per day.  Even if
every single listed article were deleted it wouldn't dent the growth
rate.  A more realistic estimation of the deletion rate at AfD would
be about 75%, or perhaps 85% if you regard a redirect, merge or
userfication as a deletion for the purpose of counting the number of
distinct articles in mainspace.

To estimate our article creation rate I used special:newpages to count
the number of articles created two weeks ago (16 June).  This gave me
1866.  These 1866 articles are what remains after two weeks during
which the worst are speedy deleted.  So I think it's a pretty reliable
predictor for the growth of article count.  The encyclopedia is
growing by approximately 1900 articles per day.

We've deleted 1726 items from article space in the past 24 hours by
speedy deletion, proposed deletion or articles for deletion.

Notice that the number of articles deleted daily and the growth rate
(after deletions) are very close.  Put another way, we're deleting
nearly half of all new article starts.
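
As a rough check on that arithmetic, here is a minimal sketch in Python.
It uses only the figures quoted above.  The assumptions are mine: it
treats the 24-hour deletion count and the two-week-old survivor count as
if they described the same day's article starts, and the function and
variable names are just for illustration.

```python
# Figures quoted in this message.
surviving_new_articles = 1866  # created 16 June, still present two weeks later
deleted_last_24h = 1726        # speedy + proposed deletion + AfD, past 24 hours
afd_max_per_day = 200          # AfD running flat out
afd_delete_share = 0.85        # upper estimate, counting redirect/merge/userfy


def deleted_fraction(deleted, surviving):
    """Share of article starts deleted, assuming starts = deleted + surviving."""
    return deleted / (deleted + surviving)


def afd_max_removals(listed_per_day, delete_share):
    """Most articles AfD could remove from mainspace in one day."""
    return listed_per_day * delete_share


share = deleted_fraction(deleted_last_24h, surviving_new_articles)
afd_cap = afd_max_removals(afd_max_per_day, afd_delete_share)

print(f"deleted share of article starts: {share:.1%}")
print(f"AfD upper bound: {afd_cap:.0f} per day, "
      f"{afd_cap / surviving_new_articles:.1%} of daily net growth")
```

On these numbers the deleted share comes out a little under half, and
AfD at full stretch is small next to the daily growth figure.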

Now going back to Lih's pretty graphs, I see firstly that my figure of
1900 is congruent with his graph for "increase per day". He shows the
figure hovering between 1500 and 2000 per day during the past two
quarters.  This is a pretty reasonable growth rate and very much the
kind of thing I'd expect.  Lih is missing the kind of detail on
deletions that I've provided here.

Notice that this growth rate of 1900 per day is still rather higher
than it was (according to Lih's graph) when we turned off anonymous
article creations in late 2005.  We haven't turned off the faucet,
though we may have avoided a flood.

One thing I don't really understand about Lih's graph is that he chose
to scale it so that the increase per day graph is overlaid exactly on
the "Number of articles" graph. This would only make sense if he were
positing exponential growth (the first derivative of an exponential
is also an exponential). But exponential growth is not possible in an
environment with limited resources.  Matching the two graphs like that
misleads the viewer, who looks at the growth graph and thinks
something catastrophic must have happened because it isn't rising
exponentially.
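
To spell out the point about overlaying the two curves, here is a short
worked sketch.  The symbols N (article count), t (time), k and r (growth
constants) and K (a ceiling on article count) are my own notation, and
the logistic curve is only one assumed example of growth under limited
resources, not something taken from the graph.

```latex
% Exponential growth: the daily increase is a constant multiple of the
% article count, so the two curves coincide once the axes are rescaled.
N(t) = N_0 e^{kt}
\quad\Longrightarrow\quad
\frac{dN}{dt} = k N_0 e^{kt} = k\,N(t)

% Logistic growth (assumed example of limited resources): the daily
% increase peaks at N = K/2 and then falls while N keeps rising.
\frac{dN}{dt} = r N \left(1 - \frac{N}{K}\right)
```

In the second case the increase-per-day curve flattens and declines
without anything catastrophic having happened to the article count.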

The rest is just a curve-fitting exercise.  Garbage in, garbage out.
The truth is (as always) we're getting more article starts than we
know what to do with so we're junking half of them, those that we've
decided aren't promising.


