[Wikipedia-l] Re: The 100,000 articles goal and minimum article size

Daniel Mayer maveric149 at yahoo.com
Sat Sep 14 06:23:18 UTC 2002


On Friday 13 September 2002 07:59 pm, Axel wrote:
> Currently, the introductory sentences on the main page read
>
>  Welcome to Wikipedia, a collaborative project to produce a complete
>  encyclopedia from scratch. We started in January 2001 and already
>  have 43165 articles. We want to make over 100,000 complete articles,
>  so let's get to work!
>
> I find the 100,000 article sentence out of place:
>
> * The goal is to create a complete, high-quality and free
>   encyclopedia. Nobody knows whether that requires 80,000 or 800,000
>   articles. Nor should anybody care.

Yup. Each time I create a new article of any decent size there are anywhere 
from several to a dozen empty links. Therefore all I see for Wikipedia is 
geometric increases in total article count that will never stop at any point 
(in the same way as solving a question of science leads to several new 
questions). 

Somebody has already stated that there are 30 to 40 thousand human genes and 
that each of them deserves its own article. I agree. I would also add that 
each of the millions of described and cataloged species on Earth also deserves 
its own article. Same for every city, major highway and important geologic 
feature. Not to mention the many hundreds of thousands of national rulers, 
kingdoms, generals, notable people, major works of art/science/industry, etc. 
that have ever existed.

I foresee Wikipedia becoming an encyclopedia of encyclopedias.

> * The "100,000 article goal" fosters an unhealthy obsession with
>   statistics, and I'm afraid it can lead people to create stubs, just
>   to help bring the project "closer to this goal", which is of course
>   not its goal at all.

In order to squash this tendency all we need is a much more conservative 
definition of just what constitutes an article (and some vigilance to make 
sure people are not inserting crap into pages just to meet this definition). 
The current definition is far too liberal in this regard -- all you need is 
one blasted comma and the page, no matter how short and useless, is counted 
as an article. 

I vote that, in addition to the already established criteria for 
automatically detecting what constitutes an article, we add at least this 
additional criterion: no page of less than x bytes is counted as being an 
article. That way stubs won't screw up our stats.
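
A minimal sketch of how such a rule might look, in Python. It assumes the 
existing automatic check can be reduced to "the page text contains a comma" 
(the other established criteria are not spelled out here and are left out), 
and the function name and the min_bytes parameter are hypothetical, chosen 
only for illustration.

```python
def is_countable_article(wikitext: str, min_bytes: int = 500) -> bool:
    """Return True if a page would count as an article under the proposed rule.

    Assumptions (not the actual Wikipedia software logic):
    - the existing criterion is simplified to "contains at least one comma";
    - size is measured as the UTF-8 byte length of the raw page text.
    """
    has_comma = "," in wikitext
    size_in_bytes = len(wikitext.encode("utf-8"))
    return has_comma and size_in_bytes >= min_bytes


# The one-sentence stub passes the comma test but not the size test.
stub = "In [[Greek mythology]], '''Cedalion''' was [[Hephaestus]]' servant."
print(is_countable_article(stub))               # size check applied
print(is_countable_article(stub, min_bytes=0))  # comma check only
```

With min_bytes set to 0 the function falls back to the comma-only behaviour, 
so the effect of any candidate cutoff can be compared against the current 
count.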

I have suggested that this number can be as low as 500 bytes, others have 
suggested the minimum size should be set at 1000 bytes. I now think that the 
figure should be somewhere in between. But we need to establish this and 
announce it /before/ we hit 50,000 "articles" under our current definition. 
Lord knows we will probably be Slashdotted again after reaching our "50%" 
point.

Be prepared to see a dramatic reduction in the total article count though. If 
we go with the 500 byte cutoff it could be reset to 32,000 articles. If we go 
nuts and set the cutoff at 1000 bytes the total count would drop to 22,000 
(if I'm reading Wikipedia:Statistics right).
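
To put those drops in proportion, here is a small Python sketch. The current 
count of 43165 comes from the main page text quoted above; the 32,000 and 
22,000 figures are the rough estimates given here, not exact counts, and the 
helper name is hypothetical.

```python
def count_drop(current: int, remaining: int) -> tuple[int, float]:
    """Return (articles lost, percentage lost) for a given cutoff estimate."""
    lost = current - remaining
    return lost, 100.0 * lost / current


CURRENT_COUNT = 43165  # figure quoted from the main page
ESTIMATES = {500: 32000, 1000: 22000}  # cutoff in bytes -> rough estimate

for cutoff, remaining in ESTIMATES.items():
    lost, pct = count_drop(CURRENT_COUNT, remaining)
    print(f"{cutoff}-byte cutoff: about {lost} fewer articles ({pct:.0f}%)")
```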

Example of an article of 500 bytes:
http://www.wikipedia.org/wiki/Amora

Example of an article of 776 bytes:
http://www.wikipedia.org/wiki/KWord

Example of an article of 1000 bytes:
http://www.wikipedia.org/wiki/Aqueduct

Previous talk on this issue is at:
http://www.wikipedia.org/wiki/Wikipedia_talk:Statistics

> I vote for simply removing this "100,000 article" sentence.
>
> Axel

I vote for this too -- it is a rather arbitrary number and, if anything, should 
be just a milestone (so that we can say we have more articles than 
Britannica). However, that figure will have little meaning if 
"In [[Greek mythology]], '''Cedalion''' was [[Hephaestus]]' servant." is 
considered to be an article just because there is a comma. At best, that is a 
sad stub definition.

-- Daniel Mayer (aka mav)


