[Wikipedia-l] Estimating article numbers

Larry Sanger lsanger at nupedia.com
Thu Jun 28 13:48:14 UTC 2001


(I have posted this on [[Larry Sanger/Estimating article numbers]].)

So Wikipedia has almost 10,000 pages.  This represents a heck of a lot of
work and a heck of a lot of content, and we can all be proud, but...I think
there's a bit of a problem.  I have a puzzle for you: how many of these
pages are *articles*?

''We'' all know "10,000 pages" does not mean "10,000 encyclopedia articles."
There are a lot of redirection pages, Talk pages, member pages and subpages,
commentary pages, Wikipedia project pages, and other non-articles.  But the
new reader doesn't know this, and if some news media source comes along (as
one inevitably will--it's only a matter of time now), I think we might be
blasted for misrepresenting the extent of our achievement.  Not only would
that be shameful, it would lose us the participation of potential new contributors
who ''care'' about how accurately we represent our achievement, though they
don't care if we say we have 10,000 or, instead, a mere 6,000 articles.  :-)

I wouldn't envy anyone the task of counting the ''actual'' number of
articles.  But we could estimate the number.

Anyone care to give it a shot, and report the results?

For purposes of this exercise, I don't think we need to draw a distinction
between one-sentence articles to the effect that so-and-so was a famous
novelist, and treatise-length pages.  Both of those can, for our purposes,
be called articles, or perhaps "entries."  What ''can't'' be called articles
are:
* redirection pages
* Talk pages
* member pages and subpages thereof
* pages describing the Wikipedia project (e.g., the FAQ, news,
announcements, etc.)
* pages that consist *only* of links to other articles, with virtually no
content of their own
* any other categories?
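
One way to get an estimate without counting everything is to pull a random sample of pages, check each one by hand against the list above, and scale the result up. Here is a minimal sketch of that idea in Python. The function names, the sample size of 200, and the count of 104 articles in the sample are all hypothetical, chosen only for illustration; the margin of error uses the ordinary normal approximation for a proportion, which is an assumption on my part and not anything the software provides.

```python
import math
import random


def draw_sample(page_titles, sample_size, seed=None):
    """Pick sample_size page titles at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(page_titles), sample_size)


def estimate_article_count(total_pages, sample_size, articles_in_sample, z=1.96):
    """Scale a hand-classified sample up to the whole site.

    Returns (estimate, low, high), where low/high bound a rough 95%
    interval based on the normal approximation for a proportion.
    """
    p = articles_in_sample / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)  # standard error of p
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return (
        round(total_pages * p),
        round(total_pages * low),
        round(total_pages * high),
    )


# Hypothetical numbers: 10,000 pages, 200 sampled, 104 judged to be articles.
estimate, low, high = estimate_article_count(10_000, 200, 104)
print(f"about {estimate} articles (roughly {low} to {high})")
```

A bigger sample narrows the range; the person classifying still has to decide by eye whether each sampled page falls into one of the non-article categories.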

What I'd like to do with your estimate is to make the assumption that the
''ratio'' of present pages to present articles will remain roughly the same
for the next 5,000 or so ''pages.''  (Or perhaps you can tell me how long
the ratio can probably be relied upon.)
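
The arithmetic behind that assumption is simple enough to sketch. In the snippet below, `article_fraction` is the estimated share of pages that are articles; the value 0.5 is purely a placeholder (nobody has measured it yet), and both function names are made up for this example.

```python
import math


def projected_articles(total_pages, article_fraction):
    """Articles expected at a given page count, if the fraction holds steady."""
    return math.floor(total_pages * article_fraction)


def safe_claim(article_estimate, step=1000):
    """Round an estimate down to a round number we can honestly say we exceed."""
    return (article_estimate // step) * step


# Placeholder fraction of 0.5, applied now and after 5,000 more pages.
for pages in (10_000, 15_000):
    articles = projected_articles(pages, 0.5)
    print(f"{pages} pages -> over {safe_claim(articles)} articles")
```

Rounding down keeps the front-page figure conservative, so the claim stays true even if the estimate is a little high.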

Then, we can (honestly) boast "over 5,000 articles" (notice, articles, not
pages) or "over 5,000 entries" on the front page.  This will make our work
seem more substantial, more real, and that's important if we're going to
make this a reasonably serious project.

I'll bet the present number is right around 5,000, but I really don't know!

Larry




