[WikiEN-l] Cool project for perl programmer

Jimmy Wales jwales at bomis.com
Sat Feb 28 17:08:13 UTC 2004


To compare Wikipedia to Columbia Encyclopedia...

http://www.encyclopedia.com/

has the full text of Columbia.

There are pages for alphabetic browsing.

http://www.encyclopedia.com/browse/browse-Aa.asp

From these pages, it should be possible to get a list of all their
article titles.

These could be matched up against Wikipedia article titles.
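
A rough sketch of the matching step, in Python for illustration. It assumes the two title lists have already been collected somehow; the function names and the normalization rules (lowercase, underscores to spaces, collapsed whitespace) are my own assumptions, not anything either site defines.

```python
import re

def normalize_title(title):
    """Lowercase, turn underscores into spaces, and collapse whitespace."""
    return re.sub(r"\s+", " ", title.replace("_", " ")).strip().lower()

def match_titles(columbia_titles, wikipedia_titles):
    """Pair each Columbia title with a Wikipedia title that normalizes
    to the same string. Returns (matched dict, unmatched list)."""
    # Index Wikipedia titles by normalized form; the first one seen wins.
    index = {}
    for w in wikipedia_titles:
        index.setdefault(normalize_title(w), w)

    matched = {}
    unmatched = []
    for c in columbia_titles:
        key = normalize_title(c)
        if key in index:
            matched[c] = index[key]
        else:
            unmatched.append(c)
    return matched, unmatched
```

Exact matching after normalization will probably miss a fair number of real pairs (inverted names, disambiguation suffixes, spelling variants), so the unmatched list is worth a look by hand.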

Then we could ask the hypothetical: suppose Wikipedia just snagged the
same 55,000 topics as Columbia?  How big would the resulting text be?

If the answer is in the ballpark of 6,500,000 words -- i.e. the same
size as Columbia -- then we have an obvious strategy.  If, as I would
imagine, the answer is that we're bigger, then we can start digging
into how many of our longer articles would have to be edited down in
order to hit the same "ballpark".
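
One way the "how many would have to be edited down" question might be estimated, again as a Python sketch. The 6,500,000 figure is the one above; everything else is an assumption for illustration: words are counted by splitting on whitespace, and the trimming policy is a single per-article word cap, which is just one possible policy.

```python
TARGET_WORDS = 6_500_000  # ballpark size of Columbia, from the text above

def count_words(text):
    """Crude word count: split on whitespace."""
    return len(text.split())

def cap_for_target(word_counts, target=TARGET_WORDS):
    """Largest per-article cap such that the capped total fits the target.
    Returns None if the total is already within the target."""
    if sum(word_counts) <= target:
        return None
    lo, hi = 0, max(word_counts)
    # Binary search: the capped total never decreases as the cap grows.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if sum(min(n, mid) for n in word_counts) <= target:
            lo = mid
        else:
            hi = mid - 1
    return lo

def articles_to_trim(word_counts, target=TARGET_WORDS):
    """Number of articles longer than the cap, i.e. needing edits."""
    cap = cap_for_target(word_counts, target)
    if cap is None:
        return 0
    return sum(1 for n in word_counts if n > cap)
```

Feeding it the word counts of the matched Wikipedia articles would give a first guess at how many of the longer ones are over the line.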

Note that we don't have an answer from a publisher as to how big we
can be.  The guy I talked to expressed a desire to be "as big as
possible" but I warned him that that's a limitation that's going to
come from their end, not ours, because we're already bigger than
Britannica, so our issue is how to get *small enough*, not how to
produce *enough*.

--Jimbo


