[Wikipedia-l] "The God-King drives a Hyundai"

Lars Aronsson lars at aronsson.se
Sat Feb 19 08:18:56 UTC 2005


Ray Saintonge wrote:
> > An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many
> > pages are there? Time to buy a 1000-stack of DVD-Rs ;-)
> >  
> The 1911EB has 29 volumes, of which the last is an index.  Each
> volume has about 1,000 pages.  Add three volumes for the
> supplements, and we have a mere 32,000 pages.  David, was your
> estimate based on colour scanning?  Wouldn't monochrome scanning
> take less space?  There are very few colour pages.

Sorry, this discussion is painfully clueless.  If you really want to,
you can learn the basics of document imaging on your own in a week by
using Google.  Get a scanner, an OCR program, download some free
software and start to play around.

Digitizing encyclopedias can be done.  No magic.  A typical volume of
800 pages might take 160 megabytes in images and 5 megabytes in plain
text.  You will want to cut the spine off the books and use a
sheet-fed scanner.  No sweat.  Old encyclopedias are cheap on eBay or
in your local second-hand shop, since no sane person would buy one
and the insane usually have less money.
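
As a rough check on the figures in this thread, here is a minimal
Python sketch of the arithmetic.  It assumes A4 is 210 x 297 mm,
decimal megabytes, and raw bitmaps with no compression or file
headers.  The function names are made up for illustration.

```python
# Rough storage arithmetic for scanned pages (a sketch, not a measurement).

A4_MM = (210, 297)   # A4 paper size in millimetres
MM_PER_INCH = 25.4


def uncompressed_bytes(dpi, bits_per_pixel, size_mm=A4_MM):
    """Raw bitmap size of one page: no compression, no file headers."""
    width_px = round(size_mm[0] / MM_PER_INCH * dpi)
    height_px = round(size_mm[1] / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


def per_page_kb(total_mb, pages):
    """Average kilobytes per page, using decimal units (1 MB = 1000 KB)."""
    return total_mb * 1000 / pages


# Raw A4 page sizes for 1-bit monochrome, 8-bit grayscale, 24-bit colour.
for dpi in (300, 600):
    for bits in (1, 8, 24):
        size_mb = uncompressed_bytes(dpi, bits) / 1e6
        print(f"A4 at {dpi} dpi, {bits:2d} bpp: {size_mb:6.1f} MB")

# Per-page averages implied by "800 pages, 160 MB images, 5 MB text".
print(f"images: {per_page_kb(160, 800):.2f} KB/page")
print(f"text:   {per_page_kb(5, 800):.2f} KB/page")
```

Under those assumptions, the 8.7 MB and 34.8 MB figures quoted above
match 8 bits per pixel (grayscale), not colour.  A 1-bit monochrome
A4 page at 300 dpi is about 1.1 MB raw.  The 160 MB for 800 pages
works out to 200 kilobytes per page image, well below even that raw
1-bit size, and the text to 6.25 kilobytes per page.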

However, I doubt that this should be a part of Wikipedia / Wikimedia.  
Distributed Proofreaders and the Internet Archive are already doing
important parts of what you ask for.  I think Wikipedia should consume
the results, not produce them.

For example, while digitizing yet another year's run (1893) of a
Norwegian engineering journal the other day, I found a nice 
illustration http://runeberg.org/tekuke/1893/0161.html
that I cut out and uploaded to the Wikimedia Commons, and used in
http://en.wikipedia.org/wiki/Architect

Instead of reinventing the wheel, perhaps you should get involved in
Distributed Proofreaders and help improve their system.

> The online images are an interesting proposal.  Not long ago on the
> list there was mention of a Swedish project that includes both a
> scanned and OCR version of a page.

Hi there!


-- 
  Lars Aronsson (lars at aronsson.se)
  Project Runeberg - free Nordic literature - http://runeberg.org/


