Ray Saintonge wrote:
An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are there? Time to buy a 1000-stack of DVD-Rs ;-)
The 1911EB has 29 volumes, of which the last is an index. Each volume has about 1,000 pages. Add three volumes for the supplements, and we have a mere 32,000 pages. David, was your estimate based on colour scanning? Wouldn't monochrome scanning take less space? There are very few colour pages.
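To put numbers on the colour-versus-monochrome question, here is a rough sketch of the arithmetic for uncompressed page images. The function name and the list of bit depths are my own, and I am assuming 1 MB = 1,000,000 bytes and an A4 page of 210 x 297 mm. Under those assumptions the 8-bit grayscale row comes out at about 8.7 MB for 300dpi and 34.8 MB for 600dpi, so the quoted figures look like uncompressed grayscale rather than colour. That is an inference from the arithmetic, not something stated in the thread.

```python
# Uncompressed size of one scanned A4 page at various resolutions and depths.
# Assumptions: A4 = 210 x 297 mm, 1 MB = 1,000,000 bytes, no compression,
# and no row padding (real 1-bit formats pad each row to a whole byte).
A4_MM = (210, 297)
MM_PER_INCH = 25.4


def raw_page_bytes(dpi, bits_per_pixel, size_mm=A4_MM):
    """Approximate raw bitmap size in bytes for one page."""
    width_px = round(size_mm[0] / MM_PER_INCH * dpi)
    height_px = round(size_mm[1] / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


DEPTHS = [("1-bit monochrome", 1), ("8-bit grayscale", 8), ("24-bit colour", 24)]

for dpi in (300, 600):
    for label, bits in DEPTHS:
        megabytes = raw_page_bytes(dpi, bits) / 1e6
        print(f"{dpi} dpi, {label}: {megabytes:.1f} MB per page")
```

Monochrome is one eighth the size of grayscale before any compression is applied, and bitonal compression formats typically shrink it much further.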
Sorry, this discussion is painfully clueless. If you really want to, you can learn the basics of document imaging on your own in a week by using Google. Get a scanner, an OCR program, download some free software and start to play around.
Digitizing encyclopedias can be done. No magic. A typical volume of 800 pages might take 160 megabytes in images and 5 megabytes in plain text. You will want to cut the spine off the books and use a sheet-fed scanner. No sweat. Old encyclopedias are cheap on eBay or in your local second-hand shop, since no sane person would buy one and the insane usually have less money.
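Scaling those per-volume figures up to the 32,000 pages mentioned earlier is simple arithmetic. This is only a sketch: the constant and function names are mine, and it assumes the 1911EB pages compress about as well as my "typical volume". The 4,700 MB figure is the nominal capacity of a single-layer DVD-R.

```python
import math

# Per-volume figures from the paragraph above (a "typical" 800-page volume).
PAGES_PER_VOLUME = 800
IMAGE_MB_PER_VOLUME = 160
TEXT_MB_PER_VOLUME = 5

# Nominal single-layer DVD-R capacity in MB.
DVD_R_MB = 4700


def estimate_mb(total_pages):
    """Return (image_mb, text_mb), scaled linearly from the per-volume figures."""
    image_mb = total_pages * IMAGE_MB_PER_VOLUME / PAGES_PER_VOLUME
    text_mb = total_pages * TEXT_MB_PER_VOLUME / PAGES_PER_VOLUME
    return image_mb, text_mb


image_mb, text_mb = estimate_mb(32_000)
discs = math.ceil((image_mb + text_mb) / DVD_R_MB)
print(f"images: {image_mb:.0f} MB, text: {text_mb:.0f} MB, DVD-Rs needed: {discs}")
```

So the whole set is a handful of gigabytes and a couple of discs, not a stack of a thousand.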
However, I doubt that this should be a part of Wikipedia / Wikimedia. Distributed Proofreaders and the Internet Archive are already doing important parts of what you ask for. I think Wikipedia should consume the results, not produce them.
For example, while digitizing yet another year's run (1893) of a Norwegian engineering journal the other day, I found a nice illustration at http://runeberg.org/tekuke/1893/0161.html that I cut out and uploaded to the Wikimedia Commons, and used in http://en.wikipedia.org/wiki/Architect
Instead of reinventing the wheel, perhaps you should get involved in Distributed Proofreaders and help improve their system.
The online images are an interesting proposal. Not long ago on the list there was mention of a Swedish project that includes both a scanned image and an OCR version of each page.
Hi there!