Ray Saintonge (saintonge(a)telus.net) [050219 11:22]:
David Gerard wrote:
>An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are
>there? Time to buy a 1000-stack of DVD-Rs ;-)
The 1911EB has 29 volumes, of which the last is an index. Each volume
has about 1,000 pages. Add three volumes for the supplements, and we
have a mere 32,000 pages. David, was your estimate based on colour
scanning? Wouldn't monochrome scanning take less space? There are very
few colour pages.
Greyscale (one byte per pixel, 210 x 297 mm / 25.4 / 25.4 * 300 * 300 pixels).
You could probably reduce it to four bits per pixel. I wouldn't suggest
going to three. I did this last year scanning in a pile of stuff.
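
For anyone who wants to rerun the arithmetic, here is a rough sketch in Python. It assumes uncompressed storage, an A4 page of 210 x 297 mm, and a single-layer DVD-R of 4.7 GB (decimal); the function names page_bytes and discs_needed are made up for illustration, and the 32,000-page figure is the estimate quoted above.

```python
import math

MM_PER_INCH = 25.4
A4_MM = (210, 297)          # A4 width and height in millimetres
DVD_R_BYTES = 4.7e9         # assumed single-layer DVD-R capacity


def page_bytes(dpi, bits_per_pixel=8, size_mm=A4_MM):
    """Uncompressed size in bytes of one scanned page."""
    width_px = size_mm[0] / MM_PER_INCH * dpi
    height_px = size_mm[1] / MM_PER_INCH * dpi
    return width_px * height_px * bits_per_pixel / 8


def discs_needed(pages, dpi, bits_per_pixel=8, disc_bytes=DVD_R_BYTES):
    """Number of discs needed to hold `pages` uncompressed scans."""
    total = pages * page_bytes(dpi, bits_per_pixel)
    return math.ceil(total / disc_bytes)


if __name__ == "__main__":
    pages = 32_000  # rough page count from the estimate above
    for dpi in (300, 600):
        for bpp in (8, 4):
            mb = page_bytes(dpi, bpp) / 1e6
            discs = discs_needed(pages, dpi, bpp)
            print(f"{dpi} dpi, {bpp} bits/pixel: {mb:.1f} MB/page, {discs} DVD-Rs")
```

Any real image format will compress this considerably, so treat the numbers as an upper bound.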
The online images are an interesting proposal. Not long ago on the list
there was mention of a Swedish project that includes both a scanned and
an OCR version of a page. A scanned version is helpful for maintaining the
integrity of a text; a character-recognized version is better for applying
search functions and annotations. The French Gallica collection in PDF
can be a tremendous resource, but is difficult to use. There are some
interesting points to be explored, such as how much the system can handle.
Project Gutenberg and Distributed Proofreaders have web-based software that
apparently does the job well. Scan on one side, OCR on the other, correct.
Do one page at a time, highly parallelisable.
The idea of having a bunch of volunteers working away in public view at
Wikimania to put something of the size of EB12 on line has great
publicity appeal, especially if these volunteers are at it round the
clock. Whether the EB should be the only work treated that way at
Wikimania should remain an open question. Perhaps the scanned works
should be in several languages. :-)
Could be very nice. And very geeky. And get EB to say nasty things about us
again ;-)
- d.