Ray Saintonge (saintonge@telus.net) [050219 11:22]:
David Gerard wrote:
An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are there? Time to buy a 1000-stack of DVD-Rs ;-)
The 1911EB has 29 volumes, of which the last is an index. Each volume has about 1,000 pages. Add three volumes for the supplements, and we have a mere 32,000 pages. David, was your estimate based on colour scanning? Wouldn't monochrome scanning take less space? There are very few colour pages.
Greyscale (one byte per pixel, 210 x 297 / 25.4 / 25.4 * 300 * 300 pixels, i.e. about 8.7 million pixels). You could probably reduce it to four bits per pixel. I wouldn't suggest going to three. I did this last year scanning in a pile of stuff.
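As a rough check of these figures, here is a short Python sketch that recomputes the raw size of one A4 scan at 300 and 600 dpi for a few bit depths, then totals it over the ~32,000 pages estimated above. It assumes uncompressed images and a nominal 4.7 GB single-layer DVD-R; real files saved with compression would be considerably smaller, so treat the results as upper bounds.

```python
import math

# A4 paper is 210 x 297 mm; there are 25.4 mm per inch.
A4_MM = (210.0, 297.0)
MM_PER_INCH = 25.4
# Assumption: nominal single-layer DVD-R capacity in decimal bytes.
DVD_R_BYTES = 4.7e9
# 32 volumes x ~1,000 pages, per the estimate in this thread.
PAGES = 32_000


def page_bytes(dpi: int, bits_per_pixel: int, size_mm=A4_MM) -> float:
    """Raw (uncompressed) size in bytes of one scanned page."""
    width_in = size_mm[0] / MM_PER_INCH
    height_in = size_mm[1] / MM_PER_INCH
    pixels = width_in * dpi * height_in * dpi
    return pixels * bits_per_pixel / 8


def dvds_needed(total_bytes: float) -> int:
    """Whole number of DVD-Rs needed to hold total_bytes."""
    return math.ceil(total_bytes / DVD_R_BYTES)


if __name__ == "__main__":
    for dpi in (300, 600):
        for bpp in (1, 4, 8):
            per_page = page_bytes(dpi, bpp)
            total = per_page * PAGES
            print(f"{dpi} dpi, {bpp} bpp: {per_page / 1e6:6.2f} MB/page, "
                  f"{total / 1e9:7.1f} GB total, {dvds_needed(total)} DVD-Rs")
```

At 8 bits per pixel this reproduces the 8.7 MB and 34.8 MB per-page figures. Dropping to 1 bit per pixel (pure monochrome) cuts every size by a factor of eight, and 4 bits per pixel halves it.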
Putting the images online is an interesting proposal. Not long ago on the list there was mention of a Swedish project that includes both a scanned and an OCR version of each page. A scanned version is helpful for maintaining the integrity of a text; a character-recognized version is better for searching and annotation. The French Gallica collection, in PDF, can be a tremendous resource but is difficult to use. There are some interesting points to be explored, such as how much the system can handle.
Project Gutenberg and Distributed Proofreaders have web-based software that apparently does the job well: the scan on one side, the OCR text on the other, and you correct it. One page at a time, so it's highly parallelisable.
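To illustrate why per-page work parallelises so well, here is a minimal Python sketch of a page queue in which each unit of work pairs a scan with its OCR text and volunteers check pages out independently. This is a hypothetical illustration, not how Distributed Proofreaders' software actually works; the class names, fields, and methods are all invented for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Page:
    """One unit of proofreading work (hypothetical fields)."""
    number: int
    image_file: str  # path to the page scan, shown on one side
    ocr_text: str    # raw OCR output, shown on the other side
    corrected_text: Optional[str] = None  # filled in by a volunteer


class PageQueue:
    """Hands out pages one at a time. Pages do not depend on each
    other, so any number of volunteers can work simultaneously."""

    def __init__(self, pages: Iterable[Page]) -> None:
        self._todo = deque(pages)
        self._checked_out: Dict[int, Tuple[str, Page]] = {}
        self.done: Dict[int, Page] = {}

    def check_out(self, volunteer: str) -> Optional[Page]:
        """Give the next unproofread page to a volunteer, or None if none remain."""
        if not self._todo:
            return None
        page = self._todo.popleft()
        self._checked_out[page.number] = (volunteer, page)
        return page

    def submit(self, page_number: int, corrected: str) -> None:
        """Store a volunteer's corrected text for a checked-out page."""
        _volunteer, page = self._checked_out.pop(page_number)
        page.corrected_text = corrected
        self.done[page_number] = page

    def release(self, page_number: int) -> None:
        """Return an abandoned page to the front of the queue."""
        _volunteer, page = self._checked_out.pop(page_number)
        self._todo.appendleft(page)

    def remaining(self) -> int:
        """Pages not yet proofread (waiting or checked out)."""
        return len(self._todo) + len(self._checked_out)
```

For example, `PageQueue(Page(n, f"scan_{n:05d}.png", f"ocr {n}") for n in range(1, 4))` builds a three-page queue; two volunteers calling `check_out` get pages 1 and 2 at the same time, and a page that is released goes back to the front so it isn't lost.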
The idea of having a bunch of volunteers working away in public view at Wikimania to put something the size of EB12 online has great publicity appeal, especially if these volunteers are at it round the clock. Whether the EB should be the only work treated that way at Wikimania should remain an open question. Perhaps the scanned works should be in several languages. :-)
Could be very nice. And very geeky. And get EB to say nasty things about us again ;-)
- d.