[Wikipedia-l] "The God-King drives a Hyundai"

David Gerard fun at thingy.apana.org.au
Sat Feb 19 01:23:08 UTC 2005


Ray Saintonge (saintonge at telus.net) [050219 11:22]:
> David Gerard wrote:
 
> >An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are
> >there? Time to buy a 1000-stack of DVD-Rs ;-)

> The 1911EB has 29 volumes, of which the last is an index.  Each volume 
> has about 1,000 pages.  Add three volumes for the supplements, and we 
> have a mere 32,000 pages.  David, was your estimate based on colour 
> scanning?  Wouldn't monochrome scanning take less space?  There are very 
> few colour pages.


Greyscale (one byte per pixel, 210 x 297 / 25.4 / 25.4 * 300 * 300 pixels).
You could probably reduce it to four bits per pixel. I wouldn't suggest
going to three. I did this last year scanning in a pile of stuff.
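
For anyone who wants to check the numbers, the arithmetic above can be
sketched roughly as follows. This is only an illustration: the function
names are made up, the 32,000-page count is Ray's figure from the quoted
text, the 4.7 GB DVD-R capacity is an assumed nominal value, and all sizes
are uncompressed (any real image format would compress them).

```python
import math

MM_PER_INCH = 25.4
A4_MM = (210, 297)  # A4 width and height in millimetres


def page_bytes(dpi, bits_per_pixel=8, size_mm=A4_MM):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(size_mm[0] / MM_PER_INCH * dpi)
    height_px = round(size_mm[1] / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


def discs_needed(pages, dpi, bits_per_pixel=8, disc_bytes=4.7e9):
    """Whole discs needed for `pages` scans (disc_bytes is an assumption)."""
    total = pages * page_bytes(dpi, bits_per_pixel)
    return math.ceil(total / disc_bytes)


if __name__ == "__main__":
    for dpi in (300, 600):
        for bpp in (8, 4):
            mb = page_bytes(dpi, bpp) / 1e6
            discs = discs_needed(32_000, dpi, bpp)
            print(f"{dpi} dpi, {bpp} bpp: {mb:.1f} MB/page, {discs} DVD-Rs")
```

At 300dpi and one byte per pixel this gives the 8.7 MB per page quoted
above; dropping to four bits per pixel halves it.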


> The online images are an interesting proposal.  Not long ago on the list 
> there was mention of a Swedish project that includes both a scanned and 
> an OCR version of a page.  A scanned version is helpful for maintaining 
> the integrity of a text; a character-recognized version is better for 
> applying search functions and annotations.  The French Gallica collection 
> in PDF can be a tremendous resource, but is difficult to use.  There are 
> some interesting points to be explored, such as how much the system can handle.
 

Project Gutenberg and Distributed Proofreaders have web-based software that
apparently does the job well. Scan on one side, OCR on the other, correct.
Do one page at a time, highly parallelisable.


> The idea of having a bunch of volunteers working away in public view at 
> Wikimania to put something of the size of EB12 on line has great 
> publicity appeal, especially if these volunteers are at it round the 
> clock.  Whether the EB should be the only work treated that way at 
> Wikimania should remain an open question.  Perhaps the scanned works 
> should be in several languages. :-)


Could be very nice. And very geeky. And get EB to say nasty things about us
again ;-)


- d.