[Wikipedia-l] "The God-King drives a Hyundai"
David Gerard
fun at thingy.apana.org.au
Sat Feb 19 01:23:08 UTC 2005
Ray Saintonge (saintonge at telus.net) [050219 11:22]:
> David Gerard wrote:
> >An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are
> >there? Time to buy a 1000-stack of DVD-Rs ;-)
> The 1911EB has 29 volumes, of which the last is an index. Each volume
> has about 1,000 pages. Add three volumes for the supplements, and we
> have a mere 32,000 pages. David, was your estimate based on colour
> scanning? Wouldn't monochrome scanning take less space? There are very
> few colour pages.
Greyscale (one byte per pixel, 210 x 297 mm / 25.4 / 25.4 * 300 * 300 pixels).
You could probably reduce it to four bits per pixel. I wouldn't suggest
going to three. I did this last year scanning in a pile of stuff.
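As a rough check of those figures, here is a minimal Python sketch of the
arithmetic. It assumes uncompressed images, A4 at 210 x 297 mm, and 25.4 mm
per inch. The function name page_bytes, the 32,000-page count taken from
Ray's estimate, and the 4.7 GB single-layer DVD-R capacity are all
illustrative assumptions, not anything measured.

```python
import math

MM_PER_INCH = 25.4


def page_bytes(width_mm=210, height_mm=297, dpi=300, bits_per_pixel=8):
    """Uncompressed size in bytes of one scanned page (hypothetical helper)."""
    # Convert the physical page size to whole pixels at the given resolution.
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    # Total bits divided by 8 gives bytes.
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    pages = 32_000          # assumption: Ray's estimate for 32 volumes
    dvd_bytes = 4.7e9       # assumption: single-layer DVD-R, decimal gigabytes

    for dpi, bpp in [(300, 8), (600, 8), (300, 4)]:
        size = page_bytes(dpi=dpi, bits_per_pixel=bpp)
        total = size * pages
        discs = math.ceil(total / dvd_bytes)
        print(f"{dpi} dpi, {bpp} bpp: {size / 1e6:.1f} MB/page, "
              f"{total / 1e9:.1f} GB total, {discs} DVD-Rs")
```

On those assumptions, 300dpi greyscale comes to roughly 278 GB for the whole
set, or about 60 discs before any compression, so the 1000-stack is overkill.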
> The online images are an interesting proposal. Not long ago on the list
> there was mention of a Swedish project that includes both a scanned and
> OCR version of a page. A scanned version is helpful for maintaining the
> integrity of a text; a character-recognized one is better for applying
> search functions and annotations. The French Gallica collection in PDF
> can be a tremendous resource, but is difficult to use. There are some
> interesting points to be explored, such as how much the system can handle.
Project Gutenberg and Distributed Proofreaders have web-based software that
apparently does the job well. Scan on one side, OCR on the other, correct.
Do one page at a time, highly parallelisable.
> The idea of having a bunch of volunteers working away in public view at
> Wikimania to put something of the size of EB12 online has great
> publicity appeal, especially if these volunteers are at it round the
> clock. Whether the EB should be the only work treated that way at
> Wikimania should remain an open question. Perhaps the scanned works
> should be in several languages. :-)
Could be very nice. And very geeky. And get EB to say nasty things about us
again ;-)
- d.