On 11/08/2012 01:53 PM, Ole Palnatoke Andersen wrote:
The book has been digitized now. I can see it at
http://www.kb.dk/e-mat/dod/130019427200.pdf
Looking at this file, pdfinfo says it was created
by Finereader Recognition Server and that the
page size is 595.44 × 841.68 pts. If "pts" is 1/72
of an inch, this would mean 8.27 × 11.69 inches,
that is, A4 size (close to letter size), which is
clearly unrealistic for this book.
I would guess that the pages of the physical
book are roughly half of that, or 4-5 × 6-7 inches.
The images are 1170 × 1873 pixels, and I would
estimate the scanning resolution to be in the
range 250 to 300 dpi. That's good enough.
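
As a rough check of the arithmetic above, here is a minimal
Python sketch. The function names are made up for illustration,
and the 4-5 × 6-7 inch page size is only the guess stated above,
not a measurement of the physical book.

```python
POINTS_PER_INCH = 72.0  # assumes "pts" means PostScript points (1/72 inch)


def points_to_inches(points):
    """Convert a length in points to inches."""
    return points / POINTS_PER_INCH


def dpi_range(pixels, min_inches, max_inches):
    """Return (lowest, highest) dpi for a pixel count and a guessed size range."""
    return pixels / max_inches, pixels / min_inches


# Page size reported by pdfinfo, converted to inches.
page_w_in = points_to_inches(595.44)
page_h_in = points_to_inches(841.68)
print(f"{page_w_in:.2f} x {page_h_in:.2f} inches")

# Image size in pixels against the guessed physical page size.
width_dpi = dpi_range(1170, 4, 5)
height_dpi = dpi_range(1873, 6, 7)
print("dpi from width:  %.0f to %.0f" % width_dpi)
print("dpi from height: %.0f to %.0f" % height_dpi)
```

The two ranges overlap at roughly 270 to 290 dpi, which is
consistent with the estimate of 250 to 300 dpi.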
The included OCR text looks like this for a text
page (page 20 of the PDF, paginated -12-):
ventede den, men meente jeg forstoed ncr-
sten alt Norsk, fisen jeg t mine yngre Aar
ogsaa havde varet her nogen tort Tltd,
og da langt lcrttere kom til rette t daglig
Samtale. Imidlertid blev jkg snart vaec
at Adflilligheven beroede paa den Kster
og Vesterlandske äisleÄcs heel store og
It did well to read "forstoed" correctly, with
the long "s" and the old spelling "oe".
But on the second line "siden" was read
as "fisen", which is incorrect. Not a
single "æ" is correct, which is odd for software
trying to recognize Danish, while "ä"
erroneously appears on the last line
of this excerpt. This indicates that it
really tries to recognize the German
alphabet, albeit with a Danish dictionary.
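
The character observation can be checked mechanically. This is
a small sketch in Python that only counts characters in the
excerpt quoted above; the variable names are arbitrary, and it
says nothing about how the OCR software works internally.

```python
import unicodedata
from collections import Counter

# The OCR excerpt exactly as quoted above (page 20 of the PDF).
excerpt = """ventede den, men meente jeg forstoed ncr-
sten alt Norsk, fisen jeg t mine yngre Aar
ogsaa havde varet her nogen tort Tltd,
og da langt lcrttere kom til rette t daglig
Samtale. Imidlertid blev jkg snart vaec
at Adflilligheven beroede paa den Kster
og Vesterlandske äisleÄcs heel store og"""

# Normalize so that "ä" is one code point, however it was typed.
counts = Counter(unicodedata.normalize("NFC", excerpt))

# Keep only the characters outside plain ASCII.
non_ascii = {ch: n for ch, n in counts.items() if ord(ch) > 127}

print("Danish letters:", {ch: counts[ch] for ch in "æøåÆØÅ"})
print("Other non-ASCII:", non_ascii)
```

On this excerpt the Danish letters all count zero, and the only
non-ASCII characters found are "ä" and "Ä".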
This is "the usual quality" for OCR of
blackletter (fraktur), and not especially
good.
I uploaded the work to
http://runeberg.org/glossnor/
with the OCR text provided.
It's now ready for your proofreading.
--
Lars Aronsson (lars(a)aronsson.se)
Project Runeberg - free Nordic literature -
http://runeberg.org/