Thank you for mentioning the Google OCR gadget; I didn't know about it. I'll certainly test it, even if I'm far from happy about becoming dependent on Google services.
Alex
On Fri, Jul 12, 2019 at 09:26, David Starner prosfilaes@gmail.com wrote:
On Thu, Jul 11, 2019 at 11:22 PM Alex Brollo alex.brollo@gmail.com wrote:
I don't fully understand your statement "Right now, I'm going to convert them to DjVu and upload them, without any text information." Don't you feel any need for an excellent OCR layer when proofreading them on Wikisource?
I reuploaded the first issue of Weird Tales in DjVu because the PDF was significantly fuzzier than the DjVu, and looking at the PDF OCR, it's slightly better than what I can get from the interface. Given the choice between better images and better OCR, I go with the first one.
Are you fully satisfied with the MediaWiki OCR of images?
I can't even get the MediaWiki OCR to work. I use the Google OCR gadget.
I don't know how to get XML data mapping words to their positions in the page image.
It's a pretty distant concern for me, somewhat tangential to producing transcriptions of the works.
-- Kie ekzistas vivo, ekzistas espero. (Where there is life, there is hope.)
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l