Just a remark: IA's OCR is excellent, but it is heavily limited by poor scan
quality, since Google shares only bad scans online (I presume Google keeps
much better scans for internal use :-) ). This is why, IMHO, the most
efficient way to get good OCR for free is simply to upload to IA an
excellent PDF built from TIFF-saved scans, then wait briefly for the output.
What should be discouraged is uploading low-quality PDFs from Google
directly, converting them into low-quality DjVu files, and running
FineReader 10 or 11 on them: there is presently no way to get an abbyy.xml
file out of FineReader 10 or 11. Even when working with low-quality PDFs
from Google, the best option at present is to upload them to IA. Character
recognition may well be obtainable from FineReader 10 or 11, but the best
that FineReader 11 yields is a structured, mapped DjVu text layer via DjVu
export, while all the remaining formatting (font size, bold, uncertainty of
words) is lost.
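
To illustrate the kind of information an abbyy.xml file carries and a plain
DjVu text layer does not, here is a minimal Python sketch. The XML fragment
is hand-written for illustration, not taken from a real IA item, and the
names used (formatting, charParams, fs, bold, suspicious) are my assumption
about the FineReader XML layout; formatting_runs is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment in the style of ABBYY FineReader XML.
# Element and attribute names are assumptions, not copied from a real file.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
  <page width="1000" height="1500">
    <block blockType="Text">
      <text><par><line l="100" t="200" r="400" b="240">
        <formatting lang="EnglishUnitedStates" fs="12." bold="true">
          <charParams l="100" t="200" r="120" b="240">O</charParams>
          <charParams l="121" t="200" r="140" b="240" suspicious="true">C</charParams>
          <charParams l="141" t="200" r="160" b="240">R</charParams>
        </formatting>
      </line></par></text>
    </block>
  </page>
</document>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def is_true(value):
    """Accept 'true' or '1' as a set flag (the exact spelling is an assumption)."""
    return value in ("true", "1")


def formatting_runs(xml_text):
    """Return one dict per <formatting> run: text, font size, bold, suspicious count."""
    root = ET.fromstring(xml_text)
    runs = []
    for el in root.iter():
        if local(el.tag) != "formatting":
            continue
        chars = [c for c in el if local(c.tag) == "charParams"]
        runs.append({
            "text": "".join(c.text or "" for c in chars),
            "font_size": el.get("fs"),
            "bold": is_true(el.get("bold")),
            "suspicious": sum(is_true(c.get("suspicious")) for c in chars),
        })
    return runs


if __name__ == "__main__":
    for run in formatting_runs(SAMPLE):
        print(run)
```

Each run keeps its font size, its bold flag and a count of characters the
OCR engine marked as uncertain, which is exactly the formatting listed above
as lost in a DjVu export.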
2013/6/17 Andrea Zanni <zanni.andrea84(a)gmail.com>
On Mon, Jun 17, 2013 at 10:12 AM, Lars Aronsson <lars(a)aronsson.se> wrote:
Both the Internet Archive and Wikisource volunteers use a cheap,
commercial version of ABBYY FineReader and we have no dialogue with
that company. And why should they listen to us? We have no more money
to provide, but Google does pay its OCR software developers.
I actually had contact with an ABBYY FineReader sales manager,
but after a short conversation on this list I didn't follow up,
as the community was not enthusiastic about it, and I was worried about
the amount of money they might ask of us.
Wikisource-l mailing list