Just to remark that IA OCR is excellent, but it is heavily limited by poor scan quality, since Google shares bad scans online (I presume Google saves much better scans for internal use :-) ). This is why, IMHO, the most efficient way to get good OCR for free is simply to upload to IA an excellent PDF built from TIFF-saved scans, then wait briefly for the output.

What should be discouraged is uploading low-quality PDFs from Google directly, converting them into low-quality DjVu files, and running FineReader 10 or 11 on them: there is presently no way to get an abbyy.xml file from FineReader 10 or 11. Even when working with low-quality PDFs from Google, the best option at present is to upload them to IA. Character recognition can perhaps be obtained from FineReader 10 or 11, but the best that FineReader 11 produces is a structured, mapped DjVu text layer via DjVu export, while all the remaining formatting information (font size, bold, word uncertainty) is lost.

Alex

2013/6/17 Andrea Zanni <zanni.andrea84@gmail.com>

On Mon, Jun 17, 2013 at 10:12 AM, Lars Aronsson <lars@aronsson.se> wrote:

Both the Internet Archive
and Wikisource volunteers use a cheap, commercial
version of ABBYY Finereader and we have no
dialogue with that company. And why should they
listen to us? We have no more money to provide,
but Google does pay its OCR software developers.

I actually had a contact with an ABBYY FineReader sales manager,
but after a short conversation on this list I didn't follow up,
as the community was not enthusiastic about that, and I was worried about the
amount of money they could ask of us.

Aubrey

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l