Re: [Wikisource-l] ABBYY xml files: any of you is working about?

17 Jun 2013

Just to remark that IA OCR is excellent, but it is heavily limited by poor
scan quality, since Google shares bad scans online (I presume Google keeps
much better scans for internal use :-) ). This is why, IMHO, the most
efficient procedure to get good OCR for free is simply to upload to IA an
excellent PDF made from TIFF-saved scans, then wait briefly for the output.

What is to be discouraged is uploading low-quality PDFs from Google directly,
transforming them into low-quality djvu, and using FineReader 10 or 11 on
them: there is presently no way to get an abbyy.xml file from FineReader
10 or 11. Even when working with low-quality PDFs from Google, the best
option at present is to upload them to IA. Character recognition can perhaps
be obtained from FineReader 10 or 11, but the best that FineReader 11 can
produce is a structured, mapped djvu text layer through djvu export, while
all the remaining formatting (font size, bold, uncertainty of words) is lost.
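
To make concrete what is lost, here is a minimal sketch (Python) of reading
those extras out of an ABBYY-style XML file. The fragment inside it is
hand-written for illustration, not taken from a real IA file, and the
attribute names (fs, bold, suspicious) are what I recall from the ABBYY
schema, so treat them as assumptions and check them against an actual
abbyy.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment in the style of an ABBYY FineReader XML
# file. Real abbyy.xml files are far larger and their attributes may differ.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
<page width="1000" height="1500">
<block blockType="Text">
<text><par><line baseline="120" l="100" t="100" r="300" b="125">
<formatting lang="EnglishUnitedStates" ff="Times New Roman" fs="10." bold="true">
<charParams l="100" t="100" r="110" b="125">O</charParams>
<charParams l="112" t="100" r="122" b="125" suspicious="true">C</charParams>
<charParams l="124" t="100" r="134" b="125">R</charParams>
</formatting>
</line></par></text>
</block>
</page>
</document>"""


def local(tag):
    """Strip the XML namespace so the code does not depend on the schema version."""
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text):
    """Return (text, font size, bold, suspicious char count) per formatting run."""
    runs = []
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if local(el.tag) != "formatting":
            continue
        # Each recognized character is a charParams child with its own box.
        chars = [c for c in el if local(c.tag) == "charParams"]
        text = "".join(c.text or "" for c in chars)
        suspicious = sum(1 for c in chars if c.get("suspicious") == "true")
        runs.append((text, el.get("fs"), el.get("bold") == "true", suspicious))
    return runs


for run in summarize(SAMPLE):
    print(run)
```

Each tuple holds the run's text, its font size, whether it is bold, and how
many characters the OCR flagged as uncertain: exactly the information that a
plain djvu text layer does not carry.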

Alex

2013/6/17 Andrea Zanni <zanni.andrea84(a)gmail.com>

...

 On Mon, Jun 17, 2013 at 10:12 AM, Lars Aronsson <lars(a)aronsson.se> wrote:

  Both the Internet Archive and Wikisource volunteers
 use a cheap, commercial version of ABBYY Finereader
 and we have no
 dialogue with that company. And why should they
 listen to us? We have no more money to provide,
 but Google does pay its OCR software developers.

 I actually had contact with an ABBYY Finereader sales manager,
 but after a short conversation on this list I didn't follow up,
 as the community was not enthusiastic about it, and I was worried about
 the amount of money they might ask of us.

 Aubrey

 _______________________________________________
 Wikisource-l mailing list
 Wikisource-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikisource-l
