Re: [Wikisource-l] [cultural-partners] ABBYY Finereader 11 on Toolserver: do we like it?

28 Nov 2011

Wow, thank you all for the quick responses.
I'll try to reply in-line.

2011/11/28 Mathias Schindler <mathias.schindler(a)gmail.com>

...
 I recommend sticking to and supporting open source technology that has
 been made available by third parties, such as
 http://code.google.com/p/ocropus/ /
 http://code.google.com/p/tesseract-ocr/

This is true, and it would be the ideal approach, but apparently it failed.
I don't know why the OCR button isn't working anymore; it seems to me that
things were no longer updated after ThomasV left, or something like that.
...
In my experience (I have used both programs in professional projects),
their quality and usability are very different. Of course, having Tesseract
is better than having nothing.

2011/11/28 Lars Aronsson <mathias.schindler(a)gmail.com>
...
 I think this is what the Internet Archive uses, as well as
 several European libraries. We could look into establishing
 a cooperation with the Internet Archive or perhaps with
 Europeana in this area. Maybe the Internet Archive can
 open up an API for OCR-ing a single page at a time?

This would be awesome :-)
I don't know much about the technicalities here; if you want to ask them, be
my guest :-)

@Tomasz
I think Federico has a point with the approach he suggested.
In fact, I'm wondering how IA got its license; we should ask them.
Do we have any contacts at the Internet Archive?

I know we could use IA directly for uploading PDFs (we already do this to
get DjVu files), but it is still not the most usable workflow for
institutions or ordinary users...

Aubrey