Wow, thank you all for the quick responses.
I'll try to reply in-line.
2011/11/28 Mathias Schindler <mathias.schindler(a)gmail.com>
I recommend sticking and supporting open source
technology that has
been made available by third parties, such as
This is true, and this would be the optimal way, but apparently it failed.
I don't know why the OCR button is not running anymore; it seems to me that
some things were left un-updated, or something like that.
From my experience (I have used this software for
the quality and the usability
of the software are very different. Of course, having Tesseract is better
than having nothing.
2011/11/28 Lars Aronsson <mathias.schindler(a)gmail.com>
I think this is what the Internet Archive uses, as
several European libraries. We could look into establishing
a cooperation with the Internet Archive or perhaps with
Europeana in this area. Maybe the Internet Archive can
open up an API for OCR-ing a single page at a time?
This would be awesome :-)
I don't have a clue about the technicalities here; if you want to ask them, be
my guest :-)
I think that Federico has a point in the approach he suggested:
In fact, I'm wondering how IA got its license; we should ask them.
Do we have any contact with Internet Archive?
I know we could use IA directly for uploading PDFs (we do it already for
but it's still not the most usable way to deal with institutions or