On Sun, Jul 12, 2015 at 11:25 AM, Asaf Bartov <abartov@wikimedia.org> wrote:

On Sat, Jul 11, 2015 at 9:59 AM, Andrea Zanni <zanni.andrea84@gmail.com> wrote:
Uh, that sounds very interesting.
Right now, we mainly use the OCR from the DjVu files at the Internet Archive (that means ABBYY FineReader, which is very nice).

Yes, the output is generally good. But as far as I can tell, the archive's Open Library API does not offer a way to retrieve the OCR output programmatically, and certainly not for an arbitrary page rather than the whole item. What I'm working on requires the ability to OCR a single page on demand.

True.

I've recently met Giovanni, a new (Italian) guy who's now working with Internet Archive and Open Library.

We discussed a number of possible partnerships/projects; this is definitely one to bring up.

But if we manage to do it directly in the Wikimedia world, that's even better.

Aubrey

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l