Re: [Wikisource-l] OCR as a service?

12 Jul 2015

On Sat, Jul 11, 2015 at 9:59 AM, Andrea Zanni <zanni.andrea84@gmail.com> wrote:
...
uh, that sounds very interesting.
Right now, we mainly use OCR from djvu from Internet Archive (that means
ABBYY Finereader, which is very nice).
Yes, the output is generally good.  But as far as I can tell, the archive's
Open Library API does not offer a way to retrieve the OCR output
programmatically, and certainly not for an arbitrary page rather than the
whole item.  What I'm working on requires the ability to OCR a single page
on demand.
But ideally we could think of a "customizable" OCR software that gets
...
trained language per language: that would be extremely useful for
Wikisources.
(I can also imagine dividing, within each language, by century,
because languages too change over time ;-)
Indeed.
A.
-- 
    Asaf Bartov
    Wikimedia Foundation http://www.wikimediafoundation.org

Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
https://donate.wikimedia.org
