Re: [Wikisource-l] [cultural-partners] ABBYY Finereader 11 on Toolserver: do we like it?

28 Nov 2011

On 11/28/2011 01:59 PM, Mathias Schindler wrote:
> ...
> I recommend sticking and supporting open source technology that has
> been made available by third parties, such as
> http://code.google.com/p/ocropus/ /
> http://code.google.com/p/tesseract-ocr/
Do you recommend this based on experience, or based on free software
ideology? Apparently the Internet Archive tried and gave up, because
Finereader was far better. Are there any good examples where free
software has been used for good OCR quality?

Wikisource does provide feedback on quality: After OCR, when a page
has been proofread, the OCR software could learn from the diff.
But is there any OCR software that can take this kind of input?
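
To make the "learn from the diff" idea concrete, here is a minimal sketch
of how the feedback could be extracted, whatever engine ends up consuming
it. It assumes we have the raw OCR text and the proofread text of the same
page as plain strings; the function names correction_pairs and
tally_corrections are hypothetical, and this is not the input format of
any particular OCR product. It only uses Python's standard difflib.

```python
import difflib
from collections import Counter


def correction_pairs(ocr_text, proofread_text):
    """Return (ocr_fragment, corrected_fragment) pairs for substituted spans.

    Hypothetical helper: compares the two texts character by character and
    keeps only the spans a proofreader replaced (not pure inserts/deletes).
    """
    matcher = difflib.SequenceMatcher(
        None, ocr_text, proofread_text, autojunk=False
    )
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((ocr_text[i1:i2], proofread_text[j1:j2]))
    return pairs


def tally_corrections(pages):
    """Count how often each substitution occurs over many pages.

    `pages` is an iterable of (ocr_text, proofread_text) tuples.
    """
    counts = Counter()
    for ocr_text, proofread_text in pages:
        counts.update(correction_pairs(ocr_text, proofread_text))
    return counts
```

The most frequent pairs in such a tally (for example "rn" read where the
proofreader typed "m") would be the obvious candidates to feed back, if an
engine accepts that kind of input.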

When running OCR as an engine/server/API, what do we do when it
misinterprets columns in a page, and reads long lines across the
page? Is there a way to manually indicate where columns are, and
resubmit the page for new OCR?
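
As a rough illustration of what "manually indicating columns" could mean
without a new OCR pass: if the engine returns word positions, the words
could be reordered after the fact from column boundaries that a user
supplies. The sketch below assumes the OCR output is a list of
(x, y, text) tuples and that the boundaries are x coordinates; both the
data format and the function name reading_order are assumptions of mine,
not the API of Finereader, Tesseract or OCRopus.

```python
from bisect import bisect_right


def reading_order(words, column_splits, line_tolerance=5):
    """Reorder OCR words column by column.

    words: list of (x, y, text) tuples (assumed output format).
    column_splits: sorted x positions that separate the columns,
        as indicated manually by the user.
    line_tolerance: maximum y distance from the first word of a line
        for a word to still count as being on that line.
    Returns a list of text lines, left column first.
    """
    columns = [[] for _ in range(len(column_splits) + 1)]
    for x, y, text in words:
        # bisect_right gives the index of the column that contains x
        columns[bisect_right(column_splits, x)].append((y, x, text))

    lines = []
    for column in columns:
        column.sort()  # top to bottom, then left to right
        current, current_y = [], None
        for y, x, text in column:
            if current and y - current_y > line_tolerance:
                # word is too far below the current line: start a new one
                lines.append(" ".join(t for _, t in sorted(current)))
                current = []
            if not current:
                current_y = y
            current.append((x, text))
        if current:
            lines.append(" ".join(t for _, t in sorted(current)))
    return lines
```

This only repairs reading order. It does not help when the engine has
merged characters across the column gap, in which case the page (or each
column cropped separately) would still have to be resubmitted.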

-- 
   Lars Aronsson (lars(a)aronsson.se)
   Aronsson Datateknik - http://aronsson.se
