[Wikisource-l] Assessing OCR quality

12 Mar 2019


      If you have a large digitization project, such as Wikisource,
with many pages and books of scanned images and OCR text
(originating from different sources and times),
how do you assess the OCR quality and determine which pages
most need improved OCR or proofreading?
Is spell checking (and a normal dictionary) the only useful tool?
Would you count the number of spelling errors, or the ratio
of errors to correct words? Has anyone done this?
-- 
   Lars Aronsson (lars@aronsson.se)
   Project Runeberg - free Nordic literature - http://runeberg.org/
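
The "ratio of errors to correct words" idea in the question above can be sketched roughly as follows. This is only an illustration, not a method taken from the post: the function names (oov_ratio, rank_pages), the plain set of known words, and the page-id-to-text mapping are all assumptions made for the example. A word counts as an "error" simply because it is missing from the word list, so proper names and archaic spellings will be over-counted.

```python
import re

# Hypothetical helper names; the dictionary is assumed to be a set of
# lowercase known words for the language of the text.

def oov_ratio(text, dictionary):
    """Return the fraction of word tokens not found in the dictionary."""
    # [^\W\d_]+ matches runs of Unicode letters, so å, ä, ö etc. are kept.
    words = [w.lower() for w in re.findall(r"[^\W\d_]+", text)]
    if not words:
        return 0.0  # empty page: nothing to score
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)

def rank_pages(pages, dictionary):
    """Return (page_id, ratio) pairs, worst-scoring pages first."""
    scores = {pid: oov_ratio(text, dictionary) for pid, text in pages.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    known = {"the", "cat", "sat"}
    pages = {"p1": "The cat sat", "p2": "Tbe c4t sat"}
    for pid, ratio in rank_pages(pages, known):
        print(pid, round(ratio, 2))
```

A ratio is used instead of a raw error count so that long and short pages can be compared. Pages with no recognised words at all score 0.0 here and would probably need to be flagged separately.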
