[Wikisource-l] Re: OCR in 2023

15 Jan 2023


On Sat, 14 Jan 2023 at 3:20 PM, Lars Aronsson
lars@aronsson.se wrote:
...
OCR is an old problem. There is commercial software, such as Finereader,
and free software such as Tesseract. But is there also a new trend in
home-built software based on new frameworks for neural networks and
deep learning? Keras? TensorFlow? Is anybody experimenting with this
for OCR of scanned books?
When I ask researchers in image processing / computer vision, they
say that plain text (book) OCR "is a solved problem" that nobody
researches, and all research goes into self-driving cars reading
street signs. Is this true, or are there any exceptions?

Yes. Recent versions of Tesseract do well on OCR.
It is trained with machine learning techniques, and the results have
improved with recent versions.
Its improved proprietary counterpart is available as the Google Vision
API, which sometimes gives slightly better results.
Here is an implementation that connects Wikisource and OCR via Google
Drive (I wrote this in 2015):
https://github.com/tshrinivasan/OCR4wikisource
We used it on many Indic Wikisource sites around 2016-2020.
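
For anyone who wants to try Tesseract on a single scanned page, here is a
minimal sketch in Python. It assumes the tesseract command-line tool and the
needed language data (for example "tam" for Tamil) are already installed. The
function names and the file name page.png are made up for illustration, and
this is not how OCR4wikisource itself works.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for one run of the tesseract CLI.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_page(image_path, lang="eng"):
    """Return the recognized text, or None if tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

Calling ocr_page("page.png", lang="tam") should return the page text as a
string when tesseract and the Tamil language data are present. Output quality
will depend on the scan and on the installed Tesseract version.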
-- 
Regards,
T.Shrinivasan


My Life with GNU/Linux : http://goinggnu.wordpress.com
Free E-Magazine on Free Open Source Software in Tamil : http://kaniyam.com

Get Free Tamil Ebooks for Android, iOS, Kindle, Computer :
http://FreeTamilEbooks.com
