Google's OCR, apparently the most accurate OCR we have seen so far, works really well for all the major South Asian scripts: http://globalvoicesonline.org/2015/08/29/googles-optical-character-recognition-software-now-works-with-all-south-asian-languages Here are test cases for many Indian scripts: https://goo.gl/3X75iR. Except for Gurmukhi, most scripts work really well.
This could be really useful for Indian-language Wikimedians and will come in handy for digitizing printed and scanned text. Here is an animated tutorial showing Wikimedians how to use this tool for Wikisource/Wikipedia: https://commons.wikimedia.org/wiki/File:Tutorial_to_use_Google_Optical_Character_Recognition.gif
Please write to me if you would like to localize this tutorial into your language.
--
Best!
Subhashish Panigrahi
Programme Officer, Access To Knowledge
Centre for Internet and Society
@subhapa / https://cis-india.org
wikimediaindia-l@lists.wikimedia.org