Hi,
I don't know where things stand with OCR for non-Latin scripts, so maybe
this is no longer relevant. Last time I looked into it, there was a
limitation in the Google service that was a problem, notably for Indic
languages. Well, yesterday we had a contribution day around
Alsatian and Franconian dialects
<https://fr.wikipedia.org/wiki/Discussion_Projet:Alsace#Journ.C3.A9e_contributive_alsacien.2Ffrancique_20_avril_2016>
where I had the opportunity to talk with some linguists. One of them
told me that Google in fact uses Tesseract
<https://github.com/tesseract-ocr> for its OCR service, and Tesseract is
open source. According to what she told me (or at least what I remember
of it), the engine can be trained for a new script: you define the
matching between image samples and characters, and off it goes. Looking
quickly at the langdata repository, I see that there is material for
Devanagari, which I believe is the script used in at least some Indic
texts, isn't it?
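
If someone wants to try it, here is a minimal sketch of how a test run
might look from Python. It assumes the tesseract command-line tool is
installed together with the Hindi trained data (language code "hin",
which is in Devanagari script). The function names and file names are
made up for illustration, and I have not checked how good the results
are on real scans.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="hin"):
    """Build the argument list for one Tesseract CLI run.

    "hin" is the Hindi trained-data code; whether it suits a given
    Indic text is an assumption to verify on real pages.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="hin"):
    """Run Tesseract if it is on the PATH.

    Returns the path of the text file Tesseract writes, or None when
    the tesseract executable is not installed.
    """
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,
    )
    # Tesseract appends ".txt" to the output base name by default.
    return output_base + ".txt"


if __name__ == "__main__":
    # "page.png" is a placeholder for a scanned page image.
    print(build_tesseract_command("page.png", "page"))
```

Calling run_ocr("page.png", "page") would then write the recognized
text to page.txt, provided the language data is available.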
Hope that may help,
mathieu