Hi,

I don't know where things stand with OCR for non-Latin scripts, so maybe this is no longer relevant. Last time I looked into it, the Google service had limitations that were a problem, notably for Indic languages. Well, yesterday we had a contribution day around Alsatian and Franconian dialects, where I had the opportunity to talk with some linguists. One of them told me that Google in fact uses Tesseract, which is open source, for its OCR service. According to what she told me (or at least what I remember of it), it can be trained for different scripts: you define the mapping between image samples and characters, and off it goes. Looking quickly at the langdata repository, I see there is material for Devanagari, which I believe is a script used in at least part of the Indic texts, isn't it?

Hope this helps,
mathieu