Hi,
I don't know where things stand with OCR for non-Latin scripts, so
maybe this is no longer relevant. Last time I looked into it,
there was a limitation with the Google service, which was a
problem notably for Indic languages. Well, yesterday we had a contribution
day around Alsatian and Franconian dialects where I had the
opportunity to talk with some linguists. One of them told me that
Google was in fact using Tesseract, which is open source,
for its OCR service. According to what she
told me (or at least what I remember of it), it relies on a
training mechanism that works across scripts: you define the mapping between
image samples and characters, and off it goes. Looking quickly at
the langdata repository, I see that there is some material for Devanagari,
which I believe is a script used in at least part of the Indic texts,
isn't it?
Hope this helps,
mathieu