Hi,

On 11 August 2016 at 15:10, Mardetanha <mardetanha.wiki@gmail.com> wrote:
best person would be Bodhisattwa Mandal

Mardetanha

Ah, I doubt that. I have never contributed to Hindi Wikisource (which is still hosted within the multilingual Wikisource).
 

On Thu, Aug 11, 2016 at 2:10 AM, John Mark Vandenberg <jayvdb@gmail.com> wrote:
---------- Forwarded message ----------
From: Lane Rasberry <lane@bluerasberry.com>
Date: Thu, Aug 11, 2016 at 4:38 AM
Subject: [Wikimediaindia-l] seeking help with Hindi projects in Wikisource...
To: Wikimedia India Community list <wikimediaindia-l@lists.wikimedia.org>


Hello,

Can anyone here refer me to someone who is active in making
Hindi-language contributions to Wikisource? I wish to meet someone
with experience in that language and project. Otherwise, can anyone
suggest to me which Indic languages in Wikisource seem to be most
active?

I don't know anyone personally who contributes to Hindi Wikisource, but User:Sfic may be the person you are looking for. The recent contribution history shows his username, so he is active now. But as I said, I don't know him personally.
 

Is anyone able to make a recommendation for any OCR software for
converting scanned Hindi language documents to digital text? Does
anyone know anything about in-Wikisource support for OCR in Hindi
language? Does it exist? Is there documentation?

Thanks for anything anyone can share.


Yes, I can make a recommendation on this one.

For the majority of Indic languages, including Hindi, Google OCR [1] is the only available option so far. We have tested and used it for Sanskrit Wikisource, and it gives good results. Since both languages use the same Devanagari script, it should work for Hindi too.

The other good option is to train Tesseract OCR [2] for Hindi, but that will take time. There is also existing trained data [3] from Aug 2014, though I don't know how good its output is.
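
If someone wants to try the existing Hindi trained data [3] before investing in training, here is a minimal sketch in Python. It assumes the tesseract command-line program is installed and that hin.traineddata has been copied into its tessdata directory. The function names and the file name page_001.png are only placeholders of mine, and I have not checked the output quality.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="hin"):
    """Return the argument list for one Tesseract CLI run.

    Tesseract is invoked as: tesseract <image> <outputbase> -l <lang>
    and writes the recognised text to <outputbase>.txt.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_page(image_path, output_base, lang="hin"):
    """OCR one scanned page and return the recognised text.

    Returns None when the tesseract binary is not found on PATH,
    so the caller can fall back to another tool.
    """
    if shutil.which("tesseract") is None:
        return None
    command = build_tesseract_command(image_path, output_base, lang)
    # check=True raises CalledProcessError if Tesseract exits with an error,
    # for example when the trained data for the language is missing.
    subprocess.run(command, check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # page_001.png is a hypothetical scanned page in the current directory.
    text = ocr_page("page_001.png", "page_001")
    print(text if text is not None else "tesseract is not installed")
```

Running this over a few proofread pages and comparing the text by hand would give a rough idea of whether the Aug 2014 data is usable or whether fresh training is needed.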

Also, ABBYY doesn't support Hindi [4].

[1] https://support.google.com/drive/answer/176692?hl=en
[2] https://github.com/tesseract-ocr/tesseract
[3] https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
[4] https://www.abbyy.com/support/finereader/12/rl/

I hope this helps,

Regards,
--
Bodhisattwa