Hi,
On 11 August 2016 at 15:10, Mardetanha <mardetanha.wiki(a)gmail.com> wrote:
> best person would be Bodhisattwa Mandal
> Mardetanha
Aaaaa, I doubt that. I have never contributed to Hindi Wikisource (which is
still part of the multilingual Wikisource).
On Thu, Aug 11, 2016 at 2:10 AM, John Mark Vandenberg <jayvdb(a)gmail.com>
wrote:
> ---------- Forwarded message ----------
> From: Lane Rasberry <lane(a)bluerasberry.com>
> Date: Thu, Aug 11, 2016 at 4:38 AM
> Subject: [Wikimediaindia-l] seeking help with Hindi projects in
> Wikisource...
> To: Wikimedia India Community list <wikimediaindia-l(a)lists.wikimedia.org>
>
> Hello,
>
> Can anyone here refer me to someone who is active in making
> Hindi-language contributions to Wikisource? I wish to meet someone
> with experience in that language and project. Otherwise, can anyone
> suggest to me which Indic languages in Wikisource seem to be most
> active?
I don't know anyone personally who contributes to Hindi Wikisource, but
User:Sfic may be the person you are looking for. The recent contribution
history shows his username, so he is active now. But as I said, I don't know
him personally.
> Is anyone able to make a recommendation for any OCR software for
> converting scanned Hindi language documents to digital text? Does
> anyone know anything about in-Wikisource support for OCR in Hindi
> language? Does it exist? Is there documentation?
>
> Thanks for anything anyone can share.
>
>
Yes, I can recommend something for this one.
For the majority of Indic languages, including Hindi, Google OCR [1] is the
only available option so far. We have tested and used it for Sanskrit
Wikisource and it gives good results. As both languages use the same
Devanagari script, it should work for Hindi too.
Obviously, the other good option is to train the Tesseract OCR [2] for
Hindi, but that will take time. There is also existing trained data [3]
from Aug 2014. I don't know how good its output is.
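For anyone who wants to try the existing trained data [3], here is a minimal sketch in Python. It assumes the `tesseract` command-line tool is installed and that `hin.traineddata` has been placed in its tessdata directory. The helper names and the file names `page.png` and `page` are hypothetical, and nothing here says anything about the quality of the output.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="hin"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_hindi_page(image_path, output_base):
    """Run Tesseract on one scanned page.

    Returns the recognised text, or None when the tesseract binary
    is not found on PATH (assumption: it is installed separately).
    """
    if shutil.which("tesseract") is None:
        return None
    # Tesseract writes its result to "<output_base>.txt".
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical scan of one page; replace with a real image file.
    print(build_tesseract_command("page.png", "page"))
```

Calling `ocr_hindi_page("page.png", "page")` on a real scan would then give the text to paste into the Wikisource page for proofreading.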
Also, ABBYY doesn't support Hindi [4].
[1] https://support.google.com/drive/answer/176692?hl=en
[2] https://github.com/tesseract-ocr/tesseract
[3] https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
[4] https://www.abbyy.com/support/finereader/12/rl/
I hope this helps,
Regards,
--
Bodhisattwa