---------- Forwarded message ----------
From: Lane Rasberry lane@bluerasberry.com
Date: Thu, Aug 11, 2016 at 4:38 AM
Subject: [Wikimediaindia-l] seeking help with Hindi projects in Wikisource...
To: Wikimedia India Community list wikimediaindia-l@lists.wikimedia.org
Hello,
Can anyone here refer me to someone who is active in making Hindi-language contributions to Wikisource? I wish to meet someone with experience in that language and project. Otherwise, can anyone suggest to me which Indic languages in Wikisource seem to be most active?
Is anyone able to make a recommendation for any OCR software for converting scanned Hindi language documents to digital text? Does anyone know anything about in-Wikisource support for OCR in Hindi language? Does it exist? Is there documentation?
Thanks for anything anyone can share.
yours,
--
Lane Rasberry
user:bluerasberry on Wikipedia
206.801.0814
lane@bluerasberry.com
_______________________________________________ Wikimediaindia-l mailing list Wikimediaindia-l@lists.wikimedia.org To unsubscribe from the list / change mailing preferences visit https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
The best person would be Bodhisattwa Mandal.
Mardetanha
On Thu, Aug 11, 2016 at 2:10 AM, John Mark Vandenberg jayvdb@gmail.com wrote:
-- John Vandenberg
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Hi,
On 11 August 2016 at 15:10, Mardetanha mardetanha.wiki@gmail.com wrote:
The best person would be Bodhisattwa Mandal.
Mardetanha
Aaaaa, I doubt that. I have never contributed to Hindi Wikisource (which is still hosted in the multilingual Wikisource).
On Thu, Aug 11, 2016 at 2:10 AM, John Mark Vandenberg jayvdb@gmail.com wrote:
Can anyone here refer me to someone who is active in making Hindi-language contributions to Wikisource? I wish to meet someone with experience in that language and project. Otherwise, can anyone suggest to me which Indic languages in Wikisource seem to be most active?
I don't personally know anyone who contributes to Hindi Wikisource, but User:Sfic may be the person you are looking for. His username appears in the recent contribution history, so he is active now. But as I said, I don't know him personally.
Is anyone able to make a recommendation for any OCR software for converting scanned Hindi language documents to digital text? Does anyone know anything about in-Wikisource support for OCR in Hindi language? Does it exist? Is there documentation?
Yes, I can make a recommendation on this one.
For the majority of Indic languages, including Hindi, Google OCR [1] is the only available option so far. We have tested and used it for Sanskrit Wikisource, and it gives good results. As both languages use the same Devanagari script, it should work for Hindi too.
The other obvious option is to train Tesseract OCR [2] for Hindi, but that will take time. There is also existing trained data [3] dating from Aug 2014. I don't know how good its output is.
Also, ABBYY doesn't support Hindi [4].
[1] https://support.google.com/drive/answer/176692?hl=en
[2] https://github.com/tesseract-ocr/tesseract
[3] https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
[4] https://www.abbyy.com/support/finereader/12/rl/
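For anyone who wants to try the existing Hindi trained data [3], a minimal sketch of calling Tesseract from Python might look like the following. It assumes the `tesseract` command-line tool is installed and that `hin.traineddata` has been copied into its tessdata directory. The function names and file names are hypothetical, chosen only for illustration, and the output quality for Hindi is not verified here.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="hin"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>.

    "hin" is the language code matching hin.traineddata (assumed installed).
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_page(image_path, output_base, lang="hin"):
    """Run Tesseract on one scanned page and return the recognized text.

    Tesseract writes its plain-text result to <output_base>.txt.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error,
    # for example when the requested language data is missing.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

With a scanned page saved as, say, `page_001.png` (a made-up file name), `ocr_page("page_001.png", "page_001")` would return the recognized text, which could then be proofread on Wikisource.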
I hope this helps,
Regards,