Re: [Wikimediaindia-l] Indic print material digitization workshop query

21 Aug 2013


@Sumana Harihareswara
Please take a look at the Bengali OCR project, https://code.google.com/p/banglaocr/ ;
it needs further development.
On Mon, Aug 19, 2013 at 10:12 PM, Sumana Harihareswara <
sumanah@wikimedia.org> wrote:
...
On 08/19/2013 02:52 AM, L. Shyamal wrote:
...
Re-posting a now outdated query from meta
http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalo...
...
Now that the workshop has been conducted, I think those who attended
could comment on whether it covered OCR for Indic languages. If it did,
it would be worthwhile to document the OCR software used, on the meta
pages or elsewhere such as Wikisource. Most of the more experienced
editors here will be fairly familiar with using scanners to create PDF
documents and uploading them to places like the Internet Archive, but
experience with and knowledge of OCR software and its success rates is
a bit wanting for Indic languages (fonts).
Best wishes,
Shyamal
en:User:Shyamal

I looked at the talk page on Meta - thank you, Shyamal!
For those who do not know: OCR means Optical Character Recognition.
When we want to get archival documents onto the web, it's nice to have
photos of them, but it's even better to OCR them so that people can
clearly read, copy, excerpt, translate, and remix the text.
Is there a central list of the problems that OCR software (especially
open source OCR software) has with text written in Indic languages?  If
so, I could help encourage people to fix those problems, as volunteers,
via a Google Summer of Code/Outreach Program for Women internship, via a
grant-funded project (such as https://meta.wikimedia.org/wiki/Grants:IEG
), or via some other method.
People who would like to make Wikisource more easily useful for Indic
languages might want to contribute to the Wikisource vision development
project that's going on right now:
https://wikisource.org/wiki/Wikisource_vision_development
The ProofreadPage extension (part of the Wikisource technology stack) is
being worked on right now in Aarti K. Dwivedi's Google Summer of Code
internship.  http://aartindi.blogspot.in/  She might be interested in
knowing about these issues, so I am cc'ing her.
Also - just because people on this list might be interested! - if you
have an old historical map that you'd like to vectorize to get it onto
OpenStreetMap, try out the new "Map polygon and feature extractor" tool:
https://github.com/NYPL/map-vectorizer
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation

Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
To unsubscribe from the list / change mailing preferences visit
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
