Re: [Wikimediaindia-l] [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary - WikimediaIndia-l

14 Feb 2011


On Mon, Feb 14, 2011 at 10:18 AM, sankarshan foss.mailinglists@gmail.com wrote:
...
2011/2/14 shirish शिरीष shirishag75@gmail.com:
...
In Pune, around this time, a lot of colleges hold their technical weeks,
where they show projects. Last year and a couple of years before, I saw
students who had made nice OCRs that could work with Indic languages,
but they obviously required a lot of polish and getting into the whole
'code maintenance' thing. The students' motivation had been to do it as
a project, not to keep things 'maintained', which is unglamorous grunt
work. Documentation is also something that would need to be looked at
and fine-tuned.
Indic OCR, at least the bits that are available under an appropriate
FOSS license, has an accuracy of around 80%. Considering the volume
and fragility of what you will OCR, that's remarkably low.
Please send links to such technology. It does not matter if the
accuracy is only 80%; that means people have a role to play there.
I see this as a clear opportunity to ask for volunteer time: create a
site with an image and the partially correct OCR output for that page
side by side, and ask the volunteers to correct it.
We can conduct workshops in colleges to seek help of this kind.
Meanwhile, when people recognize where and in what kinds of places the
OCR sucks, we can think of solving those problems. This kind of work
itself will help improve the existing OCR for Indic languages.
--
GN