Re: [Wikimediaindia-l] [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary - WikimediaIndia-l

14 Feb 2011

On Mon, Feb 14, 2011 at 10:18 AM, sankarshan
<foss.mailinglists(a)gmail.com> wrote:
...
  2011/2/14 shirish शिरीष
<shirishag75(a)gmail.com>:

  In Pune, around this time a lot of colleges have their technical weeks
 where they show projects. Last year and a couple of years before, I had
 seen students who had made nice OCRs which could work with Indic
 languages but obviously required a lot of polish and getting into the
 whole 'code maintenance' thing. The students' motivation had been to
 do it as a project, not to get things 'maintained', which is
 unglamorous grunt work. Also, documentation is something that would
 need to be looked at and fine-tuned.

 Indic OCR, at least the bits that are available under an appropriate
 FOSS license, has an accuracy of around 80%. Considering the volume
 and fragility of what you will OCR, that's remarkably low.

Please send links to such technology. It does not matter if the
accuracy is only 80%; it means people have a role to play there.
I see this as a clear opportunity to ask for volunteer time. Create a
site that shows the page image and the partially correct OCR text side
by side, and ask volunteers to correct it.
We can conduct workshops in colleges to seek help of this kind.
Meanwhile, as people recognize where, and in what kinds of places, the
OCR does badly, we can think about solving those problems. This kind of
work will itself help improve the existing OCR for Indic languages.
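
As a rough illustration of how volunteer corrections could feed back
into the OCR, here is a minimal Python sketch that compares the raw OCR
text of a page with the volunteer-corrected text. It reports a
character-level accuracy figure and a tally of which pieces of text
were changed. The function name compare and the sample strings are
hypothetical, not part of any existing tool, and the accuracy measure
(matched characters divided by the length of the corrected text) is one
simple choice among several. Python compares Unicode code points, so
for Indic scripts a vowel sign or other combining mark counts as its
own character.

```python
from collections import Counter
from difflib import SequenceMatcher


def compare(ocr_text, corrected_text):
    """Return (accuracy, error_counts) for one OCR'd page.

    accuracy     -- matched characters / length of the corrected text
    error_counts -- Counter keyed by (ocr_fragment, corrected_fragment)
    """
    # autojunk=False keeps difflib from ignoring very frequent
    # characters on long pages.
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    matched = 0
    errors = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        else:
            # Record what the OCR produced and what the volunteer typed.
            errors[(ocr_text[i1:i2], corrected_text[j1:j2])] += 1
    accuracy = matched / len(corrected_text) if corrected_text else 1.0
    return accuracy, errors


if __name__ == "__main__":
    accuracy, errors = compare("helio world", "hello world")
    print(round(accuracy, 3))
    for (wrong, right), count in errors.most_common():
        print(repr(wrong), "->", repr(right), count)
```

Summing the error tallies over many corrected pages would show which
characters or conjuncts the OCR confuses most often, which is where
improvement work could start.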

--
GN
