[Wikimediaindia-l] [Wikimedia-in-mum] Liam Wyatt's visit to Mumbai and GLAM meetup - a summary

Nagarjuna G nagarjun at gnowgi.org
Mon Feb 14 04:59:15 UTC 2011


On Mon, Feb 14, 2011 at 10:18 AM, sankarshan
<foss.mailinglists at gmail.com> wrote:
> 2011/2/14 shirish शिरीष <shirishag75 at gmail.com>:
>
>> In Pune, around this time a lot of colleges have their technical weeks
>> where they show projects. Last year and a couple of years before, I
>> saw students who had made nice OCRs that could work with Indic
>> languages but obviously required a lot of polish and getting into the
>> whole 'code maintenance' thing. The students' motivation had been to
>> do it as a project, not to get things 'maintained', which is
>> unglamorous grunt work. Also, documentation is something that would
>> need to be looked at and fine-tuned.
>
> Indic OCR, at least the bits that are available under an appropriate
> FOSS license, has an accuracy of around 80%. Considering the volume
> and fragility of what you will OCR, that's remarkably low.
>

Please send links to such technology. It does not matter if the
accuracy is only 80%; that just means people have a role to play.
I see this as a clear opportunity to ask for volunteer time: create a
site that shows the page image and the partially correct OCR text side
by side, and ask volunteers to correct the text.
We can conduct workshops in colleges to seek help of this kind.
Meanwhile, as people recognize where and in what kinds of places the
OCR fails, we can think about solving those problems. This kind of
work will itself help improve the existing OCR for Indic languages.
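
As a rough sketch of how volunteer corrections could show where the
OCR goes wrong, the raw OCR text of a page can be compared with the
corrected text. The function name compare_ocr and the sample strings
are made up for illustration; only Python's standard difflib is used.
It compares Unicode code points, so an Indic conjunct made of several
code points counts as several characters.

```python
from collections import Counter
from difflib import SequenceMatcher


def compare_ocr(ocr_text, corrected_text):
    """Compare raw OCR output with a volunteer-corrected version.

    Returns (accuracy, confusions):
      accuracy   - fraction of corrected-text characters the OCR got right
      confusions - Counter of (ocr_fragment, corrected_fragment) pairs
    Hypothetical helper for illustration, not part of any existing tool.
    """
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    matched = 0
    confusions = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        else:
            # replace / delete / insert: record what the OCR produced
            # and what the volunteer changed it to
            confusions[(ocr_text[i1:i2], corrected_text[j1:j2])] += 1
    accuracy = matched / len(corrected_text) if corrected_text else 1.0
    return accuracy, confusions


if __name__ == "__main__":
    acc, errors = compare_ocr("hello wcrld", "hello world")
    print(round(acc, 2))
    print(errors.most_common(5))
```

Summing the confusions over many corrected pages would give a list of
the fragments the OCR most often gets wrong, which is one way to
decide what to fix first.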

--
GN




