On Mon, Feb 14, 2011 at 10:18 AM, sankarshan <foss.mailinglists@gmail.com> wrote:
2011/2/14 shirish शिरीष <shirishag75@gmail.com>:
In Pune, a lot of colleges hold their technical weeks around this time, where they show projects. Last year, and a couple of years before that, I saw students who had made nice OCRs that could work with Indic languages. They obviously required a lot of polish and someone to take on the whole 'code maintenance' thing. The students' motivation had been to do it as a project, not to keep things 'maintained', which is unglamorous grunt work. Documentation would also need to be looked at and fine-tuned.
Indic OCR, at least the bits that are available under an appropriate FOSS license, has an accuracy of around 80%. Considering the volume and fragility of what you will OCR, that's remarkably low.
Please send links to such technology. It does not matter if the accuracy is only 80%; that just means people have a role to play there. I see this as a clear opportunity for volunteer time: create a site with the page image and the partially correct text side by side, and ask volunteers to correct it. We can conduct workshops in colleges to seek this kind of help. Meanwhile, as people recognize where and in what kinds of places the OCR sucks, we can think about solving those problems. This kind of work will itself help improve the existing OCR for Indic languages.
-- GN
wikimediaindia-l@lists.wikimedia.org