I got the 80% success figure from Sankarshan's posterous:
http://sankarshan.posterous.com/the-plan-to-create-a-digital-library-of-100-c

The
problem that Ashwin Baindur raised was the improper digitisation effort. A
rough Google search tells me that C-DAC is doing the digitisation for the
Maharashtra Archives -
http://www.cdac.in/html/egov/mda.aspx - and the output, as Ashwin pointed
out, is stored on compact discs. Interestingly, they are using SQL and
Visual Basic under Windows NT. I am not sure if this is a good thing. I also
do not know when this project was done, so I am not sure whether those were
current technologies at the time.
We discussed yesterday that the Maharashtra Archives, being a public
institution (or for that matter any public institution), should ideally
either place these documents in the public domain or release them under an
open copyright (do correct me if I am wrong with the terminology).
warm regards,
Pradeep
On 14 February 2011 11:29, Pradeep Mohandas <pradeep.mohandas(a)gmail.com> wrote:
hi,
At the discussion yesterday, we were told that OCR did not work at all
for many Indian languages. Also, as a person who does not understand
OCR at all, can anyone help me with what they mean by an 80% successful
OCR?
The other end of the process is the digitisation machine needed to convert
the physical text into an image. Any ideas on the availability and cost of a
museum-grade digitisation machine? I am sure you cannot use, and the archives
will not let you use, an ordinary device to handle these documents.
thanks in advance,
Pradeep