The problem that Ashwin Baindur raised was the improper digitisation effort. A rough Google search tells me that C-DAC is doing the digitisation for the Maharashtra Archives - http://www.cdac.in/html/egov/mda.aspx - and, as Ashwin pointed out, the digitised material is stored on compact disks. Interestingly, they are using SQL and Visual Basic under Windows NT. I am not sure whether this is a good thing. I also do not know when this project was done, so I cannot tell whether those were current technologies at the time.

We discussed yesterday that the Maharashtra Archives, being a public institution (or, for that matter, any public institution), should ideally either place these documents in the public domain or release them under an open copyright (do correct me if I am wrong with the terminology).

Pradeep

On 14 February 2011 11:29, Pradeep Mohandas <pradeep.mohandas@gmail.com> wrote:

hi,

At the discussion yesterday, we were told that OCR did not work at all for many Indian languages. Also, as a person who does not understand OCR at all, can anyone help me with what is meant by an 80% successful OCR?

The other end of the process is the digitisation machine needed to convert the physical text into an image. Any ideas on the availability and cost of a museum-grade digitisation machine? I am sure you cannot use an ordinary device to handle these documents, and the archives will not let you.

thanks in advance,
Pradeep