Ravishankar asked me to comment on this; we had a short discussion offline, and here are some notes that might be useful. I had a look at the general scheme, and it seems to be more like a kind of nationalized publishing system, where (presumably limited) rights to print are bought from the legal heirs of the authors. So it is certainly not a case of the content being released into the public domain. Given that it is mostly fiction and poetry (with a few non-fiction works, Ravishankar tells me), it seems unlikely that it would be used extensively beyond citations and extracts for articles on the authors and the works themselves.

How can this non-free content be used to better help Wikipedia and the Internet? Ravi pointed out that these works are not going to be digitally accessible, i.e. not searchable. My suggestion here is something I already do for scanned but non-OCR-ed English content available, for instance, on the Digital Library of India (essentially accessible only to "hackers"). I download all the TIF page scans from the Digital Library of India, pack them into a PDF file, and then upload it to the Internet Archive. The Internet Archive does not worry too much about copyright when it comes to out-of-print works (not currently in print and not currently yielding money) and orphan works placed there essentially for archival purposes; it is registered as a library in the US and has argued that its position is exactly like that of any other library. When a PDF file is uploaded to the Internet Archive, it creates several derivative files: OCR-ed text, a PDF with an OCR text layer, Kindle, EPUB, DjVu, etc. As a result, you can run a Google query such as "keyword" site:archive.org and find the content.
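For anyone who wants to script such lookups, a minimal sketch of building that site-restricted Google query URL is below. It assumes only the standard https://www.google.com/search?q=... form; the function name archive_search_url is hypothetical and not part of any tool mentioned here.

```python
from urllib.parse import urlencode


def archive_search_url(keyword: str) -> str:
    """Build a Google search URL for an exact phrase, restricted to archive.org.

    Hypothetical helper: assumes Google's plain /search?q=... URL form.
    """
    # Quote the keyword so it is treated as a phrase, then restrict the site.
    query = f'"{keyword}" site:archive.org'
    # urlencode percent-encodes the quotes and colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(archive_search_url("keyword"))
```

This only saves typing; the search itself works because of the OCR-ed derivatives the Internet Archive generates, not because of anything in the URL.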

Now, this automatic OCR (the Internet Archive uses ABBYY) is not going to work for Indian languages, but those who are savvy can help create PDF files containing both the scan layer and a proofread/OCR-ed Unicode text layer, which can be uploaded to the system after processing offline.

Since the Internet Archive supports linking to specific pages via its online reader, citation templates can use the URL of a specific page. This spares readers from downloading an entire book and lets them see the relevant source in context.
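A minimal sketch of generating such a page link and dropping it into an en.wiki {{cite book}} template is below. It assumes the reader's page-link pattern https://archive.org/details/IDENTIFIER/page/nLEAF, where LEAF is the zero-based scan index rather than the printed page number. The helper names and the item identifier "example-item" are hypothetical.

```python
def archive_page_url(identifier: str, leaf: int) -> str:
    """Return an online-reader link to one scanned page (leaf) of an archive.org item.

    Assumed URL pattern: /details/<identifier>/page/n<leaf>, with <leaf> being the
    zero-based scan index, not the printed page number.
    """
    if leaf < 0:
        raise ValueError("leaf index must be non-negative")
    return f"https://archive.org/details/{identifier}/page/n{leaf}"


def cite_book_wikitext(title: str, identifier: str, leaf: int, page: str) -> str:
    """Build a minimal {{cite book}} call whose url points at a single page."""
    url = archive_page_url(identifier, leaf)
    # Doubled braces in the f-string emit literal '{{' and '}}' for the template.
    return f"{{{{cite book |title={title} |url={url} |page={page}}}}}"


print(cite_book_wikitext("Example", "example-item", 12, "5"))
```

The printed page number goes into |page= separately, because scan leaves rarely line up with the book's own pagination (covers and front matter shift the count).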

Many editors on en.wiki routinely use this method, and I regularly upload content from less convenient sites (DLI included) to the Internet Archive. (In fact, I have just noticed that I have more than 800 uploads: https://archive.org/details/@shyamal). Let me know if anyone needs assistance with this.

If creating the Unicode text layer is a problem because the content is not in the public domain, it might be worth setting up an alternative system along the lines of Distributed Proofreaders (http://www.pgdp.net/c/); its code is presumably available at http://sourceforge.net/projects/dproofreaders/

I am aware of the expectations of the Commons community.

That is why I am seeking a legal interpretation of the current situation.

I would appreciate it if anyone could connect us with copyright experts in India.

Even if we establish contact with the government and get something in writing, this prior counsel will help us do it in the proper way.


Thanks for offering help.

Will get in touch off-list.


