Ravishankar asked me to comment on this, and we had a short discussion
offline; here are some notes that might be useful. I had a look at the
general scheme and it seems to be more like a kind of nationalized
publishing system, where (presumably limited) rights to print are bought
from the legal heirs of the authors, so it is certainly not a case of the
content being released into the public domain. Given that it is mostly
fiction and poetry (with a few non-fiction works, Ravishankar tells me), it
seems unlikely that it would be used extensively beyond citations and
extracts for articles on the authors and the works themselves.
What can be done with this non-free content to help make Wikipedia and the
Internet better? Ravi pointed out that these works are not going to be
digitally accessible, i.e. not searchable etc. My suggestion in this
regard is something that I do for scanned, non-OCR-ed English content
available, for instance, on the Digital Library of India (essentially
accessible only to "hackers"). What I do here is download all the TIF
page scans from the Digital Library of India, pack them into a PDF file and
then upload it to the Internet Archive. The Internet Archive does not
worry too much about copyright when it concerns out-of-print works (not
currently in print; not currently yielding monies) and orphan works placed
there essentially for archival. They are registered as a library in the US
and have argued their case as being exactly like that of any other library.
When a PDF file is uploaded to the Internet Archive, the Archive creates
several derivative files: OCR-ed text, a PDF with an OCR layer, Kindle, EPUB, DjVu
etc. The result of this is that you can run a query on Google with
and you can find content.
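The download, pack and upload routine described above can be sketched roughly as follows. This is only an illustration, not the exact procedure I use: it assumes the TIF page scans have already been downloaded into one folder, that the third-party Pillow and internetarchive Python packages are installed, and that upload credentials have been set up with `ia configure`. The folder name, item identifier and metadata values are hypothetical placeholders.

```python
import re
from pathlib import Path


def natural_key(name: str):
    """Sort key that orders 'page10.tif' after 'page9.tif'."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def ordered_scans(folder: str, pattern: str = "*.tif"):
    """Return the page scans found in `folder`, in reading order."""
    return sorted(Path(folder).glob(pattern),
                  key=lambda p: natural_key(p.name))


def pack_pdf(scans, out_pdf: str) -> None:
    """Combine page scans into one multi-page PDF (needs Pillow)."""
    from PIL import Image  # third-party: pip install Pillow

    # All pages are held in memory at once; fine for a sketch,
    # but large books may need to be processed in batches.
    pages = [Image.open(p).convert("RGB") for p in scans]
    if not pages:
        raise ValueError("no page scans found")
    pages[0].save(out_pdf, "PDF", save_all=True, append_images=pages[1:])


def upload_pdf(identifier: str, pdf_path: str, title: str):
    """Upload the PDF as a 'texts' item (needs the internetarchive package)."""
    from internetarchive import upload  # third-party: pip install internetarchive

    metadata = {"mediatype": "texts", "title": title}
    return upload(identifier, files=[pdf_path], metadata=metadata)


if __name__ == "__main__":
    # Hypothetical folder, identifier and title, for illustration only.
    scans = ordered_scans("dli_scans")
    pack_pdf(scans, "book.pdf")
    upload_pdf("example-book-identifier", "book.pdf", "Example Book Title")
```

The natural sort matters because scan files are often numbered without zero padding, and a plain alphabetical sort would put page 10 before page 2 in the resulting PDF.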
Now this OCR step (the Internet Archive uses ABBYY) is not going to work
with Indian languages, but those who are savvy can help by creating PDF
files with the scan layer as well as a proofed/OCR-ed Unicode layer, which
can be uploaded into the system after processing offline.
Since the Internet Archive supports linking to specific pages via its online
reader, citation templates can use the URL for a specific page. This saves
readers from having to download an entire book and lets them see the
relevant source in context.
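As a rough illustration of the page linking described above, the sketch below builds a reader URL for a given page and drops it into a {{cite book}} template. It assumes the `/details/<identifier>/page/<page>/mode/<mode>` URL pattern of the online reader, where a value like `n25` refers to a scan (leaf) position and a plain number to the printed page number where the item's metadata records it. The identifier, title and page values are hypothetical.

```python
from urllib.parse import quote


def ia_page_url(identifier: str, page: str, mode: str = "1up") -> str:
    """Build a link to one page in the Internet Archive online reader.

    Assumption: `page` is either a leaf position such as 'n25' or a
    printed page number such as '12'.
    """
    return ("https://archive.org/details/"
            + quote(identifier, safe="")
            + "/page/" + quote(str(page), safe="")
            + "/mode/" + mode)


def cite_book(title: str, identifier: str, leaf: str, printed_page: str) -> str:
    """Return a minimal {{cite book}} call whose URL opens the cited page."""
    url = ia_page_url(identifier, leaf)
    return ("{{cite book |title=" + title
            + " |url=" + url
            + " |page=" + printed_page + "}}")


if __name__ == "__main__":
    # Hypothetical item identifier and page values.
    print(cite_book("Example Title", "example-book-id", "n25", "12"))
```

A reader following the link lands on the cited page inside the book, with the surrounding pages available for context.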
Many editors on en.wiki routinely use this method, and I regularly upload
content from sites that are less convenient (DLI included) into the
Internet Archive. In fact, I have just noticed that I have more than 800
uploads: https://archive.org/details/@shyamal
Let me know if anyone needs assistance with this.
If creating the Unicode text layer is a problem because the content is not
in the public domain, it might be useful to consider setting up an alternative
system along the lines of Distributed Proofreaders - http://www.pgdp.net/c/
code presumably available here
On Tue, Mar 31, 2015 at 11:38 AM, Ravishankar Ayyakkannu wrote:
I am aware of the expectations of the Commons community.
That is why I am seeking legal interpretation for the current situation.
I will appreciate if anyone can connect us with Copyright experts in
Even if we are going to establish contact with the government and get
something in writing, this prior counsel will help to get it done in the
Thanks for offering help.
Will get in touch off-list.