On 6/16/06, SJ <2.718281828(a)gmail.com> wrote:
Museums are good repositories of such information, as are non-digitized
archives. For them digitization is an expense; if we can reliably
offer it for free, many will be glad to release copyright in
exchange for more usable access to their own materials.
The Library of Congress has a sizable collection of materials that
they want to distribute more broadly; it is indeed already PD or
equivalent, but not digitized, or more commonly, digitized somehow
but not in many formats, not classified, and not easily available.
A Commons project to create form requests and a queue for processing
inbound content would be useful.
You could say the same about archived books that have no commercial
value anymore. The same analysis goes for processing book materials
donated to Wikisource, which requires image processing and OCR and
should perhaps have a Commons aspect (raw page images, raw OCR output
files, and images from within the book extracted from the raw page
images) and a Wikisource text aspect (text transcript, translations).
And again ties to the book industry would be useful here.
This kind of sounds like a Google Books sort of deal (well, the portion
of GB that is actually public domain books): people scan in books, we take
the scans and present them for free to the world. Am I right in that
assessment? I didn't quite understand what was being stated.
Anyhow, I think such a proposal would be very exciting, especially if we
took the scans, had a decent OCR program convert them to text, proofread
the text, and presented it on Wikisource. And of course, taking anything
from the LoC would practically double our current database (an extremely
conservative estimate; I'm not sure how much they'd be willing to give).
There is no shortage of material that could or should be included. A
very large proportion of the Google Books material is still not
available, even after the most conservative application of copyright
law. US Government publications dating from before 1923 are still only
available in snippets. It could very well be part of their agenda to
make these available only for a fee payable to them. Copyright
notwithstanding, being a unique source of useful material can be a
lucrative venture for Google. Big as the combined Wikimedia projects
may already be, we are still far from being able to provide adequate
competition to Google Books.
Taking "Scientific American" alone as an example, 16 pages a week for 77
years (1845-1922) yields over 64,000 pages (16 × 52 × 77 = 64,064), and
these are generally large 11" by 16" pages. Even the most conservative
estimates of the amount of freely available material are staggering.
Doing it justice may require a co-operative effort of all organizations
interested in making this work freely available.
Ec