Ryan Dabler wrote:
On 6/16/06, SJ 2.718281828@gmail.com wrote:
Museums are good repositories of such information; also non-digitized archives. For them digitization is an expense; if we can reliably offer this for free, many will be glad to release copyright in exchange for more usable access to their own materials.
The Library of Congress has a sizable collection of materials that they want to distribute more broadly; it is indeed already PD or equivalent, but not digitized -- or more commonly, digitized somehow but not in many formats, not classified, not easily available.
A commons-project to create form requests and a queue for processing inbound content would be useful.
You could say the same about archived books that no longer have any commercial value. The same analysis goes for processing book materials donated to Wikisource, which requires image processing and OCR and should perhaps have a Commons aspect (raw page images, raw OCR output files, images from within the book extracted from the raw page images) and a Wikisource text aspect (text transcript, translations). And again, ties to the book industry would be useful here.
This kind of sounds like a Google Books sort of deal (well, the portion of GB which is actually public domain books). People scan in books, we take the scans and present them for free to the world. Am I right in the assessment? I didn't quite understand what was being stated.
Anyhow, I think such a proposal would be very exciting, especially if we took the scans and had a decent OCR program to convert them to text, proofread it, and present it on Wikisource. And of course, taking anything from the LoC would practically double (extremely conservative estimate--not sure how much they'd be willing to give) our current database.
There is no shortage of material that could or should be included. A very large proportion of the Google Books material is still not available, even after the most conservative application of copyright law. US Government publications dating before 1923 are still only available in snippets. It could very well be a part of their agenda to make these available only for a fee payable to them. Copyright notwithstanding, being a unique source of useful material can be a lucrative venture for Google. Big as the combined Wikimedia projects may already be, we are still far from being able to provide adequate competition to Google Books.
Taking "Scientific American" alone as an example, 16 pages a week for 77 years (1845-1922) yields over 64,000 pages (16 x 52 x 77 = 64,064), and these are generally large 11" by 16" pages. Even the most conservative estimates of the amount of freely available material are staggering. To do it justice may require a co-operative effort of all organizations interested in making this work freely available.
Ec