On 6/16/06, SJ <2.718281828@gmail.com> wrote:

Museums are good repositories of such information; also non-digitized
archives. For them digitization is an expense; if we can reliably
offer this for free, many will be glad to release copyright in
exchange for more usable access to their own materials.

The Library of Congress has a sizable collection of materials that
they want to distribute more broadly; it is indeed already PD or
equivalent, but not digitized -- or more commonly, digitized somehow
but not in many formats, not classified, not easily available.

A Commons project to create form requests and a queue for processing
inbound content would be useful.

You could say the same about archived books that no longer have any
commercial value. The same analysis applies to processing book
materials donated to Wikisource, which requires image processing and
OCR and should perhaps have a Commons aspect (raw page images, raw OCR
output files, images extracted from the raw page images) and a
Wikisource text aspect (text transcripts, translations). And again,
ties to the book industry would be useful here.

Finally, source texts that are educationally useful could generate a
third set of materials: living wikibooks built on their foundation,
updated and improved over time.

SJ

This kind of sounds like a Google Books sort of deal (well, the portion of GB which is actually public domain books). People scan in books, we take the scans and present them for free to the world. Am I right in the assessment? I didn't quite understand what was being stated.

Anyhow, I think such a proposal would be very exciting, especially if we took the scans and had a decent OCR program to convert them to text, then proofread the text and presented it on Wikisource. And of course, taking anything from the LoC would practically double (extremely conservative estimate--not sure how much they'd be willing to give) our current database.
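
To make the workflow concrete (scan, OCR, proofread, publish), here is a rough Python sketch of an inbound queue that walks each item through those stages. Everything in it is hypothetical: the stage names, the `ScanJob` class, and the `advance` and `process_queue` functions are made up for illustration, and no real OCR program or Wikisource interface is called.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical stage names for the scan -> OCR -> proofread -> publish workflow.
STAGES = ["scanned", "ocr_done", "proofread", "published"]


@dataclass
class ScanJob:
    """One donated work moving through the (assumed) processing stages."""
    title: str
    stage: str = "scanned"
    history: list = field(default_factory=list)


def advance(job):
    """Move a job to the next stage; a published job stays published."""
    idx = STAGES.index(job.stage)
    if idx < len(STAGES) - 1:
        job.history.append(job.stage)
        job.stage = STAGES[idx + 1]
    return job


def process_queue(queue):
    """Take each job off the inbound queue and run it through every stage."""
    finished = []
    while queue:
        job = queue.popleft()
        while job.stage != STAGES[-1]:
            advance(job)  # real OCR or proofreading work would happen here
        finished.append(job)
    return finished


inbound = deque([ScanJob("Example public domain book")])
done = process_queue(inbound)
```

In practice each stage would be a separate step done by software or by volunteers, and the point of the queue is only to keep track of where each donated work currently stands.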