On 9/7/06, Ray Saintonge saintonge@telus.net wrote:
Lars Aronsson wrote:
Ray Saintonge wrote:
I don't see where copyright is an issue with this. The Library of Congress is an arm of the United States Congress whose primary purpose is to serve U. S. legislators. That would put its work in the public domain. Is there any reason to believe otherwise?
Why don't I see any downloadable dump of their entire database? Providing that would be a great goal for the Wikimedia Foundation.
I think that the answer may be quite innocent. Until Wikimedia came along, who would have wanted the entire database? If the demand didn't exist, they would have no reason to make it available.
Here we're freeing the encyclopedia, news reporting, and pictures, so why not the library catalog? Just think about being able to import it into MySQL or PostgreSQL on your own computer, and then do things like "select count(*)" to find which people translated the most works from Croatian to Hungarian, and make a [[List of translators from Croatian to Hungarian]], so we can make sure we have encyclopedia articles for the 50 most active ones.
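To make the "select count(*)" idea concrete, here is a minimal sketch. It uses SQLite from Python's standard library in place of MySQL or PostgreSQL so that it runs anywhere. The table name, the column names, the language codes and the sample rows are all made up for illustration; a real import would need a schema derived from the MARC fields.

```python
import sqlite3

# In-memory database standing in for a local import of the catalog.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per translated edition.
conn.execute(
    """CREATE TABLE translation (
           translator  TEXT,
           title       TEXT,
           source_lang TEXT,
           target_lang TEXT
       )"""
)

# Invented sample rows; 'hrv' and 'hun' are used here as language codes.
rows = [
    ("Translator A", "Work 1", "hrv", "hun"),
    ("Translator A", "Work 2", "hrv", "hun"),
    ("Translator B", "Work 3", "hrv", "hun"),
    ("Translator C", "Work 4", "eng", "hun"),
]
conn.executemany("INSERT INTO translation VALUES (?, ?, ?, ?)", rows)


def top_translators(conn, source, target, limit=50):
    """Return (translator, number of works) pairs, most active first."""
    return conn.execute(
        """SELECT translator, COUNT(*) AS works
             FROM translation
            WHERE source_lang = ? AND target_lang = ?
            GROUP BY translator
            ORDER BY works DESC, translator
            LIMIT ?""",
        (source, target, limit),
    ).fetchall()


result = top_translators(conn, "hrv", "hun")
```

The list returned by top_translators() is what would feed a [[List of translators from Croatian to Hungarian]] page.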
Again, before such a task was undertaken someone had to imagine that it could be done. As long as the list had to be created manually, the task was for all practical purposes impossible. There are surely many other databases that need freeing, and they could be just as free if someone else were doing the freeing. If that other database allows you to search, at no cost, in such a way that you can find the information you want, is it not effectively free?
Today I can download the LoC catalog one MARC record at a time through a Z39.50 interface. So far, I'm not aware of anyone who copied the entire catalog this way and provided it for free download. If we had a copy, would the Wikimedia Foundation provide it for download? What does the legal counsel or foundation board say? Do we need written permission as legal security, or can we simply trust that these U.S. government data are in the public domain? Are they in fact U.S. government data, or were they licensed from other sources, and under which terms?
While it's a good thing to investigate these questions more thoroughly, it would be pointless if the proposal were technically impossible. I have been looking through http://www.loc.gov/z3950/agency/ where LC is indicated as the maintenance agency for Z39.50/ISO 23950. Nowhere have I yet found any mention of copyright for the standard on the site.
This may cover the standard and formats, but what about the content of any particular entry? I would venture to say that it is not copyrightable. Copyright applies to the expression of information, and not the information itself. If the form of expression is predictable, as in conforming to a public domain standard, the result would not be copyrightable.
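For readers who have not seen one, here is a rough sketch of reading a single MARC record once its bytes are in hand. It does not talk to a Z39.50 server. It assumes the usual ISO 2709 layout (a 24-byte leader, then a directory of 12-byte entries, then the field data). The function names and the sample record are hypothetical, and build_record() exists only so the example can run without a real download.

```python
FIELD_END = b"\x1e"   # terminates the directory and every field
RECORD_END = b"\x1d"  # terminates the record
SUBFIELD = "\x1f"     # introduces each subfield inside a data field


def parse_marc(record: bytes) -> dict:
    """Map each tag to the list of its field contents (decoded as UTF-8)."""
    base = int(record[12:17])            # leader bytes 12-16: start of field data
    directory = record[24:base - 1]      # drop the directory's terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])         # field length, terminator included
        start = int(entry[7:12])         # offset relative to the base address
        data = record[base + start:base + start + length - 1]
        fields.setdefault(tag, []).append(data.decode("utf-8"))
    return fields


def subfields(data: str) -> list:
    """Split a data field into (code, value) pairs, skipping the indicators."""
    return [(part[0], part[1:]) for part in data.split(SUBFIELD)[1:] if part]


def build_record(fields: list) -> bytes:
    """Hypothetical helper: assemble a tiny record for the demonstration."""
    directory, body = b"", b""
    for tag, value in fields:
        data = value.encode("utf-8") + FIELD_END
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_END
    base = 24 + len(directory)
    total = base + len(body) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_END


# Made-up sample: a control number and a title field with one subfield.
sample = build_record([("001", "12345"), ("245", "10\x1faGödel, Escher, Bach")])
parsed = parse_marc(sample)
title = subfields(parsed["245"][0])
```

Copying the whole catalog would mean repeating this for every record fetched, which is why a bulk dump would be so much more convenient.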
One of the greatest threats to open access is the belief that something is protected by copyright when it isn't. Any fair use claim presumes that the material used is copyright protected in the first place. If the underlying material is not protected, a fair use claim is redundant.
Things that I have looked at while trying to answer this:
http://www.earlham.edu/~peters/fos/newsletter/03-02-06.htm#collateral
http://www.loc.gov/standards/relreport.pdf
http://www.dlib.org/dlib/march00/coyle/03coyle.html
Other libraries may have different views concerning their material, but how much of their material is not in the LoC catalogue?
While the LoC catalog is huge in the number of records, and providing it for free download would be a great achievement, the assumption that it could replace every other library catalog is naive. For the example above, the LoC rarely catalogs which people translated between which languages. That information (for Croatian-Hungarian) is probably only in the catalog of Hungary's national library. For Hofstadter's famous "Gödel, Escher, Bach" LoC only finds three hits for three English editions, but none of this book's many translations to other languages. The German national bibliography shows 2 English editions, a dozen German printings, and 1 each in Dutch, Danish, and Spanish. The Dutch Royal Library lists two English and five Dutch printings, but the last one is documented as being the 9th printing, so the catalog in fact only covers half of what's been published. Many Dutch Wikipedians are likely to own copies of the other printings, and could provide the missing information if the database was Wikicat. And these are only languages that are close to English and well represented at the Library of Congress.
The Hofstadter example is a good one in that it warns us of the dangers of simplistic reduction. Many of our online colleagues seem to be motivated by some desire to make tasks easier. This is often done by ignoring embarrassing complexities.
This takes us back to explaining the basics of library & information science. We should have a mailing list specializing in Wikicat and how to free the bibliography.
Perhaps, although I'm not sure we're ready for yet another mailing list. Full-scale freeing of bibliographies can easily lead us into what amounts to a Union Catalog of private holdings.
Ec
Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
I would guess that both the standard and the actual LOC data are covered by the same rule under which US government publications aren't copyrighted or copyrightable, and are free for anyone to use.
You're probably right that nobody has asked them for the database dump before. They may not be in a position to conveniently or reasonably give it to someone who asks. But it can't hurt to ask. Has anyone found contacts in the LOC organization and followed up on it?
Either Wikipedia or the Internet Archive would be good host locations for the data; possibly both.