Central bibliography


Bogdan Giusca

3 Sep 2006

8:09 a.m.

I think Wikipedia really needs a central bibliography, because many books are used in more than one article, some in dozens or even more.

Currently, a book reference is done this way:

<ref>[[Istvan Vasary]] (2005) ''Cumans and Tatars'', [[Cambridge University Press]]. ISBN 123546958695, page 22</ref>

With a central bibliography, it would look like this:

<book n="Cumans and Tatars" p="22"/>

and the book database would fill in the details in the page displayed to the viewer.

I suggest that we use the name of the book instead of the ISBN, because it is easier to recognize when reading the wiki-text. Also, many books (especially older ones, but not only those) have no ISBN, and some have many editions with different ISBNs.

There is still the problem of two books with the same title, in which case we need to add the author, too, for disambiguation, but I think this problem is rarer for the kind of books we use as references. Short titles are used mostly for fiction.

A central bibliography could have other benefits besides not having to copy-paste the publishing house and ISBN, such as knowing which articles refer to a certain book.
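As a rough illustration of the proposal above, the following Python sketch expands a short book reference into a full citation and records which article used it. The names `BOOKS`, `CITED_IN`, `resolve`, and `render_book` are hypothetical, as is the choice to key the store by title and author; nothing here reflects an existing MediaWiki feature.

```python
from collections import defaultdict

# Hypothetical central store, keyed by (title, author) so that two books
# sharing a title can still be told apart.
BOOKS = {
    ("Cumans and Tatars", "Istvan Vasary"): {
        "year": 2005,
        "publisher": "Cambridge University Press",
    },
}

# Reverse index: which articles cite which book.
CITED_IN = defaultdict(set)


def resolve(title, author=None):
    """Return the unique (title, author) key for a title, or raise."""
    matches = [k for k in BOOKS if k[0] == title and author in (None, k[1])]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} books match {title!r}")
    return matches[0]


def render_book(article, title, page, author=None):
    """Expand a short book reference into a full citation string."""
    key = resolve(title, author)
    CITED_IN[key].add(article)  # remember that this article cites the book
    info = BOOKS[key]
    return f"{key[1]} ({info['year']}) {key[0]}, {info['publisher']}, page {page}"
```

Calling `render_book("Cumans", "Cumans and Tatars", 22)` would return the full citation, and `CITED_IN` would afterwards list "Cumans" among the articles that refer to the book. The author argument is only needed when a title is ambiguous.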


Berto

3 Sep

8:19 a.m.

Hi!

...

I think Wikipedia really needs a central bibliography, because many books are used in more than one article, some in dozens or even more.

Can't do that unless you use some form of wikidata. It makes little sense to reinvent filesystems based on templates when databases have been used for that kind of thing for some 20 years already. The smarter solution would be to use the ISBN in the actual wiki version and hack the "find books" function to reach an external repository based on wikidata. At that point you could also trace the citations coming in from the main wiki and build cross-reference tables, relevance of authors per subject... that is, you'd have a real bibliography.

But it makes sense to keep this thing external, as most users are not going to need it and it would add overhead during execution.

Bèrto

Erik Moeller

8:29 a.m.

On 9/3/06, Bogdan Giusca liste@dapyx.com wrote:

...

I think Wikipedia really needs a central bibliography, because many books are used in more than one article, some in dozens or even more.

There's a well thought-out proposal for this already: http://meta.wikimedia.org/wiki/Wikicat

There's also a related discussion: http://mail.wikipedia.org/pipermail/foundation-l/2006-July/008724.html (and responses)

The most realistic way this project may happen in the near future is through the related WikiAuthors project that has been promoted by Miguel Andrade and others: http://meta.wikimedia.org/wiki/WikiAuthors

I discussed WikiAuthors with a group of scientists at a workshop in Bloomington a few days ago: http://vw.indiana.edu/places&spaces/meeting_060830.php

There was very broad support for the idea of a central wiki for the purpose of author disambiguation, as many existing databases do not assign unique IDs per author. It is likely that funding for this will come through in the coming months, which may also facilitate some of the broader ideas described in the Wikicat proposal linked to above.

The general technology for building wiki-based ontology repositories is part of the WiktionaryZ project and under heavy development: http://wiktionaryz.org/

Let me know if you have further questions, or want to be actively involved in this project.

-- Peace & Love, Erik

Gerard Meijssen

8:31 a.m.

Hoi, There are several projects that want to achieve similar aims; Wikiauthors and Wikicat are two of them. Their initial approaches differ: one starts off with scientific publications and wants to begin with the disambiguation of authors, the other wants to start with bibliographic information from the position of librarians. The cool thing is that the people representing these two approaches are talking, and that it is feasible to merge many if not all of the ideas of both projects. Both projects have people behind them to make this work; combined, the result will provide us with a central bibliography that is based on Wikidata functionality.

There are two things to understand: first, the database should be usable for all projects and in many languages; second, after such a project has been set up, it still needs integration into Wikipedia and other projects.

Thanks, GerardM


Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l

Ray Saintonge

10:08 a.m.

Gerard Meijssen wrote:

...

Hoi, There are several projects that want to achieve similar aims; Wikiauthors and Wikicat are two of them. Their initial approaches differ: one starts off with scientific publications and wants to begin with the disambiguation of authors, the other wants to start with bibliographic information from the position of librarians. The cool thing is that the people representing these two approaches are talking, and that it is feasible to merge many if not all of the ideas of both projects. Both projects have people behind them to make this work; combined, the result will provide us with a central bibliography that is based on Wikidata functionality.

There are two things to understand: first, the database should be usable for all projects and in many languages; second, after such a project has been set up, it still needs integration into Wikipedia and other projects.

Sure, but trial and error is also important. That's often how we find out what works and what doesn't.

Gerard Meijssen

2:50 p.m.

Ray Saintonge wrote:

...

Gerard Meijssen wrote:

...
Hoi, There are several projects that want to achieve similar aims; Wikiauthors and Wikicat are two of them. Their initial approaches differ: one starts off with scientific publications and wants to begin with the disambiguation of authors, the other wants to start with bibliographic information from the position of librarians. The cool thing is that the people representing these two approaches are talking, and that it is feasible to merge many if not all of the ideas of both projects. Both projects have people behind them to make this work; combined, the result will provide us with a central bibliography that is based on Wikidata functionality.

There are two things to understand: first, the database should be usable for all projects and in many languages; second, after such a project has been set up, it still needs integration into Wikipedia and other projects.

Sure, but trial and error is also important. That's often how we find out what works and what doesn't.

Ec

Hoi, Indeed, we tried, we erred. Can we move on? NB: we will not stop learning. Thanks, GerardM

maru dubshinki

8:51 a.m.

On 9/3/06, Bogdan Giusca liste@dapyx.com wrote:

...

I think Wikipedia really needs a central bibliography, because many books are used in more than one article, some in dozens or even more.

................

If a book is being used for dozens of articles, doesn't that show that it's really important and so should have at least a stub for it? Using articles would seem to fulfill most of your desiderata.

~maru

Gerard Meijssen

8:50 a.m.

maru dubshinki wrote:

...

On 9/3/06, Bogdan Giusca liste@dapyx.com wrote:

...
I think Wikipedia really needs a central bibliography, because many books are used in more than one article, some in dozens or even more.

................

If a book is being used for dozens of articles, doesn't that show that it's really important and so should have at least a stub for it? Using articles would seem to fulfill most of your desiderata.

~maru

Hoi, When a book is used quite often, it may mean that someone is pushing a particular point of view. When a book is relevant, you want to annotate that book. But it does not follow that it needs a Wikipedia article of its own. Thanks, GerardM

Ray Saintonge

10:59 a.m.

Gerard Meijssen wrote:

...

maru dubshinki wrote:

...
On 9/3/06, Bogdan Giusca liste@dapyx.com wrote:

...
I think Wikipedia really needs a central bibliography, because many books are used in more than one article, some in dozens or even more.

................

If a book is being used for dozens of articles, doesn't that show that it's really important and so should have at least a stub for it? Using articles would seem to fulfill most of your desiderata.

~maru

Hoi, When a book is used quite often, it may mean that someone is pushing a particular point of view. When a book is relevant, you want to annotate that book. But it does not follow that it needs a Wikipedia article of its own.

Whether a bibliographic listing implies a whole Wikipedia article is another matter. I don't see that as the intent. The frequent use of a work could reflect its status as a standard textbook for one of the sciences, and references to several such textbooks may show that an idea is widespread. From a Wiktionary perspective, frequent quotes of Shakespeare's works are excellent evidence that the word in question was in use during his lifetime.

Francis Tyers

9:19 a.m.

In fact, this brings up an interesting point, having <book> and <article> for specifics (they tend to be long), and then having <ref> for miscellaneous stuff.

I completely agree that a centralised repository of references/ citations would be desirable, not only would we be able to see which books are used where, but possibly (if people think it wise) to see who has which books/articles.

For example, I have many journal articles that I have photocopied from the British Library (my department paid for it) that I would be happy to look up information in for people. They can't be distributed because of copyright reasons, but if someone comes along, sees that I've cited it and asks me a question about the paper, I would have no problem in looking it up etc.

Fran


Ray Saintonge

11:08 a.m.

Francis Tyers wrote:

...

I completely agree that a centralised repository of references/ citations would be desirable, not only would we be able to see which books are used where, but possibly (if people think it wise) to see who has which books/articles.

For example, I have many journal articles that I have photocopied from the British Library (my department paid for it) that I would be happy to look up information in for people. They can't be distributed because of copyright reasons, but if someone comes along, sees that I've cited it and asks me a question about the paper, I would have no problem in looking it up etc.

Taking that a step further: User A has made a statement, and claims that it is supported by a specific reference. User B questions the claim and its source. User C has registered his ownership of the source in the Central bibliography. User B can then ask User C to check the claim to see if it is really in that source, or to see if it has been misquoted or, more frequently, misunderstood.

Lars Aronsson

4 Sep

8:15 p.m.

Francis Tyers wrote:

...

In fact, this brings up an interesting point, having <book> and <article> for specifics (they tend to be long), and then having <ref> for miscellaneous stuff.

When anybody says we "have to" introduce completely new markup or we "have to" wait for Wikidata to be implemented, then I know it's just an excuse for doing nothing.

Suppose you define a way to identify a pre-defined reference, e.g. the Wikipedia Standard Reference Numbering (WSRN). You could then set up a server of your own where the reference database is kept and just link to that from a template. It would be no different from Egil Kvaleberg's map link server and the {{coor}} template.

WSRN = 1 --> "The Wiki Way", ISBN 0-201-71499-X
WSRN = 2 --> "The Hive", article in "The Atlantic Monthly", http://www.theatlantic.com/doc/200609/wikipedia
WSRN = 3 --> ...

and a template {{wsrn|3}} to make a reference.

If you implement the database and fill it with contents, I can write the template.
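To make the WSRN idea concrete, here is a minimal Python sketch of what such a reference server might do when a {{wsrn|N}} link is followed. The `WSRN` dictionary, its field names, and the `expand_wsrn` function are all assumptions made for illustration; only entries 1 and 2 come from the message.

```python
# Hypothetical registry: WSRN -> reference record. Entries 1 and 2 are the
# examples from the message; the field names are illustrative only.
WSRN = {
    1: {"title": "The Wiki Way", "isbn": "0-201-71499-X"},
    2: {
        "title": "The Hive",
        "container": "The Atlantic Monthly",
        "url": "http://www.theatlantic.com/doc/200609/wikipedia",
    },
}


def expand_wsrn(number):
    """Return the citation text that {{wsrn|number}} could expand to."""
    ref = WSRN[number]  # raises KeyError for an unassigned number
    parts = [f'"{ref["title"]}"']
    if "container" in ref:
        parts.append(f'article in "{ref["container"]}"')
    if "isbn" in ref:
        parts.append(f'ISBN {ref["isbn"]}')
    if "url" in ref:
        parts.append(ref["url"])
    return ", ".join(parts)
```

With this layout, books and journal articles share one numbering space, which is the property the message relies on when it lists a book and a magazine article side by side.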

Here are two ideas for how to populate the database:

1. Copy the entire catalog of the Library of Congress and hope they don't mind your doing so. You can use the LCCN as the key, but this won't give you any reference to The Hive article, unless the LoC starts to catalog The Atlantic Monthly on an article level.

2. Download the latest dump and extract all the <ref>...</ref> tags currently used in the English Wikipedia. Begin with the works that are most frequently referenced.
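The second idea could be prototyped along these lines. This is only a sketch under simplifying assumptions: the wikitext is already available as a string, references are compared as literal text after whitespace normalization, and the names `REF_RE` and `most_cited` are made up for the example. Citations of the same work with different page numbers would still count as different entries.

```python
import re
from collections import Counter

# Matches <ref>...</ref> and <ref name="x">...</ref>, but not the
# self-closing <ref name="x"/> form and not <references>.
REF_RE = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def most_cited(wikitext, top=10):
    """Count identical <ref> bodies, ignoring whitespace differences."""
    counts = Counter(
        " ".join(body.split()) for body in REF_RE.findall(wikitext)
    )
    return counts.most_common(top)
```

The result is a list of (reference text, count) pairs, most frequent first, which gives a starting order for filling the database.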

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Gerard Meijssen

5 Sep

1:13 a.m.

Lars Aronsson wrote:

...

Francis Tyers wrote:

...
In fact, this brings up an interesting point, having <book> and <article> for specifics (they tend to be long), and then having <ref> for miscellaneous stuff.

When anybody says we "have to" introduce completely new markup or we "have to" wait for Wikidata to be implemented, then I know it's just an excuse for doing nothing.

Hoi, Based on what? WiktionaryZ is developing nicely; it is based on Wikidata. Versioning is expected to make its first appearance (Erik willing) this week. This hardly qualifies as nothing happening. As to Wikiauthors, there has been some deep thinking about what it needs to do by people who met IRL, and the chances of something happening in this space are better than 89%.

...

Suppose you define a way to identify a pre-defined reference, .....

For me there is little reason to suppose anything... Wikidata rocks.

Thanks, GerardM

Akash Mehta

1:21 a.m.

I agree that WikiData is a good idea, but what will it take for the developers to implement this? Are wikitech-l subscribers subscribed to wikipedia-l as well?


Gerard Meijssen

1:21 a.m.

Hoi, WikiData is already in the MediaWiki SVN. http://wiktionaryz.org is live; over 50,000 records have already been added by its community.

Wikipedia-l is about all things Wikipedia. There are loads of people who do not subscribe to it because it is not the only game in town.

Thanks, GerardM


Lars Aronsson

11:23 a.m.

Gerard Meijssen wrote:

...

Lars Aronsson wrote:

...
When anybody says we "have to" introduce completely new markup or we "have to" wait for Wikidata to be implemented, then I know it's just an excuse for doing nothing.

Based on what ? WiktionaryZ is developing nicely,

Gerard, your promotion of Wikidata is as predictable as a Swiss watch. I admire your optimism. In this case, however, I was talking about the tendency to use its name as an excuse for not doing something else. In the case of Wikicat, I don't think it's the Wikidata technology component that is lacking.

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

GerardM

12:16 p.m.

Hoi, The great thing is that you can observe how WiktionaryZ is doing; that may prove that my optimism can indeed be compared to a reliable Swiss watch.

You deny the argument for Wikicat to need Wikidata. I wonder what you base this on. Given what it aims to do, it is perfectly reasonable. Thanks, GerardM


Lars Aronsson

6 Sep

3:09 a.m.

GerardM wrote:

...

You deny the argument for Wikicat to need Wikidata. I wonder what you base this on. Given what it aims to do, it is perfectly reasonable.

For WiktionaryZ you needed two things: Wikidata and some free contents to fill it, which you found in the GEMET vocabulary.

For Wikicat, perhaps it could be based on Wikidata or perhaps on some other technical solution, but it also needs content, and this content needs to be free (as in freedom). It must be possible to download the dump of the database and reuse it for other purposes. My point is that we currently don't have that kind of content. What's lacking is not so much the technical platform as the content.

The most recent proposal ([[m:Wikicat]]) by Jleybov is based on lifting catalog records from the Library of Congress and other major libraries through a Z39.50 interface. I don't see any contract with the LoC and other libraries that confirms that these data can be reused for any purpose, and I don't see any efforts being made to get such contracts. Instead, on the m:talk:Wikicat page, I see vague ideas about using data that are available for educational non-profit purposes, and I personally think that is quite insufficient. I don't think it is impossible to solve the licensing issue, but I think it needs to be solved. Just ignoring the legal issue is not good enough.

Whether Wikidata is ready for deployment or not, this is not what's stopping Wikicat at this moment.

Could you talk to kb.nl and ask if their entire catalog can be released into the public domain or under a useful license? That would be a move similar to releasing the GEMET vocabulary, and could provide the starting point for Wikicat.

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Ray Saintonge

1:43 p.m.

Lars Aronsson wrote:

...

For Wikicat, perhaps it could be based on Wikidata or perhaps on some other technical solution, but it also needs content, and this content needs to be free (as in freedom). It must be possible to download the dump of the database and reuse it for other purposes. My point is that we currently don't have that kind of content. What's lacking is not so much the technical platform as the content. The most recent proposal ([[m:Wikicat]]) by Jleybov is based on lifting catalog records from the Library of Congress and other major libraries through a Z39.50 interface. I don't see any contract with the LoC and other libraries that confirms that these data can be reused for any purpose, and I don't see any efforts being made to get such contracts. Instead, on the m:talk:Wikicat page, I see vague ideas about using data that are available for educational non-profit purposes, and I personally think that is quite insufficient. I don't think it is impossible to solve the licensing issue, but I think it needs to be solved. Just ignoring the legal issue is not good enough.

I don't see where copyright is an issue with this. The Library of Congress is an arm of the United States Congress whose primary purpose is to serve U.S. legislators. That would put its work in the public domain. Is there any reason to believe otherwise? Other libraries may have different views concerning their material, but how much of their material is not in the LoC catalogue?

Lars Aronsson

7 Sep

1:42 a.m.

Ray Saintonge wrote:

...

I don't see where copyright is an issue with this. The Library of Congress is an arm of the United States Congress whose primary purpose is to serve U. S. legislators. That would put its work in the public domain. Is there any reason to believe otherwise?

Why don't I see any downloadable dump of their entire database? Providing that would be a great goal for the Wikimedia Foundation. Here we're freeing the encyclopedia, news reporting, pictures, so why not the library catalog? Just think about being able to import it to MySQL or PostgreSQL on your own computer, and then do things like "select count(*)" to find which people translated the most works from Croatian to Hungarian, and make a [[List of translators from Croatian to Hungarian]], so we can make sure we have encyclopedia articles for the 50 most active ones.
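The kind of query meant here might look like the following sketch, which uses Python's built-in sqlite3 module in place of MySQL or PostgreSQL so that it runs without a server. The `translations` table, its columns, and the sample rows are invented for illustration; a real catalog import would have a much richer schema.

```python
import sqlite3

# In-memory stand-in for a locally imported library catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations ("
    "title TEXT, translator TEXT, source_lang TEXT, target_lang TEXT)"
)
# Made-up sample rows; language codes follow ISO 639-1 (hr, hu, de).
conn.executemany(
    "INSERT INTO translations VALUES (?, ?, ?, ?)",
    [
        ("Work 1", "Translator A", "hr", "hu"),
        ("Work 2", "Translator A", "hr", "hu"),
        ("Work 3", "Translator B", "hr", "hu"),
        ("Work 4", "Translator B", "de", "hu"),
    ],
)

# Count works per translator for one language pair, most active first.
TOP_TRANSLATORS = """
    SELECT translator, COUNT(*) AS works
    FROM translations
    WHERE source_lang = ? AND target_lang = ?
    GROUP BY translator
    ORDER BY works DESC, translator
    LIMIT 50
"""

rows = conn.execute(TOP_TRANSLATORS, ("hr", "hu")).fetchall()
```

Here `rows` holds one (translator, count) pair per translator for the Croatian to Hungarian pair, capped at the 50 most active.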

Currently there is only one entry in http://en.wikipedia.org/wiki/Category:Hungarian_translators

Today I can download the LoC catalog one MARC record at a time through a Z39.50 interface. So far, I'm not aware of anyone who copied the entire catalog this way and provided it for free download. If we had a copy, would the Wikimedia Foundation provide it for download? What does the legal counsel or foundation board say? Do we need written permission as a legal security, or can we simply trust that these U.S. government data are in the public domain? Are they in fact U.S. government data, or were they licensed from other sources, and under which terms?

...

Other libraries may have different views concerning their material, but how much of their material is not in the LoC catalogue?

While the LoC catalog is huge in the number of records, and providing it for free download would be a great achievement, the assumption that it could replace every other library catalog is naive. For the example above, the LoC rarely catalogs which people translated between which languages. That information (for Croatian-Hungarian) is probably only in the catalog of Hungary's national library. For Hofstadter's famous "Gödel, Escher, Bach" LoC only finds three hits for three English editions, but none of this book's many translations to other languages. The German national bibliography shows 2 English editions, a dozen German printings, and 1 each in Dutch, Danish, and Spanish. The Dutch Royal Library lists two English and five Dutch printings, but the last one is documented as being the 9th printing, so the catalog in fact only covers half of what's been published. Many Dutch Wikipedians are likely to own copies of the other printings, and could provide the missing information if the database was Wikicat. And these are only languages that are close to English and well represented at the Library of Congress.

This takes us back to explaining the basics of library & information science. We should have a mailing list specialized on Wikicat and how to free the bibliography.

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Gerard Meijssen

3:10 a.m.

Lars Aronsson wrote:

...

Ray Saintonge wrote:

...
I don't see where copyright is an issue with this. The Library of Congress is an arm of the United States Congress whose primary purpose is to serve U. S. legislators. That would put its work in the public domain. Is there any reason to believe otherwise?

Why don't I see any downloadable dump of their entire database? Providing that would be a great goal for the Wikimedia Foundation. Here we're freeing the encyclopedia, news reporting, pictures, so why not the library catalog? Just think about being able to import it to MySQL or PostgreSQL on your own computer, and then do things like "select count(*)" to find which people translated the most works from Croatian to Hungarian, and make a [[List of translators from Croatian to Hungarian]], so we can make sure we have encyclopedia articles for the 50 most active ones.

Currently there is only one entry in http://en.wikipedia.org/wiki/Category:Hungarian_translators

Today I can download the LoC catalog one MARC record at a time through a Z39.50 interface. So far, I'm not aware of anyone who copied the entire catalog this way and provided it for free download. If we had a copy, would the Wikimedia Foundation provide it for download? What does the legal counsel or foundation board say? Do we need written permission as a legal security, or can we simply trust that these U.S. government data are in the public domain? Are they in fact U.S. government data, or were they licensed from other sources, and under which terms?

There are benefits in asking nicely; there are benefits in cooperation. Needing permission, and asking for permission and cooperation, makes the other organisation a party to what we want to achieve. When you select the translators from Croatian to Hungarian, you get a result in which you want to disambiguate the translators. This is what Wikiauthors will bring you.

Indeed, cooperation is a good thing.

...

...
Other libraries may have different views concerning their material, but how much of their material is not in the LoC catalogue?

While the LoC catalog is huge in the number of records, and providing it for free download would be a great achievement, the assumption that it could replace every other library catalog is naive. For the example above, the LoC rarely catalogs which people translated between which languages. That information (for Croatian-Hungarian) is probably only in the catalog of Hungary's national library. For Hofstadter's famous "Gödel, Escher, Bach" LoC only finds three hits for three English editions, but none of this book's many translations to other languages. The German national bibliography shows 2 English editions, a dozen German printings, and 1 each in Dutch, Danish, and Spanish. The Dutch Royal Library lists two English and five Dutch printings, but the last one is documented as being the 9th printing, so the catalog in fact only covers half of what's been published. Many Dutch Wikipedians are likely to own copies of the other printings, and could provide the missing information if the database was Wikicat. And these are only languages that are close to English and well represented at the Library of Congress.

This takes us back to explaining the basics of library & information science. We should have a mailing list specialized on Wikicat and how to free the bibliography.

Indeed, cooperation is a sweet thing. Thanks, GerardM

Ray Saintonge

1:20 p.m.

Lars Aronsson wrote:

...

Ray Saintonge wrote:

...
I don't see where copyright is an issue with this. The Library of Congress is an arm of the United States Congress whose primary purpose is to serve U. S. legislators. That would put its work in the public domain. Is there any reason to believe otherwise?

Why don't I see any downloadable dump of their entire database? Providing that would be a great goal for the Wikimedia Foundation.

I think that the answer may be quite innocent. Until Wikimedia came along who would have wanted the entire database? If the demand didn't exist, they would have no reason to make it available.

...

Here we're freeing the encyclopedia, news reporting, pictures, so why not the library catalog? Just think about being able to import it to MySQL or PostgreSQL on your own computer, and then do things like "select count(*)" to find which people translated the most works from Croatian to Hungarian, and make a [[List of translators from Croatian to Hungarian]], so we can make sure we have encyclopedia articles for the 50 most active ones.

Again, before such a task was undertaken someone had to imagine that it could be done. As long as the list had to be created manually, the task was for all practical purposes impossible. There are surely many other databases that need freeing, and they could be just as free if someone else were doing the freeing. If that other database allows you, at no cost, to search in such a way that you can find the information you want, is it not effectively free?
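The "select count(*)" idea quoted above can be sketched with Python's built-in sqlite3 module. The `translations` table, its columns, and the sample rows are all hypothetical; a database derived from real MARC records would be considerably more involved.

```python
import sqlite3

# Hypothetical, simplified schema: one row per catalogued translation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations ("
    " translator TEXT, source_lang TEXT, target_lang TEXT, title TEXT)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO translations VALUES (?, ?, ?, ?)",
    [
        ("Translator A", "hr", "hu", "Work 1"),
        ("Translator A", "hr", "hu", "Work 2"),
        ("Translator B", "hr", "hu", "Work 3"),
        ("Translator C", "de", "hu", "Work 4"),
    ],
)

def most_active_translators(conn, source_lang, target_lang, limit=50):
    """Return (translator, work_count) pairs, most active first."""
    return conn.execute(
        "SELECT translator, COUNT(*) AS works"
        " FROM translations"
        " WHERE source_lang = ? AND target_lang = ?"
        " GROUP BY translator"
        " ORDER BY works DESC, translator ASC"
        " LIMIT ?",
        (source_lang, target_lang, limit),
    ).fetchall()

top = most_active_translators(conn, "hr", "hu")
```

The resulting list could then seed a page such as [[List of translators from Croatian to Hungarian]].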

...

Today I can download the LoC catalog one MARC record at a time through a Z39.50 interface. So far, I'm not aware of anyone who copied the entire catalog this way and provided it for free download. If we had a copy, would the Wikimedia Foundation provide it for download? What does the legal counsel or foundation board say? Do we need written permission as a legal security, or can we simply trust that these U.S. government data are in the public domain? Are they in fact U.S. government data, or were they licensed from other sources, and under which terms?

While it's a good thing to investigate these questions more thoroughly, it would be pointless if the proposal were technically impossible. I have been looking through http://www.loc.gov/z3950/agency/ where LC is indicated as the maintenance agency for Z39.50/ISO 23950. Nowhere have I yet found any mention of copyright for the standard on the site.

This may cover the standard and formats, but what about the content of any particular entry? I would venture to say that it is not copyrightable. Copyright applies to the expression of information, and not the information itself. If the form of expression is predictable, as in conforming to a public domain standard, the result would not be copyrightable.

One of the greatest threats to open access is the belief that something is protected by copyright when it isn't. Any fair use claim presumes that the material used is copyright protected in the first place. If the underlying material is not protected a fair use claim is redundant.

Things that I have looked at while trying to answer this:
http://www.earlham.edu/~peters/fos/newsletter/03-02-06.htm#collateral
http://www.loc.gov/standards/relreport.pdf
http://www.dlib.org/dlib/march00/coyle/03coyle.html

...

...
Other libraries may have different views concerning their material, but how much of their material is not in the LoC catalogue?

While the LoC catalog is huge in the number of records, and providing it for free download would be a great achievement, the assumption that it could replace every other library catalog is naive. For the example above, the LoC rarely catalogs which people translated between which languages. That information (for Croatian-Hungarian) is probably only in the catalog of Hungary's national library. For Hofstadter's famous "Gödel, Escher, Bach" LoC only finds three hits for three English editions, but none of this book's many translations to other languages. The German national bibliography shows 2 English editions, a dozen German printings, and 1 each in Dutch, Danish, and Spanish. The Dutch Royal Library lists two English and five Dutch printings, but the last one is documented as being the 9th printing, so the catalog in fact only covers half of what's been published. Many Dutch Wikipedians are likely to own copies of the other printings, and could provide the missing information if the database was Wikicat. And these are only languages that are close to English and well represented at the Library of Congress.

The Hofstadter example is a good one in that it warns us of the dangers of simplistic reduction. Many of our online colleagues seem to be motivated by some desire to make tasks easier. This is often done by ignoring embarrassing complexities.

...

This takes us back to explaining the basics of library & information science. We should have a mailing list specialized on Wikicat and how to free the bibliography.

Perhaps, although I'm not sure we're ready for yet another mailing list. Full-scale freeing of bibliographies can easily lead us into what amounts to a Union Catalog of private holdings.

George Herbert

1:33 p.m.

On 9/7/06, Ray Saintonge saintonge@telus.net wrote:

...


Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l

I would guess that both the standard and the actual LOC data are covered by the same rule under which US government publications aren't copyrighted or copyrightable, and are free for anyone to use.

You're probably right that nobody has asked them for the database dump before. They may not be in a position to conveniently or reasonably give it to someone who asks. But it can't hurt to ask. Has anyone found contacts in the LOC organization and followed up on it?

Either Wikipedia or the Internet Archive would be good host locations for the data; possibly both.

-- -george william herbert george.herbert@gmail.com

Sabine Cretella

6 Sep

11:15 a.m.

Lars Aronsson wrote:

...

Gerard Meijssen wrote:

...
Lars Aronsson wrote:

...
When anybody says we "have to" introduce completely new markup or we "have to" wait for Wikidata to be implemented, then I know it's just an excuse for doing nothing.

Based on what ? WiktionaryZ is developing nicely,

Gerard, your promotion of Wikidata is as predictable as a Swiss watch. I admire your optimism. In this case, however, I was talking about the tendency to use its name as an excuse for not doing something else. In the case of Wikicat, I don't think it's the Wikidata technology component that is lacking.

Well, so is mine (predictable) ... What about reusability of the data? Did you consider that bit? Btw. waiting for wikidata is not an excuse for not working on stuff - you can well do it structured in such a way that it is then easy to import.

Salutammo, Sabine


Bogdan Giusca

5:23 a.m.

Tuesday, September 5, 2006, 6:15:46 AM, you wrote:

LA> Suppose you define a way to identify a pre-defined reference, e.g. the Wikipedia Standard Reference Numbering (WSRN), you could set up a server of your own where the reference database is kept and just link to that from a template. It would be no different from Egil Kvaleberg's map link server and the {{coor}} template.

LA> WSRN = 1 --> "The Wiki Way", ISBN 0-201-71499-X
LA> WSRN = 2 --> "The Hive", article in "The Atlantic Monthly", http://www.theatlantic.com/doc/200609/wikipedia
LA> WSRN = 3 --> ...

LA> and a template {{wsrn|3}} to make a reference.

But when someone edits the article, they will see only {{wsrn|3}} and have no idea which book is being referenced. Especially in articles with dozens of book references, this would make editing much more complicated than it is now with <ref> tags.
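To make the trade-off concrete, here is a minimal sketch of how a {{wsrn|N}} lookup might work. The registry dictionary and the `expand_wsrn` function are hypothetical; only the two sample entries come from the quoted proposal. It also illustrates the objection: the wikitext carries nothing but the number, so the book is only visible after expansion.

```python
import re

# Hypothetical registry keyed by WSRN, holding the two entries from the
# quoted example. Entry 3 is elided in the original, so it is absent here.
WSRN_REGISTRY = {
    1: '"The Wiki Way", ISBN 0-201-71499-X',
    2: '"The Hive", article in "The Atlantic Monthly", '
       'http://www.theatlantic.com/doc/200609/wikipedia',
}

WSRN_TEMPLATE = re.compile(r"\{\{wsrn\|(\d+)\}\}")

def expand_wsrn(wikitext):
    """Replace each {{wsrn|N}} with its registry entry.

    Unknown numbers are left untouched so the gap stays visible.
    """
    def lookup(match):
        return WSRN_REGISTRY.get(int(match.group(1)), match.group(0))
    return WSRN_TEMPLATE.sub(lookup, wikitext)

rendered = expand_wsrn("See {{wsrn|1}}.")
```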

Bryan Derksen

3 Sep

10:27 a.m.

Bogdan Giusca wrote:

...

Currently, a book reference is done this way:

<ref>[[Istvan Vasary]] (2005) ''Cumans and Tatars'', [[Cambridge University Press]]. ISBN 123546958695, page 22</ref>

with a central bibliography, it would be like this;

<book n="Cumans and Tatars" p="22"/>

You could already do this with templates, along the lines of what I suggested yesterday:

<ref>{{bibliography/Cumans and Tatars}}, p. 22</ref>

Draicone picked up on my idea and created http://en.wikipedia.org/wiki/User:Draicone/WikiProject_Reference_Help to discuss it further.
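As a rough illustration of the central-bibliography idea, the sketch below expands the proposed <book n="..." p="..."/> tag from a shared table and builds the reverse index of which articles cite which book. Everything here is an assumption for illustration: the `BIBLIOGRAPHY` table, the function names, and the regular expression are not part of MediaWiki, and the ISBN is the placeholder used in the thread.

```python
import re
from collections import defaultdict

# Hypothetical central bibliography keyed by title; the entry reuses the
# example citation from the thread, including its placeholder ISBN.
BIBLIOGRAPHY = {
    "Cumans and Tatars": "[[Istvan Vasary]] (2005) ''Cumans and Tatars'', "
                         "[[Cambridge University Press]]. ISBN 123546958695",
}

BOOK_TAG = re.compile(r'<book n="([^"]+)" p="([^"]+)"/>')

def expand_books(wikitext):
    """Expand <book n="..." p="..."/> into a full <ref> citation."""
    def render(match):
        title, page = match.groups()
        entry = BIBLIOGRAPHY.get(title)
        if entry is None:
            return match.group(0)  # unknown title: leave the tag untouched
        return f"<ref>{entry}, page {page}</ref>"
    return BOOK_TAG.sub(render, wikitext)

def books_to_articles(articles):
    """Map each cited title to the set of article names that cite it."""
    index = defaultdict(set)
    for name, wikitext in articles.items():
        for title, _page in BOOK_TAG.findall(wikitext):
            index[title].add(name)
    return dict(index)
```

Keying on the title keeps the wikitext readable, at the cost of needing disambiguation when two books share a title.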

Ray Saintonge

2:14 p.m.

Bogdan Giusca wrote:

...

I suggest that we use the name instead of the ISBN, because it can be seen more clearly in the text, when reading the wiki-text. Also, there are many books (especially older books, but not only), which have no ISBN number and some which have many different editions with different ISBNs.

There is still the problem with two books with the same title, in which case we need to add the author, too, for disambiguation, but I think this problem is less on the kind of books we use for reference. Shorter titles are used especially for fiction.

I have never been keen on ISBNs either. In addition to older works that never had an ISBN in the first place, works with a long history of different editions can have any number of ISBNs, and at some point it would be helpful to be able to compare various editions.

David Gerard

2:57 p.m.

On 03/09/06, Ray Saintonge saintonge@telus.net wrote:

...

I have never been keen on ISBNs either. In addition to older works that never had an ISBN in the first place, works with a long history of different editions can have any number of ISBNs, and at some point it would be helpful to be able to compare various editions.

Yes. ISBNs are useful indeed, but they were created for the use of the publishing industry and are per *edition* rather than per *book*.

- d.

Gerard Meijssen

3:14 p.m.

David Gerard wrote:

...

On 03/09/06, Ray Saintonge saintonge@telus.net wrote:

...
I have never been keen on ISBNs either. In addition to older works that never had an ISBN in the first place, works with a long history of different editions can have any number of ISBNs, and at some point it would be helpful to be able to compare various editions.

Yes. ISBNs are useful indeed, but they were created for the use of the publishing industry and are per *edition* rather than per *book*.

d.

Hoi, librarians are aware of this; that is one reason why the notion of Wikicat, which follows the conventions of librarians, has merit. Thanks, GerardM

David Gerard

3:21 p.m.

On 03/09/06, Gerard Meijssen gerard.meijssen@gmail.com wrote:

...

David Gerard wrote:

...
On 03/09/06, Ray Saintonge saintonge@telus.net wrote:

...

...
...
I have never been keen on ISBNs either. In addition to older works that

...

...
Yes. ISBNs are useful indeed, but they were created for the use of the publishing industry and are per *edition* rather than per *book*.

...

Librarians are aware of this, that is one reason why the notion of the Wikicat, to follow the conventions of librarians has merit.

Ah, of course, we can reduce it to a previously solved problem!

- d.

Andrew Gray

3:25 p.m.

On 03/09/06, David Gerard dgerard@gmail.com wrote:

...

On 03/09/06, Ray Saintonge saintonge@telus.net wrote:

...
I have never been keen on ISBNs either. In addition to older works that never had an ISBN in the first place, works with a long history of different editions can have any number of ISBNs, and at some point it would be helpful to be able to compare various editions.

Yes. ISBNs are useful indeed, but they were created for the use of the publishing industry and are per *edition* rather than per *book*.

Strictly, it's not even "per edition" - it's a sort of odd fusion of edition and production run. (Trade paperback and hardcover "editions" will often be textually identical editions - printed from the same plates, so the text and layout are identical - but just bound and marketed differently.)

-- - Andrew Gray andrew.gray@dunelm.org.uk

Richard Holton

6:40 p.m.

On 9/3/06, Andrew Gray shimgray@gmail.com wrote:

...

On 03/09/06, David Gerard dgerard@gmail.com wrote:

...
On 03/09/06, Ray Saintonge saintonge@telus.net wrote:

...
I have never been keen on ISBNs either. In addition to older works that never had an ISBN in the first place, works with a long history of different editions can have any number of ISBNs, and at some point it would be helpful to be able to compare various editions.

Yes. ISBNs are useful indeed, but they were created for the use of the publishing industry and are per *edition* rather than per *book*.

Strictly, it's not even "per edition" - it's a sort of odd fusion of edition and production run. (Trade paperback and hardcover "editions" will often be textually identical editions - printed from the same plates, so the text and layout is identical - but just bound and marketed differently)

--

Andrew Gray andrew.gray@dunelm.org.uk

But isn't the edition of the book crucial in a bibliographic entry? Page numbers can change between editions, not to mention actual content.

-Rich

-- [[W:en:User:Rholton]]

Ray Saintonge

4 Sep

12:36 a.m.

Richard Holton wrote:

...

On 9/3/06, Andrew Gray shimgray@gmail.com wrote:

...
On 03/09/06, David Gerard dgerard@gmail.com wrote:

...
On 03/09/06, Ray Saintonge saintonge@telus.net wrote:

...
I have never been keen on ISBNs either. In addition to older works that never had an ISBN in the first place, works with a long history of different editions can have any number of ISBNs, and at some point it would be helpful to be able to compare various editions.

Yes. ISBNs are useful indeed, but they were created for the use of the publishing industry and are per *edition* rather than per *book*.

Strictly, it's not even "per edition" - it's a sort of odd fusion of edition and production run. (Trade paperback and hardcover "editions" will often be textually identical editions - printed from the same plates, so the text and layout is identical - but just bound and marketed differently)

But isn't the edition of the book crucial in a bibliographic entry? Page numbers can change between editions, not to mention actual content.

Yes and no. The original citer works with what he has; the Wikipedian who undertakes to verify the source has another edition with a different ISBN, or the one may have the hardbound and the other a softbound edition with a different ISBN, or the ISBNs may differ because one is the American and the other is the British printing. It's not a simple problem.

With Wikipedia we can trace the evolution of a text, but that is not so easy with paper books.
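One way to picture the distinction between a book and its ISBNs is a small data model in which a single work owns several editions, each with its own ISBN. This is only a sketch: the class names, fields, and the placeholder ISBN strings are invented for illustration and do not reflect Wikicat or any library standard.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edition:
    isbn: str      # placeholder strings below, not real ISBNs
    binding: str   # e.g. "hardcover", "paperback"
    market: str    # e.g. "US", "UK"

@dataclass
class Work:
    title: str
    editions: list = field(default_factory=list)

class Catalog:
    """Index editions by ISBN so two ISBNs can be traced to one work."""

    def __init__(self):
        self._work_by_isbn = {}

    def add(self, work):
        for edition in work.editions:
            self._work_by_isbn[edition.isbn] = work

    def same_work(self, isbn_a, isbn_b):
        a = self._work_by_isbn.get(isbn_a)
        b = self._work_by_isbn.get(isbn_b)
        return a is not None and a is b

work = Work("Example Title", [
    Edition("ISBN-PLACEHOLDER-1", "hardcover", "US"),
    Edition("ISBN-PLACEHOLDER-2", "paperback", "UK"),
])
catalog = Catalog()
catalog.add(work)
```

With such a model, a citer and a verifier holding different printings could at least confirm they are looking at the same work, even though page numbers may still differ between editions.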

Akash Mehta

1:56 a.m.

It seems like a good idea, but I'm not sure that using actual tags would be the right way to go. Tags would allow the content to be generated dynamically, but for now, templates that call the cite templates seem to be the easiest way, and a system for indexing such templates and encouraging the creation of others would be simpler to implement.


Jacky PB

11:49 a.m.

If you want to do this, you should first take a look at [[:en:bibtex]]. This is what scientists use to write papers and books, and they have experience with handling large databases. The main idea is that you need some form of reference key. Neither the title nor the ISBN is good enough.

If interested, I can point you in this direction.
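For readers unfamiliar with BibTeX, the sketch below shows the kind of reference key it relies on. The author-year-first-word scheme is a common convention rather than a BibTeX requirement, and both function names are hypothetical; the sample data is the book cited earlier in the thread.

```python
import re

def citation_key(author_surname, year, title):
    """Build a BibTeX-style key such as 'vasary2005cumans'.

    Assumes a non-empty title; real schemes also need a tie-breaker
    for collisions (for example a trailing 'a', 'b', ...).
    """
    surname = re.sub(r"[^a-z0-9]", "", author_surname.lower())
    first_word = re.sub(r"[^a-z0-9]", "", title.split()[0].lower())
    return f"{surname}{year}{first_word}"

def to_bibtex(key, author, title, publisher, year):
    """Render a minimal @book entry for the given fields."""
    return (
        f"@book{{{key},\n"
        f"  author    = {{{author}}},\n"
        f"  title     = {{{title}}},\n"
        f"  publisher = {{{publisher}}},\n"
        f"  year      = {{{year}}}\n"
        "}\n"
    )

key = citation_key("Vasary", 2005, "Cumans and Tatars")
entry = to_bibtex(key, "Vasary, Istvan", "Cumans and Tatars",
                  "Cambridge University Press", 2005)
```

The key is short enough to read in running text yet independent of both the full title and any particular ISBN.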

Yours, Dpotop1




participants (17)

Akash Mehta
Andrew Gray
Berto
Bogdan Giusca
Bryan Derksen
David Gerard
Erik Moeller
Francis Tyers
George Herbert
Gerard Meijssen
GerardM
Jacky PB
Lars Aronsson
maru dubshinki
Ray Saintonge
Richard Holton
Sabine Cretella