In a very interesting blog post, Google has attempted to answer the question of how many books exist.
http://booksearch.blogspot.com/2010/08/books-of-world-stand-up-and-be-counte...
Why am I posting this to Foundation-l?
Well, one of the things it reveals is the difficulty of answering this question and I hope that it has some relation to Wikimedia projects; in particular, I didn't know that multiple books (entirely unrelated books) have shared ISBNs. So, if nothing else, it might impact...
http://en.wikipedia.org/wiki/Wikipedia:ISBN
And I also thought that Google's attempt to catalogue all books was parallel to our goal of... well, I'm not sure that we ever say we're attempting to catalogue ALL knowledge... but we seem to be making a decent fist of it so far.
Nevertheless, I confess that I'm still not sure I should be posting this to Foundation-l... and it strikes me that the only guidance I can find on what should be posted could perhaps be fleshed out a little more:
https://lists.wikimedia.org/mailman/listinfo/foundation-l
Apologies if this email strikes you as cruft.
Still, damn interesting blog post, eh?
On Thu, Aug 5, 2010 at 4:23 PM, Bod Notbod bodnotbod@gmail.com wrote:
In a very interesting blog post, Google has attempted to answer the question of how many books exist.
http://booksearch.blogspot.com/2010/08/books-of-world-stand-up-and-be-counte...
Interesting! This, in a nutshell, is why projects to collect all the world's bibliographic data face a hard challenge.
Why am I posting this to Foundation-l?
Well, one of the things it reveals is the difficulty of answering this question and I hope that it has some relation to Wikimedia projects; in particular, I didn't know that multiple books (entirely unrelated books) have shared ISBNs. So, if nothing else, it might impact...
AFAIK, this is a fairly uncommon problem; I've never run across it in 6+ years of working with lots of books & library catalogs every day. What is a much, much, much bigger problem is the issue of serials holdings: "serials" are normally taken to be things like magazines and journals, but in library land also might refer to, say, book series, or government reports that are published with serial numbers. All sorts of stuff, in other words, and it's cataloged and referred to in all sorts of ways, which makes it tough for people looking for good unique identifiers (or trying to figure out what counts as "a book").
And I also thought that Google's attempt to catalogue all books was parallel to our goal of... well, I'm not sure that we ever say we're attempting to catalogue ALL knowledge... but we seem to be making a decent fist of it so far.
It's certainly related to recent thoughts about a bibliographic wiki; obviously relevant to Wikibooks; and it's interesting to think about scale, which is something that's been on my mind lately. I don't know how much effort Google made to get records from national libraries in remote reaches of the world, but I'd imagine that there is still a big chunk of stuff missing from this count that's not in OCLC etc. Nonetheless I think posts like this help delineate the general scale of the information universe that we are trying to usefully capture. I don't have any idea how those 130M books might map onto topics, for instance, but I'm guessing our 15M articles don't quite cover it yet.
phoebe
On Thu, Aug 5, 2010 at 8:18 PM, phoebe ayers phoebe.wiki@gmail.com wrote:
On Thu, Aug 5, 2010 at 4:23 PM, Bod Notbod bodnotbod@gmail.com wrote:
in particular, I didn't know that multiple books (entirely unrelated books) have shared ISBNs. So, if nothing else, it might impact...
AFAIK, this is a fairly uncommon problem; I've never run across it in 6+ years of working with lots of books & library catalogs every day.
It varies by publisher--for example, in my experience, Harlequin (a publisher of romance novels) seems to have used all of its ISBNs *at least* twice. It's a real problem if you expect an ISBN to be a unique ID for a book, and worse if you wanted it to be unique to an edition or so on. Well, it's a minor issue from our point of view, I guess. How would MediaWiki scale to 130 million articles? Gotta cover everything...
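To make the reuse problem concrete, here is a minimal sketch of how one might check a set of catalogue records for ISBNs shared by unrelated titles. Everything in it is an assumption for illustration: the function name `find_shared_isbns`, the `(isbn, title)` record shape, and the sample ISBNs and titles are made up and are not real catalogue data.

```python
from collections import defaultdict


def find_shared_isbns(records):
    """Return {isbn: sorted titles} for ISBNs attached to more than one distinct title.

    `records` is an iterable of (isbn, title) pairs (a hypothetical record shape).
    """
    titles_by_isbn = defaultdict(set)
    for isbn, title in records:
        # Normalise the ISBN so hyphenated and unhyphenated forms compare equal.
        key = isbn.replace("-", "").replace(" ", "").upper()
        # Normalise the title so case or stray whitespace is not counted as a new book.
        titles_by_isbn[key].add(title.strip().lower())
    return {isbn: sorted(titles) for isbn, titles in titles_by_isbn.items() if len(titles) > 1}


# Made-up records: the ISBNs are placeholders, not valid or real numbers.
sample = [
    ("0-000-00000-1", "First Romance"),
    ("0000000001", "Second Romance"),
    ("0-000-00000-2", "A Monograph"),
    ("0-000-00000-2", "a monograph"),
]

shared = find_shared_isbns(sample)
print(shared)
```

A check like this only flags candidates; distinguishing genuine reuse from differing transcriptions of the same title would still need human review.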
On Thu, Aug 5, 2010 at 10:19 PM, Tracy Poff tracy.poff@gmail.com wrote:
It varies by publisher--for example, in my experience, Harlequin (a publisher of romance novels) seems to have used all of its ISBNs *at least* twice. It's a real problem if you expect an ISBN to be a unique ID for a book, and worse if you wanted it to be unique to an edition or so on. Well, it's a minor issue from our point of view, I guess. How would MediaWiki scale to 130 million articles? Gotta cover everything...
The number of notable subjects covered in all those books is much, much greater than 130 million.
Thanks, Pharos
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
A related link: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons#Size_of_other_inform...
2010/8/6 Pharos pharosofalexandria@gmail.com
The number of notable subjects covered in all those books is much, much greater than 130 million.
Thanks, Pharos
phoebe ayers wrote:
On Thu, Aug 5, 2010 at 4:23 PM, Bod Notbod bodnotbod@gmail.com wrote:
Well, one of the things it reveals is the difficulty of answering this question and I hope that it has some relation to Wikimedia projects; in particular, I didn't know that multiple books (entirely unrelated books) have shared ISBNs. So, if nothing else, it might impact...
AFAIK, this is a fairly uncommon problem; I've never run across it in 6+ years of working with lots of books & library catalogs every day. What is a much, much, much bigger problem is the issue of serials holdings: "serials" are normally taken to be things like magazines and journals, but in library land also might refer to, say, book series, or government reports that are published with serial numbers. All sorts of stuff, in other words, and it's cataloged and referred to in all sorts of ways, which makes it tough for people looking for good unique identifiers (or trying to figure out what counts as "a book").
And I also thought that Google's attempt to catalogue all books was parallel to our goal of... well, I'm not sure that we ever say we're attempting to catalogue ALL knowledge... but we seem to be making a decent fist of it so far.
It's certainly related to recent thoughts about a bibliographic wiki; obviously relevant to Wikibooks; and it's interesting to think about scale, which is something that's been on my mind lately. I don't know how much effort Google made to get records from national libraries in remote reaches of the world, but I'd imagine that there is still a big chunk of stuff missing from this count that's not in OCLC etc. Nonetheless I think posts like this help delineate the general scale of the information universe that we are trying to usefully capture. I don't have any idea how those 130M books might map onto topics, for instance, but I'm guessing our 15M articles don't quite cover it yet.
And to think that by the year 1500 only 65,000 incunabula had been published.
One can think of "books", but I don't think that is a useful standard unit. Too many less book-savvy people get caught up in the notion of books as monographs. Perhaps what is needed is a new unit of knowledge that is less misleading. Serials do not promise any unity of subject; simply listing them is no help in finding out what may be useful in them. How would we deal with problems like that of the 14th edition of the Britannica, where later printings were significantly different from earlier printings? Some pamphlets are not even part of serials. I have a lovely one published by the US Army Field Office in North Africa telling the soldiers how they should behave in France.
Scaling needs to be built in at an early stage. We need to be able to do something entirely different from what Google and for-profit industry can do, instead of trying to compete head-on. How is volunteer power best directed?
Ray
Bod Notbod, 06/08/2010 01:23:
Well, one of the things it reveals is the difficulty of answering this question and I hope that it has some relation to Wikimedia projects; in particular, I didn't know that multiple books (entirely unrelated books) have shared ISBNs.
It's not supposed to happen...
And I also thought that Google's attempt to catalogue all books was parallel to our goal of... well, I'm not sure that we ever say we're attempting to catalogue ALL knowledge... but we seem to be making a decent fist of it so far.
Well, someone suggested that this would be our job, too; see http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_books... and links to previous discussions. I don't much like that Google uses closed algorithms and such to de-duplicate book catalogues: Open Library and national central book catalogues do that, too, and it's a big effort. Lots of /duplicate/ work here.
Nevertheless, I confess that I'm still not sure I should be posting this to Foundation-l... and it strikes me that the only guidance I can find on what should be posted could perhaps be fleshed out a little more:
Meta-wikis are the answer. :-p See the previous strategy link, http://meta.wikimedia.org/wiki/Foundation-L_Proposal , http://meta.wikimedia.org/wiki/Mailing_lists , http://meta.wikimedia.org/wiki/Improving_Foundation-l etc.
Nemo
wikimedia-l@lists.wikimedia.org