On Mon, Dec 9, 2013 at 12:18 PM, Tom Morris <tfmorris@gmail.com> wrote:
 
I'm not sure I agree.  There's a lot of good data in OpenLibrary, but there's also a lot of junk.  Freebase imported a bunch of OpenLibrary data, after winnowing it to what they thought was the good stuff, and still ended up deleting a bunch of the supposedly "good" stuff later because they found their goodness criteria hadn't been strict enough.

One of the reasons OpenLibrary is such a mess is because *they* arbitrarily imported junky data (e.g. Amazon scraped records).  The last thing the world needs is more duplicate copies of random junk.  We've already got the DPLA for that. :-)

Another issue with the OpenLibrary metadata is that there's no clear license associated with it.  IA's position is that they got it from wherever they got it from and you're on your own if you want to reuse it, which isn't very helpful.  The provenance for major chunks of it is traceable, and new material contributed by users is nominally under CC0, so the licensing could probably be sorted out with enough effort (although the same is true of the data quality issues).



Gosh, I withdraw my support for full reuse of Open Library data.

That was probably the best effort they could make in past years, before data dumps became widely available directly from well-known library catalogs, but now we are in a very different scenario.

Even a simple mass import of the already mentioned Datahub datasets [1] into the Open Library engine (which is open-source software), without any further editing, would produce better-quality data than what is there now.
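
For anyone who wants to explore what's in that group, here is a minimal sketch (Python 3) for listing the datasets and their downloadable resources. It assumes datahub.io exposes the standard CKAN action API; group_package_show and the field names below are generic CKAN, not anything specific to the bibliographic group:

    import json
    import urllib.request

    # Datahub runs CKAN, so the standard action API should work here
    # (assumption: the v3 action API is enabled on datahub.io).
    API = "http://datahub.io/api/3/action/group_package_show?id=bibliographic"

    with urllib.request.urlopen(API) as resp:
        payload = json.load(resp)

    # CKAN wraps results as {"success": ..., "result": [package, ...]}.
    for pkg in payload["result"]:
        print(pkg["name"])
        # Each package lists its downloadable resources (dumps, APIs, etc.).
        for res in pkg.get("resources", []):
            print("  %s  %s" % (res.get("format", "?"), res.get("url")))

From there it's a matter of fetching the dumps you trust and feeding them to the Open Library import tooling, which is the part that would need real editorial judgment.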

[1] - http://datahub.io/group/bibliographic