On 3/27/20 3:05 PM, Karen Coyle wrote:
Open Library (https://openlibrary.org) has some good features (full disclosure: I was on the original OL team at the Archive) but it doesn't solve the wheat/chaff problem - something that all large libraries have. It also doesn't have a way to provide a useful order of retrievals, which is also the case for the Google Books site (OCLC uses numbers of holdings, which is pretty good, but no one else has access to that data).
There are some other metrics one can use for a useful ordering. Number of views/downloads, which the Internet Archive tracks, is a popular one, and useful to a point, though it shouldn't be used to the exclusion of all else, as it can end up hiding useful materials on less popular topics.
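One way to keep download counts useful without letting them bury less popular material is to damp them, for example with a logarithm, and blend the result with other signals. This is a minimal sketch, assuming a hypothetical `relevance` score in [0, 1] and a raw download count per item; the 0.3 weight is illustrative and not anything the Internet Archive is known to use:

```python
import math

def blended_score(relevance: float, downloads: int,
                  popularity_weight: float = 0.3) -> float:
    """Blend a relevance score in [0, 1] with damped popularity.

    log1p grows slowly, so 100,000 downloads is worth only about three
    times as much as 50 downloads; the ratio p / (1 + p) then squashes
    the result into [0, 1) so it cannot dominate relevance.
    """
    p = math.log1p(downloads)
    popularity = p / (1 + p)
    return (1 - popularity_weight) * relevance + popularity_weight * popularity
```

With these numbers, a highly relevant item with 50 downloads (about 0.87) still outranks a marginally relevant one with 100,000 downloads (about 0.42).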
For subject browsing on The Online Books Page, I use some metadata-based measures to rank within subject categories. Dates are used for ranking, where boosts are given not just for recency (as also occurs in many OPACs) but also for temporal proximity to the subject, if we have relevant dates in the heading. (So books on the American Civil War published during or near the time of the war get a boost, for instance.) I also use the ordering of subjects in my records as an estimate of their importance, so listings for subject X will turn up books with X as their first subject above books with X as their fifth subject. (That's one reason I really hope libraries don't drop support for librarian-assigned subject ordering, as some newer systems do.) We also cluster similar subjects, which matters most for subjects that don't have many books filed under them.
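A minimal sketch of how subject order, recency, and temporal proximity might be combined into one score. The weights, the decay constants, the `Book` record, and the assumption that a heading's dates appear as a trailing `YYYY-YYYY` range are all hypothetical; this is not the scoring code The Online Books Page actually uses, and it leaves out the clustering of similar subjects:

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Book:
    title: str
    year: Optional[int]   # publication year, if known
    subjects: List[str]   # in the order the cataloger assigned them

def heading_dates(heading: str) -> Optional[Tuple[int, int]]:
    """Pull a trailing 'YYYY-YYYY' range out of a subject heading, if any."""
    m = re.search(r"(\d{4})-(\d{4})\s*$", heading)
    return (int(m.group(1)), int(m.group(2))) if m else None

def subject_score(book: Book, subject: str, current_year: int = 2020) -> float:
    """Score a book for a subject listing (higher ranks first)."""
    if subject not in book.subjects:
        return 0.0
    # Subject-order boost: 1.0 as first subject, 0.5 as second, 0.2 as fifth.
    score = 1.0 / (book.subjects.index(subject) + 1)
    if book.year is None:
        return score
    # Recency boost: halves after 25 years and keeps decaying.
    age = max(current_year - book.year, 0)
    score += 0.5 / (1 + age / 25)
    # Temporal-proximity boost, applied only when the heading carries dates.
    dates = heading_dates(subject)
    if dates is not None:
        start, end = dates
        gap = max(start - book.year, book.year - end, 0)  # 0 inside the range
        score += 0.5 / (1 + gap / 5)
    return score
```

Here a Civil War memoir from 1866 outscores a 1950 study filed under the same heading, because its proximity boost outweighs the later book's modest recency boost.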
I also give a boost to "work" clusters, which in my case are created manually rather than automatically. In an automated system, one could use the number of editions, or the amount of metadata recorded for them, as a rough estimate of how important publishers and librarians have found a work, at least if the clustering is reasonably accurate.
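For the automated case, the edition-count proxy could look something like the sketch below. It assumes each edition record already carries a work id from some earlier clustering step (the input list here is hypothetical), and it covers only edition counts, not the amount-of-metadata proxy:

```python
import math
from collections import Counter
from typing import Dict, Iterable

def work_boosts(edition_work_ids: Iterable[str], weight: float = 0.25) -> Dict[str, float]:
    """Map each work id to a boost that grows with its number of editions.

    Uses a logarithm so a work with 40 editions is favored, but not 40
    times as much as a work with one; single-edition works get 0.
    """
    counts = Counter(edition_work_ids)
    return {work: weight * math.log(n) for work, n in counts.items()}
```

If the clustering wrongly merges or splits works, these counts inherit the error, which is why the boost is kept small and damped.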
There are other techniques one can use for useful ordering. These are ones I've found worth implementing on my sites, and could also be used elsewhere if one saw fit.
John
I would love to see curated collections from these book databases. Open Library has lists, but they are personal lists and not well managed. How can we create useful collections from these online materials?
I'll mention that one project I did was comparing the holdings in a public library to the Open Library open access books, so that the library could offer unlimited access to books of which they would generally have only a few hard copies. This was in keeping with the sense of their collection but also expanded access. If we could link from digitized copies to library collections, that would be a huge gain. It solves the wheat/chaff problem, although not the ranking one. The problem there is matching works/expressions (ISBN is not good enough).
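Since ISBN alone is not enough, one common fallback is a normalized author/title key. The sketch below assumes hypothetical record dicts with `id`, `title`, and `author` fields, and an author string in "Surname, Forename" form. It matches only at the work level: it cannot tell expressions (translations, revised editions) apart, and it will produce false matches for common titles, so results would still need review:

```python
import re
import unicodedata
from typing import Dict, List

def _norm(s: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    s = re.sub(r"[^a-z0-9 ]", " ", s.lower())
    return " ".join(s.split())

def work_key(title: str, author: str) -> str:
    """Crude work-level key: author surname plus main title."""
    t = _norm(title.split(":")[0])           # drop any subtitle
    t = re.sub(r"^(the|a|an) ", "", t)       # drop an initial English article
    surname = _norm(author.split(",")[0])    # assumes "Surname, Forename"
    return f"{surname}|{t}"

def match_holdings(catalog: List[dict], digitized: List[dict]) -> Dict[str, List[str]]:
    """For each catalog record id, list ids of digitized copies with the same key."""
    index: Dict[str, List[str]] = {}
    for rec in digitized:
        index.setdefault(work_key(rec["title"], rec["author"]), []).append(rec["id"])
    return {rec["id"]: index.get(work_key(rec["title"], rec["author"]), [])
            for rec in catalog}
```

For example, "The Adventures of Tom Sawyer" by "Twain, Mark" and "Adventures of Tom Sawyer: a novel" by "Twain, Mark." reduce to the same key, `twain|adventures of tom sawyer`.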
Anyway, onward - and if anyone wishes to manage a project, please post widely as I think a crowd-sourced solution is much needed.
kc
On 3/26/20 2:12 PM, Federico Leva (Nemo) wrote:
Karen Coyle, 26/03/20 17:44:
Unfortunately, until someone turns this into a library it's just a random pile of books.
I think the general idea is that archive.org is indeed the "pile of books" while the actual library (aspirationally) is openlibrary.org. Looking at the collection on archive.org is like looking at the compactus room or the inventory books.
Federico