On Wed, Jul 21, 2010 at 5:47 PM, David Goodman dgoodmanny@gmail.com wrote:
Sure, but first, is this capable of being done at all? I have never seen a method of bibliographic control that can cope with the complete range of publications, even just print publications. Perhaps we need to proceed within narrow domains.
I assume that by range you mean the number of publications in a domain, and that by domain you mean the type of publication, be it a book, webpage or map.
The generic nature of a markup such as wiki template syntax allows us to easily adapt the same application to new domains. The challenge of the range within a domain is largely one of resolving ambiguities, which can be settled with policies that carefully adjudicate troublesome cases.
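To illustrate the point about generic markup, here is a minimal sketch in Python of a single parser that reads a flat template regardless of whether it describes a book or a map. The function name `parse_template` and the field names in the example are assumptions made for illustration, and the sketch deliberately ignores nested templates and wiki links that contain pipes.

```python
def parse_template(text):
    """Parse a flat {{name|key=value|...}} template into (name, fields).

    Illustrative only: nested templates and piped links are not handled.
    """
    inner = text.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template")
    # Drop the surrounding braces and split on the pipe separator.
    parts = [p.strip() for p in inner[2:-2].split("|")]
    name = parts[0].lower()
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # keep only named parameters
            fields[key.strip()] = value.strip()
    return name, fields


# The same code handles two different domains (hypothetical field values).
book = parse_template("{{cite book|title=Foo|year=1999}}")
chart = parse_template("{{cite map|title=Bar|scale=1:50000}}")
```

Nothing in the parser is specific to one type of publication; only the set of fields a policy page would require differs from domain to domain.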
Second, is this capable of being done by crowd-sourcing, or does it require enforceable standards? The work of Open Library is not a promising model, being an uncontrolled mix, done to many different standards. Actually, within the domain of scientific journal articles from the last 10 years in Western languages, the best current method seems to be a mechanical algorithm, the one used by Google Scholar. True, it does not aggregate perfectly--but it does aggregate better than any other existing database. And it does not get them all--nor could it no matter how much improved, for many of the versions that are actually available are off limits to its crawlers.
In my conception the enforceable standards are to emerge in the meta pages of this project based on the actual issues that the community encounters.
Googlebot has many deep-web accounts with online journals. When you search Google Scholar, the relevance algorithm is actually comparing your query to the content of PDF pages which you do not have permission to access. Of course, Google can't access them all, but many publishers have found it in their interest to give Google a complimentary account, since it drives subscriptions.
We can rely on individuals, particularly academics, who have access to the deep web to help us curate the bibliography. And we can rely on the massive number of personal bibliographies already out there to help us get good coverage.
Cleaning up the mass of bibliographic content that I anticipate users would upload would require writing bots in coordination with the creation of policy pages.
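As a rough idea of what one such cleanup bot might do, the following Python sketch groups uploaded records that appear to describe the same work. The record layout (dicts with `title` and `year` keys), the function names, and the matching rule are all assumptions for illustration; a real bot would follow whatever rule the policy pages settle on.

```python
import re
import unicodedata


def normalize_key(title, year):
    """Build a crude matching key: unaccented, lowercased, punctuation-free title plus year."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return (text, str(year).strip())


def group_duplicates(records):
    """Return lists of records that share a key, i.e. candidate duplicates."""
    groups = {}
    for rec in records:
        key = normalize_key(rec["title"], rec["year"])
        groups.setdefault(key, []).append(rec)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical uploads from two personal bibliographies.
records = [
    {"title": "The Origin of Species", "year": 1859},
    {"title": "the origin of species.", "year": "1859"},
    {"title": "Something Else", "year": 2000},
]
candidates = group_duplicates(records)
```

A bot like this would only flag candidates; deciding which record survives a merge is exactly the kind of troublesome case the policy pages would need to adjudicate.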
Getting rid of copyrighted material would be handled in the same manner, I presume. After major content publishers see what we are doing, I am sure they will let us know their opinion about what we can and cannot do. It seems likely that they will overstep their bounds, and as I have seen on Wikipedia, the community members will happily ignore them. Or, if the community members think the requests are actually in compliance with the law, they will comply.
Brian