2009/8/1 John Vandenberg:
On Sat, Aug 1, 2009 at 5:09 PM, Samuel Klein
*A wiki for book metadata, with an entry for every published work,
statistics about its use and siblings, and discussion about its
usefulness as a citation (a collaboration with OpenLibrary, merging
Why not just do this in the Wikisource project?
99% of "every published work" are free/libre. Only the last
70 years' worth of texts are restricted by copyright, so it doesn't make
sense to build a different project for those works.
I think your estimate's a little off, sadly :-)
Firstly, copyright lasts more than the statutory seventy years, as a
general rule - remember, authors don't conveniently die the moment
they publish. If we discount the universal one-date cutoff in the US
eighty years ago - itself a fast-receding anomaly - extant copyrights
probably last about a hundred years from publication, on average.
But more critically, whilst a hundred years is a drop in the bucket of
the time we've been writing texts, it's a very high proportion of the
time we've been publishing them at this rate. Worldwide, book
publication rates now are pushing two orders of magnitude higher than
they were a century ago, and that was itself probably up an order of
magnitude on the previous century. Before 1400, the rate of creation
of texts that have survived probably wouldn't equal a year's output
I don't have the numbers to hand to be confident of this - and
hopefully Open Library, as it grows, will help us draw a firmer
conclusion - but I'd guess that at least half of the identifiable
works ever conventionally published as monographs remain in copyright
today. 70% wouldn't surprise me, and it's still a growing fraction.
Intuitively, I think your analysis is closer to reality, but, even so,
that older 30% is more than enough to keep us all busy for a very long
time. To appreciate the size of the task, consider the 1911 Encyclopædia
Britannica. It is well in the public domain, and most articles there
have a small number of sources which themselves would be in the public
domain. Only a small portion of the 1911 EB project on Wikisource is
complete to acceptable standards; we have virtually nothing from EB's
sources; we also have virtually nothing from any other edition of the EB
even though everything up to the early 14th (pre-1946) is already in the
public domain. Dealing with this alone is a huge task.
Having all this bibliography on Wikisource is conceivable, though
properly not in the Wikisource of any one language; that would be
consistent with my own original vision of Wikisource from the very first
day. A good bibliographic survey of a work should reference all editions
and all translations of a work. For an author, multiply this by the
number of his works. Paradoxically, Wikisource, like Wikipedia and like
many other mature projects, has made a virtue of obsessive minute
accuracy and uniformity. While we all treasure accuracy, its pursuit
can be subject to diminishing returns. A bigger Wikisource community
could in theory overcome this, but the process of acculturation that
goes on in mature wiki projects makes this unlikely.
Sam's reference to "book metadata" is itself an underestimate of the
challenge. It doesn't even touch on journal articles, or other material
too short to warrant the publication of a monograph.