[Foundation-l] Open Library, Wikisource, and cleaning and translating OCR of Classics
Ray Saintonge
saintonge at telus.net
Mon Aug 17 22:33:34 UTC 2009
David Goodman wrote:
> The problem is extraordinarily complex. A database of all "books"
> (and other media) ever published is beyond the joint capabilities of
> everyone interested. There are intermediate entities between "books"
> and "works", and important subordinate entities, such as "article",
> "chapter", and those like "poem" which could be at any of several
> levels.
I've already been in raging arguments at Wikisource about the meaning of
"work". The general tendency there has been to treat "work" as
equivalent to a book or set of related books. This is highly
problematical for periodicals, encyclopedias and dictionaries.
I do agree that the problem is that complex, but many people resist
accepting standards that have been developed over a long period of
time. Before the Category: namespace was made a part of
Wikipedia there was considerable antipathy to adopting any kind of
established category system. Muddling through from square one was the
preferred option.
> This is not a job for amateurs, unless they are prepared to
> first learn the actual standards of bibliographic description for
> different types of material, and to at least recognize the
> inter-relationships, and the many undefined areas. At research
> libraries, one allows a few years of training for a newcomer with just
> an MLS degree to work with a small subset of this. I have thirty years
> of experience in related areas of librarianship, and I know only
> enough to be aware of the problems.
>
This does not bode well! The big factor in Wiki participation and
success is amateur involvement and crowd sourcing. What are the PhDs
doing to bridge the gap? What efforts are being made to at least bring
the most significant points to the level of the general contributor?
Saying that it takes several years to bring an MLS up to speed is not
good enough. Knowledge needs to be brought to the level where it is
most useful. When I went to school, typing was not introduced as a
subject until the 10th grade; my son learned keyboarding in the first grade.
Our wiki projects also have a superfluity of people with IT
backgrounds who likewise do a poor job of bringing information to
where it belongs, and who end up creating a mind-boggling assortment of
templates of questionable value. In theory they are trying to bring
standardization and simplicity to the projects, but just as often
produce a simplistic and premature narrowing of the way knowledge is
organized.
> The difficulty of merging the many thousands of partially correct and
> incorrect sources of available data typically requires the manual
> resolution of each of the tens of millions of instances.
>
Yes, of course. There is no magic software that will do it all. Humans
need to retain the right to decide the limits of technology.
> OL rather than Wikimedia has the advantage that more of the people
> there understand the problems.
The librarians have their work cut out for them. They can help to build
a system for the future, or they can let everyone muddle their way into
a fuck-up.
Ec