[Wikisource-l] Open Library, Wikisource, and cleaning and translating OCR of Classics

Yann Forget yann at forget-me.net
Thu Aug 13 10:18:59 UTC 2009


Keeping a copy to wikisource-l. Yann

-------- Original Message --------
Subject: Re: [Foundation-l] Open Library, Wikisource, and cleaning and
translating OCR of Classics
Date: Thu, 13 Aug 2009 01:48:37 -0400

DGG, I appreciate your points.  Would we be so motivated by this
thread if it weren't a complex problem?

The fact that all of this is quite new, and that there are so many
unknowns and gray areas, actually makes me consider it more likely
that a body of Wikimedians, experienced with their own form of
large-scale authority file coordination, are in a position to say
something meaningful about how to achieve something similar for tens
of millions of metadata records.

> OL rather than Wikimedia has the advantage that more of the people
> there understand the problems.

In some areas that is certainly so.  In others, Wikimedia communities
have useful recent experience.  I hope that those who understand these
problems on both sides recognize the importance of sharing what they
know openly -- and showing others how to understand them as well.  We
will not succeed as a global community if we say that this class of
problems can only be solved by the limited group of people with an MLS
and a few years of focused training.  (How would you name the sort of
training you mean here, btw?)

SJ

On Thu, Aug 13, 2009 at 12:57 AM, David Goodman<dgoodmanny at gmail.com> wrote:
> Yann & Sam
>
> The problem is extraordinarily complex. A database of all "books"
> (and other media) ever published is beyond the joint capabilities of
> everyone interested. There are intermediate entities between "books"
> and "works", and important subordinate entities, such as "article",
> "chapter", and those like "poem" which could be at any of several
> levels.  This is not a job for amateurs, unless they are prepared to
> first learn the actual standards of bibliographic description for
> different types of material, and to at least recognize the
> inter-relationships, and the many undefined areas. At research
> libraries, one allows a few years of training for a newcomer with just
> an MLS degree to work with a small subset of this. I have thirty years
> of experience in related areas of librarianship, and I know only
> enough to be aware of the problems.
> For an introduction to the current state of this, see
> http://www.rdaonline.org/constituencyreview/Phase1Chp17_11_2_08.pdf.
>
> Merging the many thousands of partially correct and incorrect
> sources of available data typically requires the manual
> resolution of each of the tens of millions of instances.
>
> OL rather than Wikimedia has the advantage that more of the people
> there understand the problems.
>
> David Goodman, Ph.D., M.L.S.
> http://en.wikipedia.org/wiki/User_talk:DGG

-- 
http://www.non-violence.org/ | Collaborative site on non-violence
http://www.forget-me.net/ | Alternatives on the Net
http://fr.wikisource.org/ | Free library
http://wikilivres.info | Free documents


