On 16/01/07, Andre Engels <andreengels(a)gmail.com> wrote:
My idea was to scan these materials, put them on the web, and then make it
available to some wiki-like collaborative effort. Volunteers from all over
the world could then make themselves useful by doing the transcription of
the material (I am thinking in the first place of manuscript material here -
printed material is probably easier to transcribe through machine character
recognition) and by creating descriptions, indices etcetera.
My idea would be to have something wiki-like, where the main pages would
consist of a scan, its transcription (if available) and an area for
comments.
And the original images would still be available next to the
transcription in case of dispute (or a bad transcription). I seem to
recall at least one major genealogical project is doing this - a
Canadian census? - and it might be worth looking into that for
information on how they work it.
[checks]
http://automatedgenealogy.com/census/cache/index.html - using free(?)
images from a government body, I think.
They don't use a wiki, but they do have the split-screen thing and
what looks like a line-by-line database to feed material into. The
line-by-line nature of the source adapts itself easily to this, of
course, but it would still work with anything you can chop into
reasonably discrete segments. And a wiki does seem the obvious tool to
use, though I know we say that about almost everything!
It wouldn't be perfectly accurate, but it would be searchable, which
is basically what most people are doing with scans of printed material
now: giving it a quick OCR pass to get a mostly-searchable version of
the text and then handing an image of the page to the searcher. (And I
suppose there is always the possibility of using the results of
large-scale distributed transcription to help work on
historic-handwriting OCR - we can do modern hand much more easily than
we can even century-old copperplate, much less any of the older
"legal" styles of writing. But that's a long way off.)
We certainly wouldn't solve the "digitisation problem", and having the
images out there with good metadata on them is arguably of more
academic use than doing the transcriptions, but it would be very
interesting to have one specific (largish) thing to work on, release
it to the world as a trial project, and see if it takes off.
--
- Andrew Gray
andrew.gray(a)dunelm.org.uk