[Foundation-l] NEH grant

Tue Jan 16 00:19:45 UTC 2007

On 16/01/07, Andre Engels <andreengels at gmail.com> wrote:

> My idea was to scan these materials, put them on the web, and then make it
> available to some wiki-like collaborative effort. Volunteers from all over
> the world could then make themselves useful by doing the transcription of
> the material (I am thinking in the first place of manuscript material here -
> printed material is probably easier to transcribe through machine character
> recognition) and by creating descriptions, indices etcetera.
>
> My idea would be to have something wiki-like, where the main pages would
> consists of a scan, its transcription (if available) and an area for
> comments.

And the original images would still be available next to the
transcription in case of dispute (or a bad transcription). I seem to
recall at least one major genealogical project is doing this - a
Canadian census? - and it might be worth looking into that for
information on how they work it.

[checks]

http://automatedgenealogy.com/census/cache/index.html - using free(?)
images from a government body, I think.

They don't use a wiki, but they do have the split-screen thing and
what looks like a line-by-line database to feed material into. The
line-by-line nature of the source adapts itself easily to this, of
course, but it would still work with anything you can chop into
reasonably discrete segments. And a wiki does seem the obvious tool to
use, though I know we say that about almost everything!
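
To make the "chop into segments" idea a bit more concrete, here is a
minimal sketch of how a page might be stored. It is not how the census
site above works (I haven't seen their code); the names Page, Segment,
new_page and the example filename are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One discrete piece of a scanned page, e.g. a single census line."""
    index: int
    transcription: Optional[str] = None  # None until a volunteer fills it in


@dataclass
class Page:
    """A scan plus its per-segment transcriptions and free-form comments."""
    image_url: str  # the original scan stays available next to the text
    segments: List[Segment] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def transcribe(self, index: int, text: str) -> None:
        """Record (or overwrite) the transcription of one segment."""
        self.segments[index].transcription = text

    def progress(self) -> float:
        """Fraction of segments that have a transcription so far."""
        if not self.segments:
            return 0.0
        done = sum(1 for s in self.segments if s.transcription is not None)
        return done / len(self.segments)


def new_page(image_url: str, n_segments: int) -> Page:
    """Create a page with n_segments empty, untranscribed segments."""
    return Page(image_url, [Segment(i) for i in range(n_segments)])
```

Keeping the image reference on every page means a disputed or bad
transcription can always be checked against the scan.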

It wouldn't be perfectly accurate, but it would be searchable, which
is basically what most people are doing with scans of printed material
now: giving it a quick OCR pass to get a mostly-searchable version of
the text and then handing an image of the page to the searcher. (And I
suppose there is always the possibility of using the results of
large-scale distributed transcription to help work on
historic-handwriting OCR - we can do modern hand much more easily than
we can even century-old copperplate, much less any of the older
"legal" styles of writing. But that's a long way away.)

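The "search the rough text, hand back the image" approach could be
sketched roughly like this. It is a toy inverted index, assuming each
page image has some imperfect transcription attached; build_index,
search and the image names in the test data are hypothetical, not
taken from any existing project.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page images whose rough text contains it.

    pages: dict of image name -> (possibly imperfect) transcription text.
    """
    index = defaultdict(set)
    for image, text in pages.items():
        for word in tokenize(text):
            index[word].add(image)
    return index


def search(index, query):
    """Return the images whose text contains every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```
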
We certainly wouldn't solve the "digitisation problem", and having the
images out there with good metadata on them is arguably of more
academic use than doing the transcriptions, but it would be very
interesting to have one specific (largish) thing to work on, release
it to the world as a trial project, and see if it takes off.

-- 
- Andrew Gray
  andrew.gray at dunelm.org.uk