On 16/01/07, Andre Engels andreengels@gmail.com wrote:
My idea was to scan these materials, put them on the web, and then make it available to some wiki-like collaborative effort. Volunteers from all over the world could then make themselves useful by doing the transcription of the material (I am thinking in the first place of manuscript material here - printed material is probably easier to transcribe through machine character recognition) and by creating descriptions, indices etcetera.
My idea would be to have something wiki-like, where the main pages would consist of a scan, its transcription (if available) and an area for comments.
And the original images would still be available next to the transcription in case of dispute (or a bad transcription). I seem to recall at least one major genealogical project is doing this - a Canadian census? - and it might be worth looking into that for information on how they work it.
[checks]
http://automatedgenealogy.com/census/cache/index.html - using free(?) images from a government body, I think.
They don't use a wiki, but they do have the split-screen thing and what looks like a line-by-line database to feed material into. The line-by-line nature of the source lends itself easily to this, of course, but it would still work with anything you can chop into reasonably discrete segments. And a wiki does seem the obvious tool to use, though I know we say that about almost everything!
It wouldn't be perfectly accurate, but it would be searchable, which is basically what most people are doing with scans of printed material now: giving it a quick OCR pass to get a mostly-searchable version of the text and then handing an image of the page to the searcher. (And I suppose there is always the possibility of using the results of large-scale distributed transcription to help work on historic-handwriting OCR - we can do modern hand much more easily than we can even century-old copperplate, much less any of the older "legal" styles of writing. But that's a long way away.)
We certainly wouldn't solve the "digitisation problem", and having the images out there with good metadata on them is arguably of more academic use than doing the transcriptions, but it would be very interesting to have one specific (largish) thing to work on, release it to the world as a trial project, and see if it takes off.