Delirium wrote:
I don't mean this to sound particularly harsh, but I'm wondering why we're doing this on Wikisource, when Distributed Proofreaders for Project Gutenberg already has a well-debugged workflow for taking texts from images to OCR to proofread to a final version. Is there an advantage to starting our own project that does the same thing they already do pretty well?
This is a fascinating topic with many facets.
My starting point was not PGDP vs Wikisource, but Project Runeberg's existing software and workflow vs MediaWiki.
I hope that the text and illustrations of this old encyclopedia are useful on their own. Having the scanned images on Wikimedia Commons makes them easily accessible. Having the text on Wikisource, where it can be proofread with wiki markup, makes it easy to reuse in Wikipedia and other projects.
But this is also a demonstration of a new and different principle for digitization and proofreading of old books. It started out as an informal discussion during Wikimania, as you can read at http://meta.wikimedia.org/wiki/User:LA2#Digitizing_books_with_MediaWiki
One argument against PGDP's current solution is that it is only a workflow, not a wiki. Once they finish proofing a book and ship the e-text to Project Gutenberg, there is no way to go back and correct errors (the wiki way). They also don't publish the scanned images, so if you suspect an error (rightly or wrongly) in a Project Gutenberg e-text, you cannot go back and look at the scanned image. These drawbacks are overcome by publishing the scanned images and using a wiki approach to proofreading (never-ending, non-linear). This is what Project Runeberg does, and what this new MediaWiki/Wikisource demo does.
People are asking me why I don't publish Project Runeberg's software to allow other projects to be started with a similar structure. My answer is that the source code is ugly and was not developed to be distributed. Instead of cleaning up that code, it would make more sense to improve MediaWiki to support digitization and proofreading. In fact, MediaWiki can already be used for this. How can I demonstrate that? By starting my own MediaWiki site, of course. But since I'm lazy, I decided instead to use Wikisource for my little demo.
I think it is reasonable to question why Wikisource exists at all, since we already have all these other digitization projects. My point was never to redefine Wikisource. I just happened to use it for this encyclopedia. Some people in the English Wikisource "Scriptorium" (community discussion) have welcomed my initiative.