On Wed, Jun 30, 2010 at 12:42 PM, Samuel J Klein <sj(a)wikimedia.org> wrote:
> > PGDP has a very strict and arduous workflow... The result is quality,
> > however only the text is sent downstream.
>
> Why not send images and text downstream?

Because PGDP produces for Project Gutenberg, which publishes text and
HTML versions, not scans.

> Perhaps we have competing interfaces / workflows, but I expect we
> would be glad to share 99.99%-verified high-quality
> texts-unified-with-images if it were easy for both projects to
> identify that combination of quality and comprehensive data... and
> would be glad to share metadata so that a WS editor could quickly
> check to see if there's a PGDP effort covering an edition of the text
> she is proofing; and vice-versa.

For the PGDP side, it's possible to check at PGDP itself (one needs a
login for that, but getting one is as free and unencumbered as it is on
Wikimedia), but there is also a useful superset at
http://www.dprice48.freeserve.co.uk/GutIP.html (warning: this is a
7-megabyte HTML file). It lists, sorted by author (books with more than
one author appear multiple times), all books that have a clearance for
Project Gutenberg.
For cooperation, one idea could be to get the PGDP material either
after the P3 stage or after the F2 stage. As long as a project is
still active, it isn't hard at all to get both the text and the scan
pages.
--
André Engels, andreengels(a)gmail.com