[Foundation-l] Wikisource and reCAPTCHA

Andre Engels andreengels at gmail.com
Wed Jun 30 11:35:29 UTC 2010

On Wed, Jun 30, 2010 at 12:42 PM, Samuel J Klein <sj at wikimedia.org> wrote:

>> PGDP has a very strict and arduous workflow...  The
>> result is quality, however only the text is sent downstream.
> Why not send images and text downstream?

Because PGDP produces for Project Gutenberg, which publishes text and
html versions, not scans.

> Perhaps we have competing interfaces / workflows, but I expect we
> would be glad to share 99.99%-verified high-quality
> texts-unified-with-images if it were easy for both projects to
> identify that combination of quality and comprehensive data... and
> would be glad to share metadata so that a WS editor could quickly
> check to see if there's a PGDP effort covering an edition of the text
> she is proofing; and vice-versa.

For the PGDP side, it's possible to check at PGDP itself (you will
need a login for that, but it's as free and unencumbered as an account
on Wikimedia), but there is also a useful superset at
http://www.dprice48.freeserve.co.uk/GutIP.html (warning: this is a
7-megabyte HTML file). It lists, sorted by author (books by more than
one author appear multiple times), all books that have received
clearance for Project Gutenberg.

For cooperation, one idea could be to get the PGDP material either
after the P3 stage or after the F2 stage. As long as a project is
still active, it isn't hard at all to get both the text and the scans.

André Engels, andreengels at gmail.com
