[Wikisource-l] On linking Wikisource with page images

Ryan Dabler zhaladshar at gmail.com
Tue Jan 22 16:59:20 UTC 2008


I might be misunderstanding what is being asked, but could someone explain
to me why the span tags with the OCR block information need to be
permanent?  Would it suffice to keep the span tags, proofread the OCR'd
text until it perfectly matches the scans, feed it back into the DJVU file,
and then remove all the span tags to leave clean wikitext?
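
To illustrate that last step, here is a rough Python sketch of stripping the
span tags once proofreading is done. I am assuming the OCR block information
sits in ordinary <span ...>...</span> wrappers; the "title" attribute and the
"bbox" values in the sample are made up for illustration, not the actual
format under discussion.

```python
import re

# Assumption: OCR block info is carried in plain <span ...> wrappers.
# The attribute name and bbox values in the sample below are hypothetical.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def strip_span_tags(wikitext: str) -> str:
    """Remove opening and closing span tags, keeping the text they enclose."""
    return SPAN_TAG.sub("", wikitext)


sample = (
    '<span title="bbox 10 20 110 40">It was a dark</span> '
    '<span title="bbox 120 20 300 40">and stormy night.</span>'
)
clean = strip_span_tags(sample)
print(clean)
```

This only drops the tags themselves, so any other markup in the page is left
alone.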

I would imagine that once the proofed text becomes the text layer of the DJVU
file, that would be the last time we would have to touch the text at all,
so there would be no further modifications to make to either the DJVU or
the wikitext.  At that point we could make the text 100% clean.

Zhaladshar