Ryan Dabler wrote:
I might be misunderstanding what is being asked, but could someone
explain to me why the span tags with the OCR block information need
to be permanent? Would it suffice to have the span tags, proofread
the OCR'd text until it perfectly matches the scans, feed it back into
the DJVU file, and then remove all the span tags to have a clean wikitext?
I would imagine that once the proofread text becomes the text layer of the
DJVU file, that would be the last time we would have to even touch the
text anyway, so there would be no more modifications we would need to
make to either the DJVU or the wikitext at all. At that point we
could make the text 100% clean.
We would not want to change the text, but we might want to change some
of it into wikilinks.
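
For what it is worth, the "remove all the span tags" step could be sketched roughly as below. This is only an illustration: the span format (a class of "ocr_block" and a title holding a bounding box) is an assumption made up for the example, not the markup actually in use, and the function name strip_span_tags is hypothetical. The point is that stripping the tags leaves both the proofread text and any wikilinks added afterwards untouched.

```python
import re

# Assumed (hypothetical) markup for the OCR block information:
#   <span class="ocr_block" title="bbox 10 20 300 60">text</span>
SPAN_OPEN = re.compile(r"<span\b[^>]*>", re.IGNORECASE)
SPAN_CLOSE = re.compile(r"</span\s*>", re.IGNORECASE)


def strip_span_tags(wikitext: str) -> str:
    """Remove every span open/close tag, keeping the enclosed text."""
    without_open = SPAN_OPEN.sub("", wikitext)
    return SPAN_CLOSE.sub("", without_open)


sample = (
    '<span class="ocr_block" title="bbox 10 20 300 60">'
    "The [[Battle of Hastings]]</span> "
    '<span class="ocr_block" title="bbox 10 70 300 110">'
    "was fought in 1066.</span>"
)

clean = strip_span_tags(sample)
print(clean)
```

Note that this removes every span tag, not only the OCR ones, so a real cleanup would have to be more selective if the wikitext used span tags for anything else.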
Ec