Hello,
I'm just wondering, would it be feasible to convert wiki text (without OCR markup) back into OCR markup? A script could strip or convert the wiki markup, diff the original OCR text against the wiki text to work out what goes where, and regenerate the OCR markup from scratch.
That way you could convert cleanly from OCR markup to wiki markup and back, without keeping unreadable OCR markup on the wiki. The same alignment could also provide some other very useful features (for example, I would love to accurately diff an entire Wikisource text against OCR scans of different printed documents).
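
To make the diff step a bit more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under assumptions: I am pretending the OCR output is a list of (word, bounding box) pairs, and the function name `realign` is made up. It uses the standard library's `difflib.SequenceMatcher` to align the OCR words with the proofread wiki words and carries the boxes across where the alignment is unambiguous. Stripping the wiki markup beforehand and writing out a real OCR format are not shown.

```python
from difflib import SequenceMatcher


def realign(ocr_words, wiki_text):
    """Carry OCR position data over to proofread wiki words.

    ocr_words: list of (word, bbox) pairs (a hypothetical, simplified
               stand-in for real OCR markup).
    wiki_text: proofread text, assumed to have wiki markup already stripped.
    Returns a list of (wiki_word, bbox_or_None) pairs.
    """
    wiki_words = wiki_text.split()
    ocr_plain = [word for word, _ in ocr_words]
    matcher = SequenceMatcher(a=ocr_plain, b=wiki_words, autojunk=False)

    aligned = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            # Unchanged words, or an equal-sized run of corrections:
            # assume a one-to-one mapping and reuse each OCR box.
            for k in range(i2 - i1):
                aligned.append((wiki_words[j1 + k], ocr_words[i1 + k][1]))
        elif tag in ("replace", "insert"):
            # Words with no clear OCR counterpart get no position data.
            for k in range(j1, j2):
                aligned.append((wiki_words[k], None))
        # "delete": OCR words removed during proofreading are dropped.
    return aligned


ocr = [("Tbe", (0, 0, 10, 5)), ("quick", (12, 0, 30, 5)), ("fox", (32, 0, 40, 5))]
result = realign(ocr, "The quick brown fox")
```

In this toy case the corrected word "The" inherits the box of the misread "Tbe", while "brown", which the OCR missed entirely, comes back with `None`. A real tool would have to decide how to place such words, perhaps by interpolating between neighbouring boxes.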