On 05/24/2013 09:11 AM, Andrea Zanni wrote:
I remember, for example, an awesome tool from Alex Brollo, postOCR,
a js script which corrects automatically most common OCR errors and converts apostrophes.
Where is this? Is it documented in English?
Andrea mentioned two different tools merged into one.
1. The postOCR code comes mainly from Pathoschild's RegexMenuFramework, with minor changes for Italian OCR errors.
2. The apostrophe conversion (from the keyboard's typewriter apostrophe ' into the real apostrophe character ’) comes from an original it.source script (in Python, to be used by a bot, and in JS, to be merged into postOCR). It is very complex, since the conversion must be avoided inside templates, links, HTML tags, math tags and wiki markup. This is far from simple, since regex doesn't help to manage nested templates or other nested code structures.
No, we don't document this stuff. We simply use it... a lot.
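To give an idea of the problem in point 2, here is a minimal Python sketch. It is not the actual it.source script: the function name `convert_apostrophes` and all the patterns are made up for illustration, and real wikitext has many more cases (nowiki, comments, tables and so on). It keeps a stack for nested {{...}} and [[...]] instead of relying on a single regex, and it copies <math> blocks, HTML tags and runs of '' or ''' unchanged.

```python
import re

# Hypothetical patterns, chosen for this sketch only.
MATH_RE = re.compile(r"<math\b.*?</math\s*>", re.S | re.I)  # whole math block
TAG_RE = re.compile(r"<[^<>]*>")                            # a single HTML tag
QUOTES_RE = re.compile(r"'{2,}")                            # '' / ''' wiki markup
OPENERS = {"{{": "}}", "[[": "]]"}                          # delimiters that may nest


def convert_apostrophes(text):
    """Replace typewriter apostrophes with the real one outside protected spans."""
    out = []
    stack = []  # closing delimiters expected for the open templates/links
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if two in OPENERS:
            # Entering a template or link, possibly nested inside another one.
            stack.append(OPENERS[two])
            out.append(two)
            i += 2
            continue
        if stack and two == stack[-1]:
            # Leaving the innermost open template or link.
            stack.pop()
            out.append(two)
            i += 2
            continue
        m = MATH_RE.match(text, i) or TAG_RE.match(text, i) or QUOTES_RE.match(text, i)
        if m:
            # Math blocks, HTML tags and bold/italic markers are copied verbatim.
            out.append(m.group())
            i = m.end()
            continue
        ch = text[i]
        if ch == "'" and not stack:
            ch = "\u2019"  # real apostrophe, only in plain running text
        out.append(ch)
        i += 1
    return "".join(out)


sample = "L'amore {{t|a='x'|{{in|'y'}}}} ''corsivo'' [[L'Aquila]] <math>f'(x)</math>"
print(convert_apostrophes(sample))
```

The sketch is deliberately conservative: an unclosed {{ or [[ leaves the stack non-empty, so nothing after it is converted, which seems safer for a bot than converting inside broken markup.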
Alex