On 05/24/2013 09:11 AM, Andrea Zanni wrote:
I remember, for example, an awesome tool from Alex Brollo, postOCR,
a js script which corrects automatically most common OCR errors and
converts apostrophes.
Where is this? Is it documented in English?
Andrea mentioned two different tools merged into one:

1. The postOCR code comes mainly from Pathoschild's Regex menu framework (http://meta.wikimedia.org/wiki/User:Pathoschild/Scripts/Regex_menu_framework), with minor changes for Italian OCR errors.

2. The apostrophe conversion (from the typewriter-style keyboard apostrophe ' into the real apostrophe character ’) comes from an original it.source script, written in Python for use by a bot and in JS to be merged into postOCR. It is quite complex, because apostrophes inside templates, links, HTML tags, math tags, and wiki markup must not be converted, and regexes cannot handle nested templates or other nested code structures.

No, we don't document this stuff. We simply use it... a lot.
Alex
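The two steps described above can be sketched in Python, one of the languages the original script was written in. This is a minimal illustration under stated assumptions, not the it.source code. The names `post_ocr`, `split_protected`, and `OCR_RULES` are hypothetical. The single OCR rule is only an example of a regex-menu-style substitution. The set of protected constructs (templates, links, `<math>` blocks, other HTML tags, and `''`/`'''` markup) is taken from the description above. Nested `{{...}}` and `[[...]]` are skipped with a stack of expected closing delimiters, which is exactly the part a single regex cannot express.

```python
import re

# Hypothetical postOCR-style rules: (compiled pattern, replacement).
# Illustrative only; these are NOT the actual it.source rules.
OCR_RULES = [
    (re.compile(r" +([,;:.!?])"), r"\1"),  # remove stray spaces before punctuation
]

# A lone typewriter apostrophe, i.e. not part of '' or ''' wiki markup.
APOS_RE = re.compile(r"(?<!')'(?!')")

# <math>...</math> is protected as a whole; any other tag only the tag itself.
TAG_RE = re.compile(r"<math\b[^>]*>.*?</math>|<[^<>]+>", re.DOTALL | re.IGNORECASE)

OPENERS = {"{{": "}}", "[[": "]]"}


def _match_nested(text, i):
    """Return the index just past the balanced {{...}} or [[...]] starting at i, or -1."""
    stack = []  # expected closing delimiters, innermost last
    j = i
    while j < len(text):
        two = text[j:j + 2]
        if two in OPENERS:
            stack.append(OPENERS[two])
            j += 2
        elif stack and two == stack[-1]:
            stack.pop()
            j += 2
            if not stack:
                return j
        else:
            j += 1
    return -1  # unbalanced: leave it as ordinary text


def split_protected(text):
    """Split wikitext into (is_protected, chunk) pairs."""
    chunks = []
    start = i = 0
    while i < len(text):
        end = -1
        if text.startswith(("{{", "[["), i):
            end = _match_nested(text, i)
        elif text[i] == "<":
            m = TAG_RE.match(text, i)
            if m:
                end = m.end()
        if end > i:
            if start < i:
                chunks.append((False, text[start:i]))
            chunks.append((True, text[i:end]))
            start = i = end
        else:
            i += 1
    if start < len(text):
        chunks.append((False, text[start:]))
    return chunks


def post_ocr(text):
    """Apply the OCR rules and apostrophe conversion to unprotected text only."""
    out = []
    for protected, chunk in split_protected(text):
        if not protected:
            for pattern, repl in OCR_RULES:
                chunk = pattern.sub(repl, chunk)
            chunk = APOS_RE.sub("\u2019", chunk)
        out.append(chunk)
    return "".join(out)
```

For example, `post_ocr("L'uomo disse , ''l'amico'' {{Tpl|dell'anno}}")` should give `L’uomo disse, ''l’amico'' {{Tpl|dell'anno}}` (the template name is a placeholder): the stray space and both lone apostrophes in running text are fixed, while the italic markup and the template argument are left alone. Note that the generic tag pattern will also swallow a stray `<` in running text if a `>` follows later, so a real tool would need a stricter tag grammar.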
2013/5/25 Lars Aronsson <lars@aronsson.se>
On 05/24/2013 09:11 AM, Andrea Zanni wrote:
As an example, we are collaborating right now with a philologist (a digital humanist) who puts texts on Wikisource, proofreads them with the community, and then works on them.
Do you document and distribute your experience?
--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/
_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l