While I was trying to fix some IA Upload failures, an unexpected result emerged: an easy way to fix some common OCR errors in the DjVu text layer.
In brief, the script xml2dsed.py https://it.wikisource.org/wiki/Progetto:Bot/Programmi_in_Python_per_i_bot/xml2dsed.py converts IA _djvu.xml files into "dsed" (Lisp-like) code, so that the text layer can be loaded into a DjVu file in a much faster and more controllable way using djvused.exe. While the XML tree is parsed, each word of the text layer is exposed to the script at the WORD level as plain text; this offers a unique opportunity to fix many scannos without any risk of corrupting the XML or the dsed code.
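To illustrate the idea of fixing scannos at the WORD level, here is a minimal sketch. It is not the code of xml2dsed.py: the correction table, the function names and the sample XML fragment are all hypothetical, and it only assumes that the words sit in WORD elements with a coords attribute. Because only the text content of each WORD is touched, the tree structure and the coordinates stay as they were.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical correction table (illustrative only, not taken from xml2dsed.py).
# Each pattern is anchored so that it matches a whole word.
SCANNO_FIXES = [
    (re.compile(r"^clie$"), "che"),
    (re.compile(r"^deìla$"), "della"),
]


def fix_word(text):
    """Apply every correction rule to the plain text of a single word."""
    for pattern, replacement in SCANNO_FIXES:
        text = pattern.sub(replacement, text)
    return text


def fix_words(xml_source):
    """Parse the XML, correct the text of each WORD element in place,
    and return the tree root plus a list of (original, corrected) pairs."""
    root = ET.fromstring(xml_source)
    changes = []
    for word in root.iter("WORD"):
        original = word.text or ""
        corrected = fix_word(original)
        word.text = corrected  # only the text changes; attributes are untouched
        changes.append((original, corrected))
    return root, changes


# Hypothetical fragment in the spirit of an IA _djvu.xml file.
SAMPLE = """<DjVuXML><BODY><OBJECT><HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH>
<LINE><WORD coords="100,200,150,180">clie</WORD> <WORD coords="160,200,220,180">donna</WORD></LINE>
</PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT></OBJECT></BODY></DjVuXML>"""

if __name__ == "__main__":
    _, changes = fix_words(SAMPLE)
    for original, corrected in changes:
        print(original, "->", corrected)
```

Whole-word anchored patterns are used here so that a rule cannot accidentally rewrite part of a longer, correct word; a real correction list would need to be tuned to the language and the typeface of the scanned book.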
The first DjVu file on which this approach has been successfully tested is https://commons.wikimedia.org/wiki/File:Trattati_del_Cinquecento_sulla_donna,_1913_%E2%80%93_BEIC_1949816.djvu
Alex brollo