While trying to fix some IA Upload failures, an unexpected result emerged: an easy way to fix some common OCR errors in the djvu text layer.

In brief, the script xml2dsed.py converts IA _djvu.xml files into "dsed" (Lisp-like) code, so that the text layer can be loaded into the djvu file in a much faster and more controllable way using djvused.exe. While the XML tree is being parsed, every word of the text layer is exposed to the script at WORD level as plain text; this offers a unique opportunity to fix many scannos with no risk of corrupting the XML or the dsed code.
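
To make the idea concrete, here is a minimal sketch of a word-level fix; it is not the actual xml2dsed.py. Several things are assumptions: the names SCANNOS, fix_word, dsed_string and words_to_dsed are hypothetical, the scanno table is invented, the WORD coords attribute is assumed to start with left,bottom,right,top measured from the top of the page, and the page height is assumed to be known. Nesting into line and page expressions is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical scanno table (illustration only); a real one would be
# language-specific and much larger.
SCANNOS = {"tbe": "the", "aud": "and"}


def fix_word(text):
    """Return the corrected word, or the word unchanged if it is not a known scanno."""
    return SCANNOS.get(text, text)


def dsed_string(text):
    """Quote a word for dsed: escape backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def words_to_dsed(xml_text, page_height):
    """Return one '(word xmin ymin xmax ymax "text")' expression per WORD element.

    Assumption: coords starts with left,bottom,right,top with y measured
    from the top of the page, so y is flipped with page_height to get the
    bottom-left origin used by djvused.
    """
    root = ET.fromstring(xml_text)
    result = []
    for word in root.iter("WORD"):
        left, bottom, right, top = [int(v) for v in word.get("coords").split(",")[:4]]
        # The word is plain text here, so fixing it cannot damage the markup.
        text = fix_word(word.text or "")
        result.append("(word %d %d %d %d %s)" % (
            left, page_height - bottom, right, page_height - top, dsed_string(text)))
    return result


# Invented two-word sample line, not taken from a real IA file.
SAMPLE = '<LINE><WORD coords="10,50,40,30">tbe</WORD><WORD coords="45,50,90,30">"cat"</WORD></LINE>'

if __name__ == "__main__":
    for expr in words_to_dsed(SAMPLE, page_height=100):
        print(expr)
```

Because the correction is applied to the text of each word before it is quoted, neither the XML structure nor the dsed syntax is ever touched by the replacement.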

Here is the first djvu file on which this has been successfully tested.

Alex brollo
