Alex Brollo, 14/06/2013 08:45:
IA also provides ABBYY XML files (as .gz files). I opened one of them at Phe's suggestion, and I'm dreaming about extracting something useful from it to help proofreading. The only "small" problem is that I barely know what XML is: I know that it is similar to HTML in its (well-formed) structure, and that something called XSLT exists. :-(
Is any of you working on ABBYY XML files with a "little bit" more skill?
Someone produced something here: https://groups.google.com/forum/?fromgroups#!topic/abbyy-ocr-for-linux/Ih7no7KwslA
Also, from 2012: a planned "lura2hocr -- convert Luratech Abbyy XML to hOCR": https://code.google.com/p/hocr-tools/wiki/PageName
Nemo
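
For anyone who wants to experiment without XSLT, here is a minimal Python sketch of one way to pull proofreading hints out of such a file. It is not an existing tool. It assumes that the file contains `line` elements, that each recognized character sits in a `charParams` element, and that an optional integer `charConfidence` attribute is present; check these names against a real file before relying on them. The function names and the threshold of 50 are hypothetical choices for illustration.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def low_confidence_lines(path, threshold=50):
    """Yield (line_text, suspect_chars) for each OCR line with doubtful characters.

    Assumed layout: <line> elements containing <charParams> descendants whose
    text is one character and whose charConfidence attribute is an integer.
    """
    with gzip.open(path, "rb") as fh:
        # iterparse streams the file instead of loading it all at once
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if local_name(elem.tag) != "line":
                continue
            chars, suspects = [], []
            for cp in elem.iter():
                if local_name(cp.tag) != "charParams":
                    continue
                ch = cp.text or " "
                chars.append(ch)
                conf = cp.get("charConfidence")
                if conf is not None and int(conf) < threshold:
                    suspects.append(ch)
            if suspects:
                yield "".join(chars), suspects
            elem.clear()  # drop the processed line to keep memory use low
```

Calling `low_confidence_lines("some_book_abbyy.gz")` (a hypothetical file name) would then give, for each line, its text together with the characters the OCR engine was least sure about, which could be used to highlight likely errors on a page.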