That's really promising!

Thank you for sharing this.

   A.

On Oct 17, 2017 00:11, "Alex Brollo" <alex.brollo@gmail.com> wrote:
Here:
Pagina:D'Ayala_-_Dizionario_militare_francese_italiano.djvu/46
and on the immediately preceding and following pages, both the text and some of the formatting come from the Internet Archive file bub_gb_lvzoCyRdzsoC_abbyy.gz (on the earlier pages only a few templates have been added and a little regex manipulation has been done).

Internet Archive _abbyy.gz files are enormous gzipped XML files into which every detail of the FineReader OCR output is exported. Even though they are enormous and terribly complex, they can be parsed, and any detail can be used (a little painfully...). So far only bold, italic, smallcaps and paragraphs have been explored; they are translated into wiki code by a fairly simple Python script.
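A minimal sketch of that kind of conversion, in Python. It is not Alex's script. The ABBYY element names (page, par, line, formatting, charParams), the bold="1" / italic="1" / smallcaps="1" attribute flags, and the {{Sc|...}} small-caps template are assumptions to check against a real _abbyy.gz file. Pages are streamed with iterparse so the whole file never has to sit in memory.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Iterator


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix; FineReader schema versions use different namespaces."""
    return tag.rsplit("}", 1)[-1]


def is_on(fmt: ET.Element, flag: str) -> bool:
    # Assumption: style flags are attributes like bold="1" or bold="true".
    return fmt.get(flag) in ("1", "true")


def formatting_to_wiki(fmt: ET.Element) -> str:
    """Turn one <formatting> run into wiki markup.

    Assumption: the text is either in <charParams> children (one character
    each) or directly inside the <formatting> element.
    """
    chars = [c.text or "" for c in fmt if local_name(c.tag) == "charParams"]
    text = "".join(chars) if chars else (fmt.text or "")
    if not text.strip():
        return text
    if is_on(fmt, "smallcaps"):
        text = "{{Sc|" + text + "}}"  # hypothetical choice of small-caps template
    if is_on(fmt, "italic"):
        text = "''" + text + "''"
    if is_on(fmt, "bold"):
        text = "'''" + text + "'''"
    return text


def pages_to_wiki(path: str) -> Iterator[str]:
    """Stream pages from a gzipped ABBYY XML file, yielding wiki text per page."""
    with gzip.open(path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            paragraphs = []
            for par in elem.iter():
                if local_name(par.tag) != "par":
                    continue
                lines = []
                for line in par:
                    if local_name(line.tag) != "line":
                        continue
                    lines.append("".join(
                        formatting_to_wiki(f) for f in line
                        if local_name(f.tag) == "formatting"))
                if lines:
                    # OCR lines are kept as separate lines inside a paragraph.
                    paragraphs.append("\n".join(lines))
            # A blank line separates paragraphs in wiki markup.
            yield "\n\n".join(paragraphs)
            elem.clear()  # drop the finished page's children to save memory
```

Usage would be something like `for n, wiki in enumerate(pages_to_wiki("bub_gb_lvzoCyRdzsoC_abbyy.gz"), start=1): ...`, assuming the page order in the XML matches the page order in the djvu file.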

Alex 



_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l