That's really promising!
Thank you for sharing this.
A.
On Oct 17, 2017 00:11, "Alex Brollo" <alex.brollo(a)gmail.com> wrote:
Here:
Pagina:D'Ayala_-_Dizionario_militare_francese_italiano.djvu/46
<https://it.wikisource.org/wiki/Pagina:D%27Ayala_-_Dizionario_militare_francese_italiano.djvu/46>
and on the immediately preceding and following pages, both the text and some
of the formatting come from the Internet Archive file bub_gb_lvzoCyRdzsoC_abbyy.gz
<https://archive.org/download/bub_gb_lvzoCyRdzsoC/bub_gb_lvzoCyRdzsoC_abbyy.gz>
(in the previous pages, only some templates have been added and a little bit
of regex manipulation has been done).
Internet Archive _abbyy.gz files are gzipped, enormous XML files in which
every detail of the FineReader OCR output is exported. Even though they are
enormous and terribly complex, they can be parsed, and any detail can be used
(a little bit painfully...). So far, only bold, italic, small caps and
paragraphs have been explored; they are translated into wiki code by a fairly
simple Python script.
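To give an idea of the approach, here is a minimal sketch of that kind of
conversion. It is not the actual script, only an illustration under some
assumptions: that the XML nests par > line > formatting > charParams, that
each formatting element carries bold, italic and smallcaps attributes set to
"true", and that small caps should become the {{Sc|...}} template. The
function names abbyy_to_wiki, wrap and local are made up for this example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without its XML namespace prefix."""
    return tag.rsplit("}", 1)[-1]


def wrap(text, attrs):
    """Wrap one text run in wiki markup according to its formatting flags."""
    if not text.strip():
        return text  # leave whitespace-only runs untouched
    if attrs.get("smallcaps") == "true":
        text = "{{Sc|" + text + "}}"  # assumed template name
    bold = attrs.get("bold") == "true"
    italic = attrs.get("italic") == "true"
    quotes = "'" * ((3 if bold else 0) + (2 if italic else 0))
    return quotes + text + quotes


def abbyy_to_wiki(stream):
    """Yield one wiki-formatted string per paragraph (assumed <par> element)."""
    # iterparse streams the file, so the whole tree is never held in memory.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "par":
            continue
        lines = []
        for line in elem:
            if local(line.tag) != "line":
                continue
            runs = []
            for fmt in line:
                if local(fmt.tag) != "formatting":
                    continue
                # Each character is assumed to sit in its own <charParams>.
                chars = "".join(
                    c.text or "" for c in fmt if local(c.tag) == "charParams"
                )
                runs.append(wrap(chars, fmt.attrib))
            lines.append("".join(runs))
        yield " ".join(lines)
        elem.clear()  # free the paragraph once it has been converted
```

It could be used by opening the downloaded file with
gzip.open("bub_gb_lvzoCyRdzsoC_abbyy.gz", "rb") and passing the resulting
file object to abbyy_to_wiki(), which then yields the paragraphs one by one.
Page boundaries, hyphenation at line ends and adjacent runs with different
formatting are not handled here and would need extra care.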
Alex
_______________________________________________
Wikisource-l mailing list
Wikisource-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l