Like Aubrey: Thank you very much!
I shared this news at the Scriptorium of de.ws.
I also used the opportunity to inform them about your "Visualizzatore". This is so cool!!!! (especially the search-function)
And because I had some time (and the best things come in threes) I invited them to your it.WikiCon in Trento (https://meta.wikimedia.org/wiki/ItWikiCon/2017/Proposte#Wikisource). Have fun there! My best wishes to the organizers. I co-organized the WikiCon three times in a row for the all-German community...
https://de.wikisource.org/wiki/Wikisource:Skriptorium#Italien:_17._bis_19._N...
Anika
2017-10-16 19:35 GMT+02:00 Andrea Zanni zanni.andrea84@gmail.com:
Thanks Alex! I really hope this is a direction that other developers will follow. Being able to harness the full potential of structured data from OCR software is absolutely crucial for Wikisource: we could actually automate *a lot* of the formatting work now done by volunteers. Their time could still be spent formatting, proofreading and validating, but with much more power than before. IMO, it changes a lot if a book is formatted ~50% by a machine: we could do many more books in less time. Go Alex!
Aubrey
On Mon, Oct 16, 2017 at 5:42 PM, Asaf Bartov abartov@wikimedia.org wrote:
That's really promising!
Thank you for sharing this.
A.
On Oct 17, 2017 00:11, "Alex Brollo" alex.brollo@gmail.com wrote:
Here: Pagina:D'Ayala_-_Dizionario_militare_francese_italiano.djvu/46 https://it.wikisource.org/wiki/Pagina:D%27Ayala_-_Dizionario_militare_francese_italiano.djvu/46 On this page and on the immediately preceding and following pages, both the text and some formatting come from the Internet Archive file bub_gb_lvzoCyRdzsoC_abbyy.gz https://archive.org/download/bub_gb_lvzoCyRdzsoC/bub_gb_lvzoCyRdzsoC_abbyy.gz (on the previous pages only some templates have been added and a little bit of regex manipulation has been done).
Internet Archive _abbyy.gz files are gzipped, enormous XML files in which every detail of the FineReader OCR output is exported. Even though they are enormous and terribly complex, they can be parsed, and any detail can be used (a little bit painfully...). So far, only bold, italic, small caps and paragraphs have been explored; they are translated into wiki code by a pretty simple Python script.
Alex
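
[Illustration, not Alex's script] The conversion described above could look roughly like the sketch below. It is a minimal example under stated assumptions: that the XML nests page > par > line > formatting > charParams, that the formatting element carries bold, italic and smallcaps attributes set to "1" or "true", and that small caps map to an {{Sc|...}} template. The function names are hypothetical, and words hyphenated across lines are not rejoined.

```python
import gzip
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so element names can be compared directly."""
    return tag.rsplit("}", 1)[-1]


def is_on(value):
    # Assumption: boolean attributes are written as "1" or "true".
    return value in ("1", "true")


def run_to_wiki(text, attrs):
    """Wrap one formatting run in wiki markup, keeping outer whitespace outside."""
    core = text.strip()
    if not core:
        return text
    lead = text[:len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    if is_on(attrs.get("smallcaps")):
        core = "{{Sc|" + core + "}}"  # assumed template name
    bold, italic = is_on(attrs.get("bold")), is_on(attrs.get("italic"))
    if bold and italic:
        core = "'''''" + core + "'''''"
    elif bold:
        core = "'''" + core + "'''"
    elif italic:
        core = "''" + core + "''"
    return lead + core + trail


def par_to_wiki(par):
    """Join the lines of one paragraph into a single line of wiki text."""
    lines = []
    for line in par:
        if local(line.tag) != "line":
            continue
        runs = []
        for fmt in line:
            if local(fmt.tag) != "formatting":
                continue
            chars = [cp.text or "" for cp in fmt if local(cp.tag) == "charParams"]
            text = "".join(chars) or (fmt.text or "")
            runs.append(run_to_wiki(text, fmt.attrib))
        lines.append("".join(runs).strip())
    return " ".join(l for l in lines if l)


def pages_to_wiki(source):
    """Yield wiki text page by page; iterparse avoids loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) == "page":
            pars = [par_to_wiki(p) for p in elem.iter() if local(p.tag) == "par"]
            yield "\n\n".join(p for p in pars if p)
            elem.clear()  # free the memory used by the finished page


def convert_file(path):
    """Print every page of a gzipped ABBYY XML file as wiki text."""
    with gzip.open(path, "rb") as handle:
        for number, wikitext in enumerate(pages_to_wiki(handle), 1):
            print("== page", number, "==")
            print(wikitext)
```

Calling convert_file("bub_gb_lvzoCyRdzsoC_abbyy.gz") on the downloaded file would print one block of wiki text per scanned page. Paragraphs are separated by a blank line, which MediaWiki treats as a paragraph break.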
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l