@Anika: happy to know that you like the "visualizzatore" and that you
discovered the search function. It is perhaps the most useful trick,
together with previewing the OCR for "red" pages; the latter makes it
possible to refine a book-specific shared regex set.
Alex
2017-10-16 20:09 GMT+02:00 Anika Born <WikiAnika(a)wikipedia.de>:
Like Aubrey: thank you very much!
I shared this news at the Scriptorium of de.ws.
I also used the opportunity to inform them about your "Visualizzatore".
This is so cool!!!! (especially the search-function)
And because I had some time (and the best things come in threes) I invited
them to your it.WikiCon in Trento
(https://meta.wikimedia.org/wiki/ItWikiCon/2017/Proposte#Wikisource).
Have fun there! My best wishes
to the organizers. I co-organized it three times in a row for the
all-German-Community....
https://de.wikisource.org/wiki/Wikisource:Skriptorium#Italien:_17._bis_19._November_WikiCon_in_Trient
Anika
2017-10-16 19:35 GMT+02:00 Andrea Zanni <zanni.andrea84(a)gmail.com>:
Thanks Alex!
I really hope this is a direction that other developers will follow:
being able to harness the full potential of structured data from OCR
software is absolutely crucial for Wikisource.
We could actually automate *a lot* of the formatting work now done by
volunteers. Their time could still be spent formatting, proofreading
and validating, but with much more power than before.
IMO, it changes a lot if a book is formatted ~50% by a machine: we could
do many more books in less time.
Go Alex!
Aubrey
On Mon, Oct 16, 2017 at 5:42 PM, Asaf Bartov <abartov(a)wikimedia.org>
wrote:
That's really promising!
Thank you for sharing this.
A.
On Oct 17, 2017 00:11, "Alex Brollo" <alex.brollo(a)gmail.com> wrote:
Here:
Pagina:D'Ayala_-_Dizionario_militare_francese_italiano.djvu/46
<https://it.wikisource.org/wiki/Pagina:D%27Ayala_-_Dizionario_militare_francese_italiano.djvu/46>
and on the immediately preceding and following pages, both the text and some
formatting come from the Internet Archive file bub_gb_lvzoCyRdzsoC_abbyy.gz
<https://archive.org/download/bub_gb_lvzoCyRdzsoC/bub_gb_lvzoCyRdzsoC_abbyy.gz>
(on the preceding pages only some templates have been added and a little
bit of regex manipulation has been done).
Internet Archive _abbyy.gz files are gzipped, enormous XML files in which
every detail of the FineReader OCR output is exported. Even though they are
enormous and terribly complex, they can be parsed and any detail can be used
(a little bit painfully...). So far, only bold, italic, small caps and
paragraphs have been explored and translated into wiki code by a pretty
simple Python script.
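
To give an idea of the kind of conversion involved, here is a minimal
sketch (not the actual script). It assumes the usual FineReader XML layout:
page > block > text > par > line > formatting > charParams, with "bold",
"italic" and "smallcaps" attributes set to "true" on the formatting element.
The function names are made up for illustration, and {{Sc|...}} is assumed
to be the small caps template of the target wiki.

```python
import gzip
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{ns}par' -> 'par'."""
    return tag.rsplit("}", 1)[-1]


def wrap(text, attrs):
    """Wrap one text run in wiki markup according to its formatting flags."""
    core = text.strip()
    if not core:
        return text
    # Keep surrounding spaces outside the markup.
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    if attrs.get("smallcaps") == "true":
        core = "{{Sc|" + core + "}}"  # assumed template name
    if attrs.get("italic") == "true":
        core = "''" + core + "''"
    if attrs.get("bold") == "true":
        core = "'''" + core + "'''"
    return lead + core + trail


def par_to_wikitext(par):
    """Join the lines of one <par> into a single wikitext paragraph."""
    lines = []
    for line in par:
        if local(line.tag) != "line":
            continue
        runs = []
        for fmt in line:
            if local(fmt.tag) != "formatting":
                continue
            chars = "".join(
                c.text or "" for c in fmt if local(c.tag) == "charParams"
            )
            runs.append(wrap(chars, fmt.attrib))
        lines.append("".join(runs).strip())
    return " ".join(l for l in lines if l)


def page_to_wikitext(page):
    """Convert one <page> element; paragraphs are separated by blank lines."""
    pars = [par_to_wikitext(p) for p in page.iter() if local(p.tag) == "par"]
    return "\n\n".join(p for p in pars if p)


def pages_from_abbyy(path):
    """Stream pages from a gzipped ABBYY file without loading it whole."""
    with gzip.open(path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if local(elem.tag) == "page":
                yield page_to_wikitext(elem)
                elem.clear()  # free the memory used by this page
```

Typical use would be something like
"for n, text in enumerate(pages_from_abbyy('bub_gb_lvzoCyRdzsoC_abbyy.gz'), 1): ...".
The sketch joins lines with a space, so words hyphenated at line ends are
not rejoined; that is the kind of thing left to the regex step.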
Alex
_______________________________________________
Wikisource-l mailing list
Wikisource-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l