Nevertheless, consider the file structure used by archive.org, which
collects images into zip files and text into _djvu.xml files; that is
what lets it drive its brilliant viewer.
The DjVu format really can be used as a compact images+xml container, but it
seems to be an obsolete file format, as archive.org's recent decision to
stop producing DjVu output suggests. PDF is IMHO too complex and can't be
considered an open format.
Alex Brollo
On Sat, 6 Jul 2019 at 10:51, David Starner <prosfilaes(a)gmail.com>
wrote:
From my perspective, a DjVu or PDF file is just an archive format for
images. Any text that comes along with them is ancillary; if it's
missing, we can always generate it from OCR. I could just as well use
CBR/CBZ files, though they're not as reliable for having a sensible
format. I want to avoid, as much as possible, dealing with a bunch of
disconnected page images, because that maximizes the possibility for
human error.
--
Kie ekzistas vivo, ekzistas espero.
_______________________________________________
Wikisource-l mailing list
Wikisource-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l