Nevertheless, consider the file structure used by archive.org, which collects page images into zip files and text into _djvu.xml files; that is what lets it drive its brilliant viewer. The DjVu format really can be used as a compact images+XML container, but it seems to be an obsolete file format, as archive.org's recent discontinuation of DjVu output suggests. PDF is IMHO too complex and can't be considered an open format.
Alex brollo
On Sat, 6 Jul 2019 at 10:51, David Starner prosfilaes@gmail.com wrote:
From my perspective, a DjVu or PDF file is just an archive format for images. Any text that comes along with them is ancillary; if it's missing, we can always generate it from OCR. I could just as well use CBR/CBZ files, though they're not as reliable for having a sensible format. I want to avoid, as much as possible, dealing with a bunch of disconnected page images, because that maximizes the possibility for human error.
-- Kie ekzistas vivo, ekzistas espero.
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l