[Wikitech-l] Full support for djvu file

3 May 2012


      ...
Message: 5
Date: Thu, 3 May 2012 08:33:45 +0200
From: Alex Brollo alex.brollo@gmail.com
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Subject: [Wikitech-l] Full support for djvu files
Message-ID: CAH_M_mPXxD9LeMjHCm65CRAvoqN5W45O5dGO+TeH1C0f_hc4rg@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

DjVu files are the Wikisource standard for proofreading. They have
very interesting features: they are fully "open" in structure and layering,
and they allow fast and effective sharing on the web when they are
stored in their "indirect" mode. Most interesting, their text layer - which
can be easily extracted - contains both the mapped text from OCR and
metadata. A free library - DjVuLibre - allows full command-line access to
any file content.
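
As an illustration of the command-line access mentioned above, here is a minimal Python sketch that drives DjVuLibre's djvused tool. It assumes DjVuLibre is installed and on the PATH; the helper names and the file name book.djvu are hypothetical and not part of the original message.

```python
import subprocess


def text_layer_script(page=None):
    """Return a djvused script that prints the mapped (positioned) text layer."""
    # "print-txt" dumps the hidden text together with its coordinates;
    # "select N" restricts the command to a single page.
    if page is None:
        return "print-txt"
    return f"select {page}; print-txt"


def metadata_script():
    """Return a djvused script that prints metadata annotations."""
    # "print-meta" prints the metadata annotations of the selected component.
    return "print-meta"


def djvused_command(djvu_path, script):
    """Build the argument list for a djvused call (hypothetical helper)."""
    return ["djvused", "-e", script, djvu_path]


def run_djvused(djvu_path, script):
    """Run djvused and return its standard output as text."""
    result = subprocess.run(
        djvused_command(djvu_path, script),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Requires DjVuLibre and a local file; "book.djvu" is a placeholder.
    print(run_djvused("book.djvu", text_layer_script(page=1)))
```

For plain text without coordinates, DjVuLibre also ships the djvutxt tool.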
At present, the structure and features of DjVu files are barely used. Indirect
mode is IMHO not supported at all, there's no means of accessing the mapped text
layer or the metadata, and only the "full text" can be accessed, once, when
creating a new page in the Page namespace.
It would be great IMHO:

* to support indirect mode as the standard;
* to allow free, easy access to the full text layer content from the
  Wikisource user interface.

Alex
The text layer is stored in img_metadata, which means it can be retrieved
through the API (using ?action=query&prop=imageinfo&iiprop=metadata).
However, when I tried to test this, it didn't seem to work. Maybe
trying to return the entire text layer hit some maximum API result size
limit or something. (It'd be really nice if we had a better place to
store information about files, especially for huge things like the
text layer, which we generally don't want to load in full every
time. There's a bug about that somewhere in Bugzilla land.)
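
For reference, the API query described above could be issued roughly as follows. This is only a sketch: the endpoint (Wikimedia Commons), the file title and the User-Agent string are assumptions made for illustration, and, as noted above, the response may not actually include the whole text layer.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the file lives on Wikimedia Commons; any MediaWiki api.php would do.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_metadata_url(file_title, endpoint=API_ENDPOINT):
    """Build the action=query&prop=imageinfo&iiprop=metadata request URL."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "metadata",
        "titles": file_title,
        "format": "json",
    }
    return endpoint + "?" + urllib.parse.urlencode(params)


def fetch_metadata(file_title):
    """Fetch and decode the JSON response (performs a network request)."""
    request = urllib.request.Request(
        build_metadata_url(file_title),
        headers={"User-Agent": "djvu-metadata-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # "File:Example.djvu" is a placeholder title, not a file named in this thread.
    print(build_metadata_url("File:Example.djvu"))
```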
Indirect mode (from what I can find out from Google) is when you have
an index DjVu file that links to all the pages making up the
document, so you can start viewing immediately and pages are only
downloaded as needed. I'm not sure how such a format would work in
terms of uploading it. Unless we convert it on the server side, how
would we upload all the constituent files? (I suppose we could tell
people to upload tarballs. Then we'd have to make sure to validate the
contents, and communicate to people that tarballs are only for
uploading DjVu files.) [Of course, until 5 minutes ago I'd never heard
of an indirect DjVu file, so I could be misunderstanding.]
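
If the conversion were done on the server side, one possible route is DjVuLibre's djvmcvt tool, which can turn a bundled document into an indirect one. The following is a sketch under that assumption only, not a description of anything MediaWiki does; the helper names and paths are hypothetical.

```python
import subprocess


def to_indirect_command(bundled_path, out_dir, index_name="index.djvu"):
    """Build a djvmcvt call that writes an indirect document into out_dir."""
    # "-i" asks for the indirect format: an index file plus one component
    # file per page, all written into out_dir.
    return ["djvmcvt", "-i", bundled_path, out_dir, index_name]


def convert_to_indirect(bundled_path, out_dir, index_name="index.djvu"):
    """Run the conversion; requires DjVuLibre and an existing out_dir."""
    subprocess.run(
        to_indirect_command(bundled_path, out_dir, index_name),
        check=True,
    )


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(to_indirect_command("book.djvu", "book_indirect"))
```

With something like this, uploaders would keep sending a single bundled file and the split into per-page files would happen after upload.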
-bawolff
