Text layer Djvu (was: Biodiversity Heritage Library) - Wikisource-l

6 Aug 2012

If someone is interested,
Alex Brollo is digging into the djvu layer issue,
we have a Dropbox folder with all the files.
If you are interested in working on that, please drop me a mail.

What we can show you right now is this:
https://www.dropbox.com/s/lu6re2a02xp0nyc/Dialogo%20della%20salute%20djvu%2…

As you can see, the text is not mapped again into the djvu, but it is
"stored" all togheter in a region of the djvu page (in this case, left
angle below).
It is very difficult to re-map the text, for example because when we use
the tag <ref> for footnotes we destroy the pattern :-(

The cool thing is that the text inside is already formatted in wikitext!
https://www.dropbox.com/s/s2c0op5e9jeu47o/Dialogo%20della%20salute%20WS%20s…
Alex assures me this is easy and just uses few scripts from djvulibre
(which is already installed in toolserver).
The same could be made uploading wiki-rendered HTML into text layer.

This could be very interesting for other websites: they could just
copy-and-paste the HTML file, or extract it with a simple python script
calling for djvuLibre routines,  and then use the Commons file as a
benchmark.
We could, maybe, give back some of our books to the Gutenberg project.
Or, maybe, give it back to GLAMs.

What do you think?

Aubrey and Alex