You can see a great advantage of djvu files over pdf files in the current file list of any IA item. IA has removed the djvu files, but it still builds and publishes a _djvu.xml file. Why? I presume IA uses that file to "map words" in its book viewer, because it has a good text structure while being *pretty simple*. It can be translated into hOCR, and once its text nodes are edited, the edited text can be uploaded back into the djvu file. Itsource is testing, on some texts, tricks to mass-fix the djvu text layer (removing scannos etc.) *before* uploading it to Commons.
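To make the "edit the text nodes, then translate into hOCR" idea concrete, here is a minimal Python sketch. It is not Itsource's actual tooling. It assumes an IA-style _djvu.xml where each word is a WORD element with a coords attribute ordered left,bottom,right,top (y growing downward). The SAMPLE fragment and the SCANNOS table are hypothetical. Writing the corrected text back into the DjVu file itself (for example with DjVuLibre's djvused) is not shown.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like an IA _djvu.xml page (structure assumed).
SAMPLE = """<DjVuXML><BODY><OBJECT>
<HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH><LINE>
<WORD coords="10,40,60,20">Etica</WORD>
<WORD coords="70,40,140,20">dimostrata</WORD>
<WORD coords="150,40,190,20">c0n</WORD>
</LINE></PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT>
</OBJECT></BODY></DjVuXML>"""

# Hypothetical scanno table: OCR misreading -> correction.
SCANNOS = {"c0n": "con"}


def fix_scannos(root, scannos):
    """Edit WORD text nodes in place; return how many words were changed."""
    changed = 0
    for word in root.iter("WORD"):
        if word.text in scannos:
            word.text = scannos[word.text]
            changed += 1
    return changed


def words_to_hocr(root):
    """Return one hOCR ocrx_word span per WORD.

    Assumes coords are "left,bottom,right,top" with y growing downward;
    the hOCR bbox is "left top right bottom".
    """
    spans = []
    for i, word in enumerate(root.iter("WORD"), start=1):
        left, bottom, right, top = (int(v) for v in word.get("coords").split(",")[:4])
        text = html.escape(word.text or "")
        spans.append(
            f'<span class="ocrx_word" id="word_{i}" '
            f'title="bbox {left} {top} {right} {bottom}">{text}</span>'
        )
    return spans


root = ET.fromstring(SAMPLE)
print(fix_scannos(root, SCANNOS))  # 1
print(words_to_hocr(root)[2])
# <span class="ocrx_word" id="word_3" title="bbox 150 20 190 40">con</span>
```

Matching whole words against a fixed table keeps the sketch simple. A real mass-fix would more likely use context-aware rules.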
It's a pity, IMHO, that this magic book format has been disregarded. Its structure is *open*, whereas the pdf structure is *closed*.
Alex
2017-01-03 0:19 GMT+01:00 Sam Wilson sam@samwilson.id.au:
I wonder whether, rather than creating a new IA item, we should just link the original IA item to the DjVu on Commons (via a review). Or is there a discoverability benefit to having the DjVu on IA as well?
On Tue, 3 Jan 2017, at 07:07 AM, Sam Wilson wrote:
Good idea. I guess it's not ideal to end up with two items, but at least the 2nd will be updateable from our end.
It looks like we can add HTML links to IA reviews too, which is nice: https://archive.org/details/spinoza_etica_paravia
On Mon, 2 Jan 2017, at 11:52 PM, Alex Brollo wrote:
Done :-)
Alex
2017-01-02 16:49 GMT+01:00 Alex Brollo alex.brollo@gmail.com:
Please take a look at https://archive.org/details/spinoza_etica_paravia_djvu. This is precisely a djvu-only item that I uploaded a few days ago. I asked for permission to create "djvu-only items" on the IA forum and I got it; this is the first item I created. As you can see, there is an "implicit convention" as well. The item name is the original one plus a _djvu suffix (it was derived from https://archive.org/details/spinoza_etica_paravia), and the metadata are the same, except for a standard warning "Derived from files into L'Etica https://archive.org/details/spinoza_etica_paravia" in the description field.
So far I have not done the last step, i.e. adding a "backlink" from the original item to the derived one.
internetarchive.py lets you automate the whole job: downloading the source item's metadata, building the new item name, adding the warning to the description field, and uploading the new item.
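The steps above could be sketched roughly as follows with the `internetarchive` Python library (`get_item`, `Item.metadata`, `upload`). This is a minimal sketch, not the script actually used. The set of fields skipped when copying (`SKIP_FIELDS`) and the newline separator in the description are assumptions. The `_djvu` suffix and the warning text follow the convention described above.

```python
SUFFIX = "_djvu"  # convention from the thread: original identifier + "_djvu"
# Assumed: server-managed fields that should not be copied to the new item.
SKIP_FIELDS = {"identifier", "uploader", "addeddate", "publicdate"}


def derived_identifier(source_id):
    """Name of the djvu-only item derived from source_id."""
    return source_id + SUFFIX


def derived_metadata(source_md, source_id):
    """Copy the source metadata and prepend the standard warning to the description."""
    md = {k: v for k, v in source_md.items() if k not in SKIP_FIELDS}
    title = source_md.get("title", source_id)
    warning = f"Derived from files into {title} https://archive.org/details/{source_id}"
    desc = md.get("description", "")
    if isinstance(desc, list):  # repeated metadata fields can come back as lists
        desc = "\n".join(desc)
    md["description"] = warning + ("\n" + desc if desc else "")
    return md


def make_djvu_item(source_id, djvu_path, dry_run=True):
    """Download source metadata, build the new item, and optionally upload it."""
    import internetarchive as ia  # third-party: pip install internetarchive

    source = ia.get_item(source_id)
    new_id = derived_identifier(source_id)
    md = derived_metadata(source.metadata, source_id)
    if not dry_run:
        # Needs IA credentials, e.g. set up beforehand with `ia configure`.
        ia.upload(new_id, files=[djvu_path], metadata=md)
    return new_id, md
```

With `dry_run=True` (the default) nothing is uploaded, so you can inspect the generated name and metadata first. For example, `make_djvu_item("spinoza_etica_paravia", "etica.djvu")` uses a hypothetical local file name.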
*_______________________________________________* Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l