There's also this new Phab task, which looks at a more limited first
step:
Investigation: Could we build a Tool Labs project to generate Djvu files for WikiSource
https://phabricator.wikimedia.org/T154538
On Tue, 3 Jan 2017, at 07:46 AM, Alex Brollo wrote:
You can see a great advantage of djvu files over pdf files in the
present file list of any IA item. You can see that IA removed the djvu
files, but it still builds and publishes a _djvu.xml file. Why? I
presume that IA uses that file to "map words" in its book viewer, since
it has a good text structure while being *pretty simple*. It can be
translated into hOCR, and once its text nodes have been edited, the
edited text can be uploaded back into the djvu file. Itsource (Italian
Wikisource) is testing, on some texts, tricks to mass-fix the djvu text
layer (removing scannos etc.) *before* uploading it to Commons.
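The idea of reading the _djvu.xml text layer, fixing scannos in its text nodes and mapping the words to hOCR can be sketched with Python's standard library alone. This is a minimal sketch, not Itsource's actual tooling: the scanno table is hypothetical, the element names (OBJECT, WORD, coords) follow the DjVuXML layout that IA files are assumed to use, and coords are assumed to hold two x and two y values in their first four positions. Writing the corrected layer back into the DjVu file is not shown.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical scanno table: OCR misreadings -> corrections (illustration only).
SCANNOS = {"tbe": "the", "aud": "and"}


def fix_scannos(xml_text, table=SCANNOS):
    """Replace whole-word scannos in WORD text nodes; return (new_xml, count)."""
    root = ET.fromstring(xml_text)
    changed = 0
    for word in root.iter("WORD"):
        if word.text in table:
            word.text = table[word.text]
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


def words_from_djvu_xml(xml_text):
    """Yield (page_index, text, (x0, y0, x1, y1)) for every WORD element."""
    root = ET.fromstring(xml_text)
    for page_index, page in enumerate(root.iter("OBJECT")):
        for word in page.iter("WORD"):
            # Assumption: the first four coords are x, y, x, y (IA usually
            # writes left,bottom,right,top); min/max normalizes the order.
            a, b, c, d = (int(v) for v in word.get("coords").split(",")[:4])
            yield page_index, word.text or "", (min(a, c), min(b, d), max(a, c), max(b, d))


def to_hocr_word(text, bbox, word_id):
    """Render one word as an hOCR ocrx_word span (bbox = left top right bottom)."""
    x0, y0, x1, y1 = bbox
    return (f'<span class="ocrx_word" id="word_{word_id}" '
            f'title="bbox {x0} {y0} {x1} {y1}">{escape(text)}</span>')


if __name__ == "__main__":
    sample = ('<DjVuXML><BODY><OBJECT><HIDDENTEXT><LINE>'
              '<WORD coords="130,419,186,391">tbe</WORD>'
              '<WORD coords="190,419,260,391">Etica</WORD>'
              '</LINE></HIDDENTEXT></OBJECT></BODY></DjVuXML>')
    fixed, n = fix_scannos(sample)
    for i, (page, text, bbox) in enumerate(words_from_djvu_xml(fixed), 1):
        print(page, to_hocr_word(text, bbox, i))
```

On the sample, fix_scannos turns "tbe" into "the", and to_hocr_word renders it as an ocrx_word span titled "bbox 130 391 186 419". Working on whole WORD nodes keeps the fix from touching substrings inside correct words.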
It's a pity, IMHO, that this magic book format has been disregarded.
Its structure is as *open* as the pdf structure is *closed*.
Alex
2017-01-03 0:19 GMT+01:00 Sam Wilson <sam(a)samwilson.id.au>:
>> I wonder if, rather than creating a new IA item, we should just
>> link the original IA item to the DjVu on Commons (via a review)? Or
>> is there a discoverability benefit to be had by having the DjVu
>> also on IA?
>
>
> On Tue, 3 Jan 2017, at 07:07 AM, Sam Wilson wrote:
>>> Good idea. I guess it's not ideal to end up with two items, but at
>>> least the second will be updatable from our end.
>>
>>> It looks like we can add HTML links to IA reviews too, which is
>>> nice: https://archive.org/details/spinoza_etica_paravia
>>
>>
>> On Mon, 2 Jan 2017, at 11:52 PM, Alex Brollo wrote:
>>> Done :-)
>>>
>>>
>>> Alex
>>>
>>> 2017-01-02 16:49 GMT+01:00 Alex Brollo <alex.brollo(a)gmail.com>:
>>>>> Please take a look at
>>>>> https://archive.org/details/spinoza_etica_paravia_djvu; this is
>>>>> precisely a djvu-only item that I uploaded some days ago. I asked
>>>>> on the IA forum for permission to create "djvu-only items" and I
>>>>> got it; this is the first item I created. As you can see, there is
>>>>> also an "implicit convention": the item name is the original one
>>>>> plus a _djvu suffix (it has been derived from
>>>>> https://archive.org/details/spinoza_etica_paravia), and the
>>>>> metadata are the same, except for a standard warning,
>>>>> "Derived from files into L'Etica[1]", added to the description field.
>>>>
>>>>> So far I have not done the last step, i.e. adding a "backlink"
>>>>> from the original item to the derived one.
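The missing backlink step could look like the sketch below, assuming the internetarchive Python package (get_item and Item.modify_metadata) and an account allowed to edit the source item; if it is not, a review with an HTML link, as Sam suggests, may be the fallback. The function names and the "DjVu version:" wording are hypothetical.

```python
def description_with_backlink(description, derived_id):
    """Return the source description with a link to the derived item appended once."""
    # Hypothetical wording for the backlink; adjust to the community convention.
    link = (f'DjVu version: <a href="https://archive.org/details/{derived_id}">'
            f'{derived_id}</a>')
    if isinstance(description, list):  # IA may return repeated fields as lists
        description = "\n".join(description)
    description = description or ""
    if link in description:
        return description  # already linked: running twice changes nothing
    return f"{description}\n\n{link}" if description else link


def add_backlink(source_id, derived_id):
    """Append the backlink to the source item (needs edit rights on it)."""
    # Requires `pip install internetarchive` and credentials from `ia configure`.
    from internetarchive import get_item

    item = get_item(source_id)
    new_description = description_with_backlink(item.metadata.get("description"), derived_id)
    return item.modify_metadata({"description": new_description})
```

Usage would be add_backlink("spinoza_etica_paravia", "spinoza_etica_paravia_djvu"). Building the full description locally, instead of relying on an append mode, keeps the result predictable and makes repeated runs harmless.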
>>>>
>>>>> internetarchive.py makes it possible to automate the whole job
>>>>> (downloading the metadata of the source item, building the new
>>>>> item name, adding the warning to the description field and
>>>>> uploading the new item).
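A rough sketch of that automation with the internetarchive package (get_item, upload) follows. The naming rule (original identifier plus _djvu) and the warning text come from the message above; rendering [1] as an HTML link, the list of server-managed fields dropped before copying, and the function names are assumptions.

```python
# Fields that IA fills in itself; assumed list, not exhaustive.
SERVER_MANAGED = ("identifier", "addeddate", "publicdate", "uploader")


def derived_identifier(source_id, suffix="_djvu"):
    """Naming convention from the thread: original identifier + "_djvu"."""
    return source_id + suffix


def derived_metadata(source_meta, source_id):
    """Copy the source metadata and append the standard 'Derived from' warning."""
    meta = {k: v for k, v in source_meta.items() if k not in SERVER_MANAGED}
    title = source_meta.get("title", source_id)
    if isinstance(title, list):
        title = title[0]
    warning = (f'Derived from files into '
               f'<a href="https://archive.org/details/{source_id}">{title}</a>')
    description = meta.get("description") or ""
    if isinstance(description, list):
        description = "\n".join(description)
    meta["description"] = f"{description}\n\n{warning}" if description else warning
    return meta


def mirror_as_djvu_item(source_id, djvu_path):
    """Create the derived djvu-only item; needs `internetarchive` and credentials."""
    from internetarchive import get_item, upload

    source = get_item(source_id)
    new_id = derived_identifier(source_id)
    metadata = derived_metadata(source.metadata, source_id)
    return upload(new_id, files=[djvu_path], metadata=metadata)
```

Keeping the naming and metadata steps as pure functions means they can be checked offline before anything is uploaded; only mirror_as_djvu_item touches the network.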
>>>>
>>>>
>>
>>
>>
>> _______________________________________________
>> Wikisource-l mailing list
>> Wikisource-l(a)lists.wikimedia.org
>>
Links:
1.
https://archive.org/details/spinoza_etica_paravia