Hi everyone,
please let me revive this thread.
There is an ongoing discussion on it.source about the new Internet Archive
policy, because this is becoming a *quality problem* for the community.
You can see for yourself here:
this is a detail[1] from a pdf[2] taken from the Internet Archive;
this is the detail[3] from a djvu (hand-made by the user Alex).
Please look at the pictures to understand the problem :-)
The compression of the IA pdf is unfortunately too high, and the OCR
is not that good either.
We probably can't ask IA to change its mind and redo the djvus, but there
are other, more technical ways.
But I'd like this to be a problem we solve together, maybe directly
in the magnificent "IA Upload" tool.
Wikisource prides itself on quality, so it's right to demand good scans.
What I fear is that bigger communities will have expert users who
make their own djvus,
while smaller ones will have to keep the PDFs uploaded from IA...
Do you have any solutions? Is your community worried about this?
Thanks
Aubrey
[1]
https://it.wikisource.org/wiki/File:Tarchetti_pdf.png
[2]
https://commons.wikimedia.org/w/index.php?title=File%3ATarchetti_-_Paolina.…
[3]
https://it.wikisource.org/wiki/File:Tarchetti_pdf.png
On Mon, Apr 18, 2016 at 3:12 PM, Alex Brollo <alex.brollo(a)gmail.com> wrote:
Can someone "ping" Phe & Tpt into this
talk?
Alex
2016-04-18 10:51 GMT+02:00 Andrea Zanni <zanni.andrea84(a)gmail.com>:
I think that the crucial issue here is: will the
ia-upload tool run?
https://tools.wmflabs.org/ia-upload/commons/init
Aubrey
On Fri, Apr 15, 2016 at 8:29 PM, Alex Brollo <alex.brollo(a)gmail.com>
wrote:
Again, just to explain: the pdf2djvu output of an IA
pdf is a perfect djvu,
with its regular mapped OCR layer, so nothing changes except the need to
run a very simple command:
pdf2djvu namefile.pdf -o namefile.djvu
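For a whole folder of IA PDFs, the same command could be wrapped in a short script. This is only a minimal sketch in Python, assuming pdf2djvu is installed and on the PATH; the function names djvu_command and convert_folder are made up for illustration, and the options are placed before the input file, as in the pdf2djvu usage synopsis.

```python
import shutil
import subprocess
from pathlib import Path


def djvu_command(pdf_path):
    """Build the pdf2djvu call for one PDF (namefile.pdf -> namefile.djvu)."""
    pdf = Path(pdf_path)
    out = pdf.with_suffix(".djvu")
    # Same call as in the message, with the option written before the input.
    return ["pdf2djvu", "-o", str(out), str(pdf)]


def convert_folder(folder):
    """Convert every *.pdf in `folder`, skipping files that already have a djvu."""
    if shutil.which("pdf2djvu") is None:
        raise RuntimeError("pdf2djvu is not installed or not on PATH")
    converted = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        cmd = djvu_command(pdf)
        out = Path(cmd[2])
        if out.exists():
            continue  # do not overwrite an existing (maybe hand-made) djvu
        subprocess.run(cmd, check=True)
        converted.append(str(out))
    return converted
```

Calling convert_folder(".") would convert the PDFs in the current directory and return the list of djvu files it created. Existing djvus are left untouched, so hand-made files are not replaced by accident.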
Alex
2016-04-15 10:01 GMT+02:00 Andrea Zanni <zanni.andrea84(a)gmail.com>:
Yes, this is why I cited it: if we can manage to
use it for Wikisource
importing, we could be safe :-)
Aubrey
On Fri, Apr 15, 2016 at 9:41 AM, Federico Leva (Nemo) <
nemowiki(a)gmail.com> wrote:
> Andrea Zanni, 15/04/2016 09:03:
>
>> I remember Alex Brollo was working with the djvu_xml layer
>>
>
> The XML output from ABBYY is still being published, AFAIK.
>
>
> Nemo
>
> _______________________________________________
> Wikisource-l mailing list
> Wikisource-l(a)lists.wikimedia.org
>
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>