Hi everyone,
please let me revive this thread.
There is an ongoing discussion on it.source about the new Internet Archive policy, because this is becoming a *quality problem* for the community.
You can see for yourself, here:
this is a detail [1] from a pdf [2] taken from the Internet Archive,
and this is the detail [3] from a djvu (handmade by the user Alex).

Please look at the pictures to understand the problem :-)

Unfortunately, the IA pdf is compressed too heavily, and the OCR is not that good either.

We probably can't ask IA to change its mind and go back to producing djvus, but there are other, more technical ways.
I'd like this to be a problem we solve together, maybe directly in the magnificent "IA Upload" tool.
Wikisource prides itself on quality, so it's right to demand good scans.
What I fear is that bigger communities will have expert users who make their own djvus,
while smaller ones will have to keep the PDFs uploaded from IA...

Do you have any solutions? Is your community worried about this?

Thanks

Aubrey


[1] https://it.wikisource.org/wiki/File:Tarchetti_pdf.png
[2] https://commons.wikimedia.org/w/index.php?title=File%3ATarchetti_-_Paolina.pdf&page=4
[3] https://it.wikisource.org/wiki/File:Tarchetti_pdf.png


On Mon, Apr 18, 2016 at 3:12 PM, Alex Brollo <alex.brollo@gmail.com> wrote:
Can someone "ping" Phe & Tpt into this talk? 

Alex

2016-04-18 10:51 GMT+02:00 Andrea Zanni <zanni.andrea84@gmail.com>:
I think that the crucial issue here is: will the ia-upload tool run?
https://tools.wmflabs.org/ia-upload/commons/init

Aubrey


On Fri, Apr 15, 2016 at 8:29 PM, Alex Brollo <alex.brollo@gmail.com> wrote:
Again, just to explain: the pdf2djvu output of an IA pdf is a perfect djvu, with its regular mapped OCR layer, so nothing changes except the need to run a very simple command:

pdf2djvu namefile.pdf -o namefile.djvu
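
For more than one file, the same command could be wrapped in a small script. This is only a rough sketch, assuming pdf2djvu is installed and on the PATH; the function names build_command and convert_folder are made up for illustration and are not part of any existing tool.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path: Path) -> list[str]:
    """Build the pdf2djvu call for one file, writing the djvu next to the pdf."""
    # Same argument order as the command above: input file, then -o output.
    return ["pdf2djvu", str(pdf_path), "-o", str(pdf_path.with_suffix(".djvu"))]


def convert_folder(folder: Path) -> None:
    """Run pdf2djvu on every pdf in `folder` that has no djvu beside it yet."""
    if shutil.which("pdf2djvu") is None:
        raise RuntimeError("pdf2djvu not found on PATH")
    for pdf_path in sorted(folder.glob("*.pdf")):
        if pdf_path.with_suffix(".djvu").exists():
            continue  # skip files that were already converted
        # check=True stops at the first file pdf2djvu fails on.
        subprocess.run(build_command(pdf_path), check=True)
```

Calling convert_folder(Path(".")) would then convert every pdf in the current folder, one pdf2djvu run per file.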

Alex





2016-04-15 10:01 GMT+02:00 Andrea Zanni <zanni.andrea84@gmail.com>:
Yes, this is why I cited it: if we can manage to use it for Wikisource importing, we could be safe :-)

Aubrey

On Fri, Apr 15, 2016 at 9:41 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Andrea Zanni, 15/04/2016 09:03:
I remember Alex Brollo was working with the djvu_xml layer

The XML output from ABBYY is still being published, AFAIK.


Nemo

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l

