You may be right: so far my tests have been done on one book only. As soon as a Python tool to get djvu from _jp2 runs with no human effort, I'll try it on lots of books to get some "general rule".
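
A minimal sketch of what such a tool might look like, not the actual script. It assumes Pillow built with OpenJPEG support for reading JP2, and the DjVuLibre command-line tools c44 and djvm on the PATH. The function names, the JPEG quality of 90, and the file layout are hypothetical choices made for illustration, and no OCR text layer is added.

```python
import io
import subprocess
import zipfile
from pathlib import Path


def page_names(zip_members):
    """Return the .jp2 members of the archive in sorted (page) order."""
    return sorted(m for m in zip_members if m.lower().endswith(".jp2"))


def plan_commands(jp2_members, workdir, out_djvu):
    """Build the c44/djvm command lines without running them."""
    workdir = Path(workdir)
    commands, pages = [], []
    for member in jp2_members:
        stem = Path(member).stem
        jpg = workdir / f"{stem}.jpg"
        page = workdir / f"{stem}.djvu"
        # c44 (DjVuLibre) encodes one JPEG or PNM image as one DjVu page.
        commands.append(["c44", str(jpg), str(page)])
        pages.append(str(page))
    # djvm -c bundles the single pages into one multi-page document.
    commands.append(["djvm", "-c", str(out_djvu), *pages])
    return commands


def build_djvu(jp2_zip, workdir, out_djvu):
    """Convert every JP2 in the zip to JPEG, then encode and bundle."""
    from PIL import Image  # assumption: Pillow with JPEG 2000 support

    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(jp2_zip) as archive:
        members = page_names(archive.namelist())
        for member in members:
            image = Image.open(io.BytesIO(archive.read(member)))
            jpg = workdir / f"{Path(member).stem}.jpg"
            # quality=90 is an arbitrary illustrative value.
            image.convert("RGB").save(jpg, "JPEG", quality=90)
    for command in plan_commands(members, workdir, out_djvu):
        subprocess.run(command, check=True)
```

Keeping the command planning separate from the execution makes it possible to check the page order on one book before running the whole thing on many.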

But can you confirm that the IA viewer shows jpg images coming from the jp2-jpg folder?

Another problem with using the original IA pdf (again, I tested it on one book only: see https://it.wikisource.org/wiki/Indice:Tarchetti_-_Paolina.pdf ) is that the OCR text retrieved by the MediaWiki software is horribly structured; please try to create any page of that Index to see it. With pdftotext (xpdf) the results are also far from good.

Alex

2016-05-13 11:20 GMT+02:00 Federico Leva (Nemo) <nemowiki@gmail.com>:

Alex Brollo, 13/05/2016 11:06:

Simply, from a practical point of view, my suggestion is: don't try to
get a good djvu from the IA pdf; use the _jp2.zip images instead (after
conversion to jpg the images are very good), and the result will be much
better, almost as good as the images in the IA viewer, which uses the same
images.

In my experience, when there are problems, usually the JP2 images are either compressed too little or compressed too much. There are precise reasons for this and no trivial solution: http://www.digitizationguidelines.gov/still-image/documents/JP2LossyCompression.pdf

Nemo

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l