You may be right: so far my tests have been done on one book only. As soon as a Python tool to get a djvu from _jp2 runs with no human effort, I'll try it on lots of books to get some "general rule".
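
To make the idea concrete, here is a rough sketch of what such a tool could look like. It is not the actual tool. It assumes the _jp2 files are already unpacked into a folder, that ImageMagick's `convert` (with JPEG 2000 support) and DjVuLibre's `c44` and `djvm` are installed, and the names `plan_djvu_build`, `run_plan` and the `work` folder are purely illustrative.

```python
import subprocess
from pathlib import Path


def plan_djvu_build(jp2_dir, out_djvu, work_dir="work"):
    """Return the commands (argv lists) that would turn a folder of JP2
    page scans into one bundled DjVu file. Nothing is executed here."""
    jp2_dir, work = Path(jp2_dir), Path(work_dir)
    commands, pages = [], []
    # Sort by file name so the pages keep their scan order.
    for jp2 in sorted(jp2_dir.glob("*.jp2")):
        jpg = work / (jp2.stem + ".jpg")
        djvu = work / (jp2.stem + ".djvu")
        # Assumption: ImageMagick's `convert` can read JPEG 2000 here.
        commands.append(["convert", str(jp2), str(jpg)])
        # c44 (DjVuLibre) encodes a JPEG or PNM image as a one-page DjVu.
        commands.append(["c44", str(jpg), str(djvu)])
        pages.append(str(djvu))
    if pages:
        # djvm -c bundles the single pages into one multi-page document.
        commands.append(["djvm", "-c", str(out_djvu), *pages])
    return commands


def run_plan(commands, work_dir="work"):
    """Create the working folder, then run each command; stop on first error."""
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping the planning step separate from the execution step makes it possible to print and check the command list on one book before letting it run unattended on many.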

But can you confirm that the IA viewer shows jpg images coming from the jp2-jpg folder?

Another problem with using the original IA pdf (again, I tested it on one book only: see ) is that the OCR text retrieved by the MediaWiki software is horrible in structure; please try to create any page of that Index. With pdftotext (xpdf), too, the results are far from good.



2016-05-13 11:20 GMT+02:00 Federico Leva (Nemo) <>:
Alex Brollo, 13/05/2016 11:06:
Simply, from a practical point of view, my suggestion is: don't try to
get a good djvu from IA pdf, use instead images (after
conversion to jpg the images are very good), and the result will be much
better - almost as good as images into IA viewer, that uses the same

In my experience, when there are problems, it is usually because the JP2 images are either too lightly or too heavily compressed. This has precise reasons and no trivial solution:


Wikisource-l mailing list