Nemo, try doing an "autopsy" of the cited IA PDF with pdfimages (xpdf), which extracts the raw images embedded in each PDF page. You'll find that each page is oddly segmented into a full-color background, a strange image, and an inverted copy of a thresholded image (used as a mask, I presume). Simply negating the last one gives a decent, light BW image of the page. From that I could build a decent BW djvu image: , but it.source users didn't like the idea.
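
For what it's worth, the "negate the mask" step can be sketched in a few lines of Python. This is only an illustration under assumptions, not the exact procedure I used: it assumes the mask was extracted as a binary PBM (the "P4" format, which is what pdfimages normally writes for 1-bit images), and the function name invert_pbm_p4 is made up for the example.

```python
def invert_pbm_p4(data: bytes) -> bytes:
    """Return a binary PBM (P4) image with every pixel flipped.

    Hypothetical helper: assumes `data` is a complete P4 file, such as a
    1-bit mask extracted from a PDF page.
    """
    # Parse the three header tokens (magic, width, height).
    # Tokens are separated by whitespace; '#' starts a comment line.
    tokens = []
    pos = 0
    while len(tokens) < 3:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # exactly one whitespace byte separates header and raster

    if tokens[0] != b"P4":
        raise ValueError("not a binary PBM (P4) file")
    width, height = int(tokens[1]), int(tokens[2])

    # Each row is packed 8 pixels per byte and padded to a whole byte.
    row_bytes = (width + 7) // 8
    raster = data[pos:pos + row_bytes * height]
    if len(raster) != row_bytes * height:
        raise ValueError("truncated raster data")

    # Flip every bit (in PBM, 1 is black and 0 is white).
    out = bytearray(b ^ 0xFF for b in raster)

    # Clear the padding bits at the end of each row so they stay zero.
    pad_bits = row_bytes * 8 - width
    if pad_bits:
        keep = (0xFF << pad_bits) & 0xFF
        for row in range(height):
            out[row * row_bytes + row_bytes - 1] &= keep

    return b"P4\n%d %d\n" % (width, height) + bytes(out)
```

To use it, read the extracted mask file in binary mode, pass the bytes to the function, and write the result to a new .pbm file.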

I presume this complex structure is somewhat similar to the background/foreground segmentation used in djvu files, and the artifacts are similar too.

So the PDF images are not just "compressed": they are heavily processed and segmented.

Anyway, the IA image viewer doesn't use the PDF (or the djvu) at all; it serves JPGs derived from the jp2 files. So if you need a djvu that matches, in detail, what you see in the IA viewer, you have to download and process the jp2 images to build a decent djvu file.
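
The jp2 route could be scripted roughly as follows. This is a sketch under assumptions, not a description of what IA does or of the exact tools I used: it assumes opj_decompress (OpenJPEG) to decode each jp2 to PPM, and c44 plus djvm (DjVuLibre) to encode each page and bundle the pages. The function name and the "work" directory are hypothetical, and the function only builds the command lines; it does not run anything.

```python
from pathlib import Path


def djvu_build_commands(jp2_files, out_djvu, workdir="work"):
    """Return argument lists for a jp2 -> djvu build; nothing is executed.

    Assumed tools (not confirmed by this thread): opj_decompress from
    OpenJPEG, c44 and djvm from DjVuLibre.
    """
    work = Path(workdir)
    commands = []
    pages = []
    for jp2 in map(Path, jp2_files):
        ppm = work / (jp2.stem + ".ppm")
        page = work / (jp2.stem + ".djvu")
        # Decode the JPEG 2000 page image to an uncompressed PPM.
        commands.append(["opj_decompress", "-i", str(jp2), "-o", str(ppm)])
        # Encode the PPM as a single-page (IW44) djvu.
        commands.append(["c44", str(ppm), str(page)])
        pages.append(str(page))
    # Bundle the single-page files into one multi-page document.
    commands.append(["djvm", "-c", str(out_djvu), *pages])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) once the work directory exists. Note that c44 produces a color djvu, so the result is closer to the viewer images than to the BW version mentioned above.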

Is any of this complex IA image-processing path documented anywhere? I reached my conclusions simply by trial and error, from a "necropsy" of the IA files.


2016-05-12 20:10 GMT+02:00 Federico Leva (Nemo) <>:
Andrea Zanni, 12/05/2016 19:38:

That was meant to be

I don't think this has anything to do with DjVu or PDF, the problem is very clear just by looking at : the JP2 conversion compressed the images 30 times, the PDF compression 5 more times.

The first step in such cases, as documented in , is to add/increase the fixed-ppi field. I don't understand what was used in


Wikisource-l mailing list