[Commons-l] PDF format issues Was: How non-free is Flash?

Gregory Maxwell gmaxwell at gmail.com
Tue Mar 6 14:39:00 UTC 2007


On 3/6/07, David Gerard <dgerard at gmail.com> wrote:
> In a related question, how free is PDF? The format may not be altered
> and still called "PDF", but everyone including the FSF can live with
> that. There's plenty of non-Adobe PDF creators and readers. What there
> is a lack of is editors, but it's essentially a write-once format in
> any case.

PDF is a mystery-meat format. You never know exactly what's going to be
inside, so you can't speak coherently about PDF as a whole.

There is a subset of PDF which is free by almost any reasonable
standard (as you note, most people don't regard the name-control issue
as material).

Subset PDF is effectively equivalent to gzipped PostScript. It's been
around forever and shouldn't have any serious problems. With subset
PDF, I think the biggest freedom-related risk is non-free authoring
tools smuggling in non-free content without the author's knowledge. (We
see this with SVG too: Illustrator loves to embed non-free, fully
hinted TTF fonts in SVGs. At least they are easy to remove, since
our rasterizer ignores them anyway.)

However, PDF can be a *very* non-free format, and I would be surprised
if some of the PDFs we host were not pretty much maximally non-free.

First, there is the patent issue. Modern commercial PDF plugins
contain JBIG2 and JPEG 2000 codecs. Some of the open-source PDF
readers can read these, but not all. JBIG2 is clearly patent-encumbered;
JPEG 2000 is just an ambiguous, ugly patent mess. There may be other
ways in which non-subset PDF is a patent minefield; this is just the
first that comes to mind.

Non-subset PDF also includes encryption and digital restrictions
management. The free-software PDF tools include support for the basic
(old) PDF encryption and can easily be modified to ignore the
restrictions. However, under the DMCA it is illegal in the US to use
such modified tools, as well as to distribute them.

There are also a TON of features in non-subset PDF that free tools
don't support (e.g. Acrobat forms), or which raise other
accessibility and security concerns.

As I pointed out with Flash, to really talk about these things we
need to know the application.

For imaged documents, especially high-resolution bitonal (black/white)
documents, DjVu is a compelling alternative which is gaining adoption
even outside the world of free software. We have great support for
DjVu in MediaWiki, including on-the-fly transcoding, which should
dramatically reduce concerns about client compatibility.

http://commons.wikimedia.org/w/index.php?title=Image%3AMozart_Sonate_%28manuscript%29.djvu&page=10

Play with the page controls on the right side. :)

For vector illustrations, SVG is a much better choice.

But PDF as a compatible version of gzipped PS is still a reasonable tool.
We need a better ability to check files for evilness. Ideally we could
have a client-side upload tool that knew how to transcode things into
acceptable formats (using codecs built into your OS) and which could
test for problems like the ones I discussed for PDF.
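As a rough illustration of the kind of check such a tool could run, here is a minimal Python sketch. It is not an existing Commons tool: the function name `scan_pdf_bytes` and the choice of which markers to flag are hypothetical. It looks for standard PDF name objects tied to the problems above: the `/JBIG2Decode` and `/JPXDecode` filters, an `/Encrypt` dictionary, `/AcroForm` forms, and (as an assumed example of a security concern) `/JavaScript`. It is a heuristic byte scan, not a parser. Names stored inside compressed object streams (PDF 1.5+) or written with `#xx` escapes will slip past it, so a real tool would want a proper PDF library.

```python
from __future__ import annotations

import re
import sys

# Hypothetical list of markers to flag; the names are standard PDF names,
# but which ones count as "evil" is a policy assumption.
RISKY_MARKERS = {
    b"/JBIG2Decode": "JBIG2-compressed image (patent concerns)",
    b"/JPXDecode": "JPEG 2000-compressed image (patent concerns)",
    b"/Encrypt": "encryption / usage restrictions",
    b"/AcroForm": "interactive Acrobat form",
    b"/JavaScript": "embedded JavaScript",
}


def scan_pdf_bytes(data: bytes) -> list[str]:
    """Return descriptions of flagged features whose names appear in the raw bytes."""
    findings = []
    for marker, description in RISKY_MARKERS.items():
        # The lookahead stops b"/Encrypt" from matching b"/EncryptMetadata".
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9#_.-])"
        if re.search(pattern, data):
            findings.append(description)
    return findings


if __name__ == "__main__":
    # Usage: python scan_pdf.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            found = scan_pdf_bytes(f.read())
        print(f"{path}: {', '.join(found) if found else 'no flagged markers found'}")
```

A byte scan was chosen over a full parser only to keep the sketch dependency-free; it can produce false negatives, so a clean result means "nothing obvious", not "safe".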
