http://www.infoworld.com/article/09/02/20/Adobe_flaw_heightens_risk_of_encou...
Do we sanitise PDFs at all? Do we check for wacky "active" features in a PDF?
- d.
I pointed this out at VP a few months ago when it was proposed that we virus-scanned incoming files - as far as I am aware, nothing is checked when uploading.
Could be wrong, but that's how I remember the conversation going.
- Chris
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Chris Down wrote: [snip]
The file type is scanned. Also, I run a bot doing stricter checks on the file contents for all Commons uploads (could extend to other projects if you want).
It could also pass a virus scan but I don't think it's really needed. Virus scanners mainly look for known bad code, inside executables. We don't want any kind of executable.
On Fri, Feb 20, 2009 at 4:24 PM, David Gerard dgerard@gmail.com wrote: [snip]
The article isn't too specific about the flaw, so it would be hard to detect. What we could do is reject PDFs containing JavaScript. An unneeded feature IMHO. It has been used more as an attack vector than legitimately. Do you know of a tool which could detect that? I don't think pdfinfo provides that.
In any case, PDFs don't stay around for long. They are a headache for a different reason. About 99% of PDF uploads really shouldn't have been uploaded as PDF.
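[Editorial sketch, not part of the original message.] One rough way to flag PDFs that mention JavaScript is to scan the raw bytes for the /JavaScript and /JS names, after undoing the #xx hex escapes that PDF names allow. The function names below are hypothetical. This is a heuristic under stated assumptions, not a sanitiser: it cannot see names hidden inside compressed object streams or encrypted files, and undoing escapes across binary stream data can cause false positives.

```python
import re

# PDF names may escape any character as #xx (two hex digits), so "/J#53" is "/JS".
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

# Names that introduce script actions. A name ends at whitespace, a PDF
# delimiter character, or the end of the data, so "/JSFont" does not match.
_SCRIPT_NAME = re.compile(rb"/(JavaScript|JS)(?=[\s/<>\[\]()%{}]|$)")


def _decode_name_escapes(data: bytes) -> bytes:
    """Replace every #xx escape with the byte it stands for."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def pdf_mentions_javascript(data: bytes) -> bool:
    """Return True if the raw bytes contain a /JavaScript or /JS name.

    Heuristic only: compressed object streams and encrypted documents
    are not decoded, so a False result does not prove the file is clean.
    """
    return _SCRIPT_NAME.search(_decode_name_escapes(data)) is not None


def file_mentions_javascript(path: str) -> bool:
    """Convenience wrapper that reads the whole file into memory first."""
    with open(path, "rb") as handle:
        return pdf_mentions_javascript(handle.read())
```

An upload hook could call file_mentions_javascript() on the temporary file and reject or queue the upload for review when it returns True.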
2009/2/20 Platonides Platonides@gmail.com: [snip]
Would pdf2ps -> ps2pdf do it?
- d.
On Fri, Feb 20, 2009 at 10:03 AM, David Gerard dgerard@gmail.com wrote:
Would pdf2ps -> ps2pdf do it?
If such a round-trip would suppress this (and I have no idea), the next question would be whether it would suppress or reduce in quality any of the other content that we actually do care about. In general such processes often have unintended consequences that make them undesirable.
-Robert Rohde
On Fri, Feb 20, 2009 at 1:59 PM, Robert Rohde rarohde@gmail.com wrote: [snip]
Not to mention the processing overhead. How efficient is pdf2ps/ps2pdf? Are we going to slow down PDF upload/display by trying to sanitize it first?
-Chad
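[Editorial sketch, not part of the original message.] The overhead question can be answered by measurement rather than guesswork. The sketch below times one pdf2ps -> ps2pdf round trip on a sample file. It assumes the Ghostscript wrapper scripts pdf2ps and ps2pdf are on PATH; the function names and the temporary file names are hypothetical, and the runner is injectable so the logic can be exercised without Ghostscript installed.

```python
import subprocess
import tempfile
import time
from pathlib import Path


def roundtrip_commands(src: Path, workdir: Path):
    """Build the two command lines for a PDF -> PostScript -> PDF round trip."""
    ps_file = workdir / "roundtrip.ps"
    out_file = workdir / "roundtrip.pdf"
    commands = [
        ["pdf2ps", str(src), str(ps_file)],
        ["ps2pdf", str(ps_file), str(out_file)],
    ]
    return commands, out_file


def time_roundtrip(src, run=subprocess.run):
    """Run the round trip once; return (seconds elapsed, output size in bytes).

    The size is None if no output file was produced. `run` defaults to
    subprocess.run and can be replaced by a stub for testing.
    """
    with tempfile.TemporaryDirectory() as tmp:
        commands, out_file = roundtrip_commands(Path(src), Path(tmp))
        start = time.perf_counter()
        for command in commands:
            run(command, check=True)  # raises if a tool exits non-zero
        elapsed = time.perf_counter() - start
        size = out_file.stat().st_size if out_file.exists() else None
    return elapsed, size
```

Running time_roundtrip() over a sample of real uploads would give a per-file cost to weigh against upload volume. It says nothing about output quality, which needs a separate check.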
On Fri, Feb 20, 2009 at 12:57 PM, Platonides Platonides@gmail.com wrote: [snip]
It could also pass a virus scan but I don't think it's really needed. Virus scanners mainly look for known bad code, inside executables. We don't want any kind of executable.
I've run clamav against the entire set of files in the past. Found a couple of interesting things (like, 3 files out of millions).
Converting with pdftops and back will probably totally kill the text layer. Might as well render to images and DjVu.
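[Editorial sketch, not part of the original message.] The text-layer concern can be checked by extracting text before and after a conversion and comparing. The sketch below assumes the pdftotext tool is installed (it writes to standard output when the output name is "-"). The function names and the word-overlap score are hypothetical choices for illustration, not an established metric.

```python
import subprocess
from collections import Counter


def extract_text(pdf_path, run=subprocess.run) -> str:
    """Return the text layer of a PDF as reported by pdftotext."""
    result = run(["pdftotext", str(pdf_path), "-"],
                 capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def text_layer_survival(before: str, after: str) -> float:
    """Fraction of the words in `before` that still appear in `after`.

    Words are matched as a multiset, ignoring order. Returns 1.0 when
    the original had no text, since nothing could have been lost.
    """
    words_before = before.split()
    if not words_before:
        return 1.0
    remaining = Counter(after.split())
    kept = 0
    for word in words_before:
        if remaining[word] > 0:
            remaining[word] -= 1
            kept += 1
    return kept / len(words_before)
```

A score near 0.0 on converted files would confirm that the text layer is lost; a score near 1.0 would suggest the conversion preserves it for that sample.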