Can someone explain why the Wikimedia Commons accepts uploads of printable PDF documents (e.g. brochures) but not the editable source version in Open Document Format (e.g. .ODT). This seems to violate the open source principle.
This should be an FAQ, but it isn't obvious from http://commons.wikimedia.org/wiki/Commons:File_types
Lars Aronsson schrieb:
Can someone explain why the Wikimedia Commons accepts uploads of printable PDF documents (e.g. brochures) but not the editable source version in Open Document Format (e.g. .ODT). This seems to violate the open source principle.
The reasons are of a practical nature. ODT files are hard(er) to verify, because they are just zip files that could contain anything. Also, they can contain things like Java applets, which are potentially dangerous. I think these are the reasons for not allowing office-style formats. Someone correct me if I'm wrong.
-- daniel
On Sun, Nov 16, 2008 at 3:51 PM, Daniel Kinzler daniel@brightbyte.de wrote:
Lars Aronsson schrieb:
Can someone explain why the Wikimedia Commons accepts uploads of printable PDF documents (e.g. brochures) but not the editable source version in Open Document Format (e.g. .ODT). This seems to violate the open source principle.
The reasons are of a practical nature. ODT files are hard(er) to verify, because they are just zip files that could contain anything. Also, they can contain things like Java applets, which are potentially dangerous. I think these are the reasons for not allowing office-style formats. Someone correct me if I'm wrong.
I was just about to post the same thing; Daniel beat me to it because I took the time to test the current functionality.
We permitted SXD (OpenOffice 1.x) files until fairly recently, when the new Java/zip exploit came to light. Up until that point the newer ODT files were simply unsupported by omission. Now that we've been reminded of the problems of zip formats, we're currently denying both.
If someone creates a good sanitizer that only allows normal ODT files without the risk of smuggling hidden program code, then we could allow the OpenOffice files again. I believe it would be desirable to do so, as rejecting the editable form is highly undesirable.
Although, we strongly prefer wikitext for text, so there ought to be a very good reason for using ODT *or* PDF. Most uploaded PDFs are deleted, and http://commons.wikimedia.org/wiki/Commons:Scope has something to say about the matter.
I'll go update the formats page to reflect this information.
2008/11/16 Gregory Maxwell gmaxwell@gmail.com:
If someone creates a good sanitizer that only allows normal ODT files without the risk of smuggling hidden program code, then we could allow the OpenOffice files again. I believe it would be desirable to do so, as rejecting the editable form is highly undesirable.
But a zip file could still contain a decompression bomb. Or is there any universal method of avoiding that?
AJF/WarX
On Sun, Nov 16, 2008 at 4:13 PM, Artur Fijałkowski wiki.warx@gmail.com wrote:
2008/11/16 Gregory Maxwell gmaxwell@gmail.com:
If someone creates a good sanitizer that only allows normal ODT files without the risk of smuggling hidden program code, then we could allow the OpenOffice files again. I believe it would be desirable to do so, as rejecting the editable form is highly undesirable.
But a zip file could still contain a decompression bomb. Or is there any universal method of avoiding that?
Disallow recursive zips (not needed for any of these formats), and check the directory before uncompressing, disallowing anything that decompresses to enormous sizes. The combination should be sufficient for that particular issue.
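The two checks described above (no nested archives, and a look at the directory before uncompressing) can be sketched roughly as follows. This is only an illustration in Python using the standard `zipfile` module, not the check MediaWiki actually runs; the name `check_zip_directory`, the size cap, the ratio cap and the list of archive suffixes are all made-up assumptions. Note also that the sizes in the directory are declared by the uploader, so a real check would still need to enforce the limit while decompressing.

```python
import io
import zipfile

# All three limits below are hypothetical values chosen for illustration.
MAX_TOTAL_BYTES = 100 * 1024 * 1024   # cap on total declared uncompressed size
MAX_RATIO = 200                       # cap on per-entry compression ratio
ARCHIVE_SUFFIXES = (".zip", ".jar", ".odt", ".ods", ".odp")


def check_zip_directory(data):
    """Return a list of problems found by reading only the ZIP directory."""
    problems = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            total += info.file_size
            # "Recursive zips": an entry that is itself a ZIP-based archive.
            if info.filename.lower().endswith(ARCHIVE_SUFFIXES):
                problems.append("nested archive: " + info.filename)
            # Entries that claim to expand enormously relative to their size.
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                problems.append("suspicious compression ratio: " + info.filename)
    if total > MAX_TOTAL_BYTES:
        problems.append("declared uncompressed size too large: %d bytes" % total)
    return problems
```

Nothing is decompressed here; `infolist()` only reads the directory at the end of the archive.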
Gregory Maxwell wrote:
Although, we strongly prefer wikitext for text,
I don't use PDF for text, but for text with layout. And then I cannot supply the preferred editable source format.
My Wikipedia Academy exercises are already available in editable (and translatable) text on meta, http://meta.wikimedia.org/wiki/User:LA2/%C3%96vningsuppgifter
but the laid-out, printable PDF cannot be immediately derived from that, http://commons.wikimedia.org/wiki/Image:Wikipedia_Academy_2008_Sweden_Exerci...
There is an ODT document in between, that I currently cannot upload. So I provide the printable PDF without the preferred editable format, thus violating the GNU freedoms of the user.
This is the case *every time* I upload a PDF.
On Sun, Nov 16, 2008 at 4:42 PM, Lars Aronsson lars@aronsson.se wrote:
but the laid-out, printable PDF cannot be immediately derived from that, http://commons.wikimedia.org/wiki/Image:Wikipedia_Academy_2008_Sweden_Exerci...
What layout there could you not accomplish via Wikitext?
Even if ODT were allowed, its use really hurts editability, since we lose many of the revision control features of MediaWiki for these documents and require users to have a particular software package installed rather than just using the website.
Gregory Maxwell wrote:
On Sun, Nov 16, 2008 at 4:42 PM, Lars Aronsson lars@aronsson.se wrote:
http://commons.wikimedia.org/wiki/Image:Wikipedia_Academy_2008_Sweden_Exerci...
What layout there could you not accomplish via Wikitext?
In this simple case, ensuring that every exercise fits on one page of size A5 (half page of size A4). In more advanced brochure examples, the nice and readable layout.
The fact is, I have an ODT file. This will not change. The question is, can Wikimedia Commons handle this or do I have to start a separate project?
Even if ODT were allowed, its use really hurts editability, since we lose many of the revision control features of MediaWiki for these documents and require users to have a particular software package installed rather than just using the website.
It is actually no different from JPEG images. You can edit them with Photoshop or GIMP or any other program. And the Open Document Format is open so it can be edited with any software, including OpenOffice.org or Microsoft Office.
The main difference is that the PDF that I can upload, cannot easily be edited.
Lars Aronsson wrote:
Can someone explain why the Wikimedia Commons accepts uploads of printable PDF documents (e.g. brochures) but not the editable source version in Open Document Format (e.g. .ODT). This seems to violate the open source principle.
This should be an FAQ, but it isn't obvious from http://commons.wikimedia.org/wiki/Commons:File_types
As ODT files are ZIP files, it's hard to allow one kind of ZIP-based format while rejecting the others. It's not just about stopping people from sharing all kinds of files by uploading them inside a zip: JAR files (also ZIP-based) are a security risk, and you could have a file that is both an OpenDocument and a JAR!
If you come up with a solution, we'll be happy to hear about it.
Platonides wrote:
If you come up with a solution, we'll be happy to hear about it.
Surely OpenOffice.org (Sun Microsystems) must be able to produce a validator in open source code?! Has this been asked of them? Why else do we try to use open formats?
Or freely allow uploads but put these suspect objects in quarantine, which requires another trusted user to download them with care and examine them before the file is flagged as OK.
Or would RTF work?
Lars Aronsson wrote:
Platonides wrote:
If you come up with a solution, we'll be happy to hear about it.
Surely OpenOffice.org (Sun Microsystems) must be able to produce a validator in open source code?! Has this been asked of them? Why else do we try to use open formats?
Not that I am aware of.
Or freely allow uploads but put these suspect objects in quarantine, which requires another trusted user to download them with care and examine them before the file is flagged as OK.
That breaks the wiki principle, and it would need special support from the software. It would be better to allow trusted users to bypass the file type checks, and to set up some process for sending them (for upload after review) those files that "must be uploaded in X format".
Or would RTF work?
Those are much easier to validate :) At least they won't steal cookies. Still, we would need to check that they don't have embedded objects.
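A check for embedded objects in RTF could start by looking for the object-related control words. The sketch below is a naive Python illustration of that idea, not an existing validator: the function name `rtf_embedded_object_words` is invented here, the list of control words is a partial selection from the RTF object group, and a plain byte scan like this can be fooled by obfuscated RTF.

```python
import re

# Partial, illustrative list of RTF control words that introduce or
# carry embedded/linked objects. A control word is a backslash followed
# by letters, so we require that no further lowercase letter follows.
OBJECT_WORDS = re.compile(rb"\\(object|objemb|objlink|objocx|objdata)(?![a-z])")


def rtf_embedded_object_words(data):
    """Return the set of object-related control words found in the RTF bytes."""
    return {word.decode("ascii") for word in OBJECT_WORDS.findall(data)}
```

An upload for which this returns a non-empty set would be rejected or sent for manual review under this scheme.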
(Originally, this was a thread on whether to enable ODF uploads on Commons. See [http://mail-archive.decenturl.com/wikitech-l-open-document-format].)
Lars Aronsson wrote:
Platonides wrote:
If you come up with a solution, we'll be happy to hear about it.
Surely OpenOffice.org (Sun Microsystems) must be able to produce a validator in open source code?! Has this been asked of them? Why else do we try to use open formats?
Last Saturday I was at the yearly meeting of the Serbian OpenOffice.org community, so I asked people about this. They tell me that something like this has indeed been produced, and is called http://odftoolkit.org/ . By using this toolkit it should be possible to validate ODF documents, make sure there are no "extra" files in their archives, and even modify documents so that e.g. macros and scripts are removed from them. It doesn't require OOo to be installed. So if anyone is interested in validating ODF files so that their upload could be enabled on Commons, this would be the way to go.
On Sun, Nov 16, 2008 at 10:47 PM, Lars Aronsson lars@aronsson.se wrote:
Can someone explain why the Wikimedia Commons accepts uploads of printable PDF documents (e.g. brochures) but not the editable source version in Open Document Format (e.g. .ODT). This seems to violate the open source principle.
This should be an FAQ, but it isn't obvious from http://commons.wikimedia.org/wiki/Commons:File_types
I just uploaded a hybrid PDF and it is working properly. You could use that until ODFs are allowed.
http://www.oooninja.com/2008/06/pdf-import-hybrid-odf-pdfs-extension-30.html
Mohamed Magdy wrote:
I just uploaded a hybrid PDF and it is working properly. You could use that until ODFs are allowed.
http://www.oooninja.com/2008/06/pdf-import-hybrid-odf-pdfs-extension-30.html
You're taking advantage of a hole in our file type detection system: we aren't checking for files embedded into PDF streams.
I had set up a check, but forgot to properly enable it :/ It's now fixed, so Wiki-Bot will now warn about embedded ZIP-based files. Using embedding for hybrid PDF-ODF files is useful, but it has also been used for abuse in the past.
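For illustration, a very naive version of such a warning could look like the Python sketch below. This is not the check Wiki-Bot runs; `pdf_embedding_warnings` is a hypothetical name. It only looks for a raw ZIP local-file header and for the `/EmbeddedFile` name in the uncompressed parts of the PDF, so anything hidden inside a compressed stream would be missed, and binary stream data can trigger false positives.

```python
ZIP_LOCAL_HEADER = b"PK\x03\x04"   # signature that starts every ZIP entry
EMBEDDED_NAME = b"/EmbeddedFile"   # also matches /EmbeddedFiles


def pdf_embedding_warnings(data):
    """Return human-readable warnings for a PDF given as bytes."""
    warnings = []
    if not data.startswith(b"%PDF-"):
        warnings.append("missing %PDF- header")
    if ZIP_LOCAL_HEADER in data:
        warnings.append("raw ZIP local-file header found")
    if EMBEDDED_NAME in data:
        warnings.append("PDF declares embedded files")
    return warnings
```

A warning here is only a reason for a human to take a look, which matches the "warn" behaviour described above rather than an automatic block.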
On Mon, Nov 17, 2008 at 4:31 AM, Mohamed Magdy mohamed.m.k@gmail.com wrote:
On Sun, Nov 16, 2008 at 10:47 PM, Lars Aronsson lars@aronsson.se wrote:
Can someone explain why the Wikimedia Commons accepts uploads of printable PDF documents (e.g. brochures) but not the editable source version in Open Document Format (e.g. .ODT). This seems to violate the open source principle.
This should be an FAQ, but it isn't obvious from http://commons.wikimedia.org/wiki/Commons:File_types
I just uploaded a hybrid PDF and it is working properly. You could use that until ODFs are allowed.
http://www.oooninja.com/2008/06/pdf-import-hybrid-odf-pdfs-extension-30.html
Ugh. Please do not rely on this. The embedded zip in PDF has been abused to smuggle illegal material into PDFs on commons. We don't block these files right now, but we may someday, and even without blocking they are likely to be confused for something sneaky and get deleted.
Once we have proper ODF screening we could possibly allow just those in PDF, but then again, we could allow them directly (which I think would be preferable).
On Mon, Nov 17, 2008 at 3:43 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
Once we have proper ODF screening we could possibly allow just those in PDF, but then again, we could allow them directly (which I think would be preferable).
Meh. I think ODF-in-PDF would be preferable. The penetration of Adobe Reader is really high, which means that almost anybody can view the document. Are there any downsides of ODF embedded in PDF compared to pure ODF except file size?
Bryan
On Mon, Nov 17, 2008 at 9:48 AM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
On Mon, Nov 17, 2008 at 3:43 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
Once we have proper ODF screening we could possibly allow just those in PDF, but then again, we could allow them directly (which I think would be preferable).
Meh. I think ODF-in-PDF would be preferable. The penetration of Adobe reader is really high, which means that almost anybody can view the document. Are there any downsides of ODF embedded in PDF compared to pure ODF except file size?
The resulting file is not part of the ISO-covered part of PDF. Many PDF viewers have no clue how to extract the embedded file. On the site, the distinct revision history of the files is lost (i.e. someone could revise the ODF or the PDF and not the other, and you couldn't see that on the site). And, of course, size... ODF documents can potentially be very large. Despite those arguments against it, always having the document source in an editable form is, admittedly, very attractive. I don't think that it's worth having a big debate over files which could only be permitted in a fairly sparing manner. In any case, document validation is a prerequisite.
On Mon, Nov 17, 2008 at 4:12 AM, Lars Aronsson lars@aronsson.se wrote: [snip]
The fact is, I have an ODT file. This will not change. The question is, can Wikimedia Commons handle this or do I have to start a separate project?
This has already veered off-topic for this list…
We operate wikis here today, our projects are collaborative, and wiki is our only robust collaborative technology. While I do not want to tell you "no", as there are a lot of useful reasons to accept OpenOffice uploads, there are also many reasons why we're fairly restrictive about non-wikitext formatted-text uploads. There is a desire to accommodate real needs, but it needs to be balanced against our desire to maximize accessibility and the ability to collaborate.
If wikitext has layout shortcomings which are impacting you, then you should disclose them so that people can think about how to resolve them. But if your position is "I just want to do it my way", then perhaps you should be starting a separate project.
Gregory Maxwell wrote:
If wikitext has layout shortcomings which are impacting you, then you should disclose them so that people can think about how to resolve them. But if your position is "I just want to do it my way", then perhaps you should be starting a separate project.
I do outreach projects, where I need to print things on paper, and they need to look (somewhat) pretty. I'm not in the business of developing entirely new wordprocessing or typesetting software. When I take photos, I use a commercially available camera. It's not realistic for me to take five years off to develop my own camera. Both tools, OpenOffice and my Canon camera, deliver editable source files in open formats. Wikimedia Commons accepts the JPEG photos, but not the Open Document files.
I'm continuing my work, and I will upload PDF files to Wikimedia Commons, but I will keep the editable source files (ODT) offline.
Wikimedia Commons is already full of posters and brochures in PDF, where the editable source files are unavailable. That is sad.
I'm continuing my work, and I will upload PDF files to Wikimedia Commons, but I will keep the editable source files (ODT) offline.
Wikimedia Commons is already full of posters and brochures in PDF, where the editable source files are unavailable. That is sad.
Yes, sad. I think everyone agrees. But not trivial to fix. So, help with fixing it, or wait for someone else to fix it. Complaining doesn't help. Such is the nature of open source projects.
-- daniel
On Tue, Nov 18, 2008 at 12:52 PM, Daniel Kinzler daniel@brightbyte.de wrote:
I'm continuing my work, and I will upload PDF files to Wikimedia Commons, but I will keep the editable source files (ODT) offline.
Wikimedia Commons is already full of posters and brochures in PDF, where the editable source files are unavailable. That is sad.
Yes, sad. I think everyone agrees. But not trivial to fix. So, help with fixing it, or wait for someone else to fix it. Complaining doesn't help. Such is the nature of open source projects.
There are other repositories for free culture, such as www.archive.org, that could be another option for people: post the PDF on Wikipedia and the ODT on archive.org. Bonus points for adding a comment to the WP page of the PDF with a link to the source file on archive.org.
This is another way to pimp free culture. Wikipedia is popular, wildly popular, and sharing that popularity with other good agents can result in good things. After all, one day Wikipedia will die. One basket, all eggs. A bus can hit Wikipedia.
On Tue, Nov 18, 2008 at 7:52 AM, Daniel Kinzler daniel@brightbyte.de wrote:
I'm continuing my work, and I will upload PDF files to Wikimedia Commons, but I will keep the editable source files (ODT) offline.
Wikimedia Commons is already full of posters and brochures in PDF, where the editable source files are unavailable. That is sad.
Yes, sad. I think everyone agrees. But not trivial to fix. So, help with fixing it, or wait for someone else to fix it. Complaining doesn't help. Such is the nature of open source projects.
Sometimes complaining does help, though. And such *is* the nature of open source projects - "given enough eyeballs, all bugs are shallow" is not a statement that "given enough coders, all bugs are fixable".
Code is better than complaints, but complaints are better than nothing.
On Mon, May 18, 2009 at 9:42 AM, Anthony wikimail@inbox.org wrote:
On Tue, Nov 18, 2008 at 7:52 AM, Daniel Kinzler daniel@brightbyte.de wrote:
I'm continuing my work, and I will upload PDF files to Wikimedia Commons, but I will keep the editable source files (ODT) offline.
Wikimedia Commons is already full of posters and brochures in PDF, where the editable source files are unavailable. That is sad.
Yes, sad. I think everyone agrees. But not trivial to fix. So, help with fixing it, or wait for someone else to fix it. Complaining doesn't help. Such is the nature of open source projects.
Sometimes complaining does help, though. And such *is* the nature of open source projects - "given enough eyeballs, all bugs are shallow" is not a statement that "given enough coders, all bugs are fixable".
Code is better than complaints, but complaints are better than nothing.
Oops missed the date, sorry.
On Tue, Nov 18, 2008 at 4:34 AM, Lars Aronsson lars@aronsson.se wrote:
I do outreach projects, where I need to print things on paper, and they need to look (somewhat) pretty. I'm not in the business of developing entirely new wordprocessing or typesetting software.
Who suggested you do so?
I was asking you to express your needs in a clear and actionable manner so that *someone* might address them. I looked at the document you provided and with the exception of pagination control I didn't see any obvious layout elements that Wikitext does not provide. I have no doubt that they exist, but I'm missing them. Improvements to Wikitext would be beneficial to many people.
When I take photos, I use a commercially available camera. It's not realistic for me to take five years off to develop my own camera. Both tools, OpenOffice and my Canon camera, deliver editable source files in open formats. Wikimedia Commons accepts the JPEG photos, but not the Open Document files. I'm continuing my work, and I will upload PDF files to Wikimedia Commons, but I will keep the editable source files (ODT) offline.
The current reasons for not allowing these formats have already been explained by multiple people; they have nothing to do with demanding that you make your own camera.
Wikimedia Commons is already full of posters and brochures in PDF, where the editable source files are unavailable. That is sad.
It's mostly not supposed to be, except for some forms of source material. "Full of" is relative: out of 3.5 million files, ones in that class are fairly infrequent... and posters are not equivalent to text documents. Commons does not generally permit uploads of original texts which could otherwise be wikitext, though this is not why ODT is not currently allowed. (http://commons.wikimedia.org/wiki/Commons:Scope#Allowable_reasons_for_PDF_an...)
On Mon, Nov 17, 2008 at 4:43 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
The embedded zip in PDF has been abused to smuggle illegal material into PDFs on commons.
We don't protect (or disable editing by IPs) *all* the articles because someone abused *some* of them and kept vandalizing.
On Mon, Nov 17, 2008 at 12:41 PM, Mohamed Magdy mohamed.m.k@gmail.com wrote:
We don't protect (or disable editing by IPs) *all* the articles because someone abused *some* of them and kept vandalizing.
IPs vandalizing don't take over your accounts and wreak havoc on the sites. Java in Zip has that potential.
IP vandalism does not result in converting Commons into a warez repository filled with illicitly copied games, system cracking tools, and other undesirable content being distributed undetected for months or years at a time. Zip-in-PDF and RAR-appended PDFs have resulted in this.
There is a transparency to wiki which makes open wikis viable. Text-in-upload lacks that transparency, and that leads to problems. We need to strike a good balance. Unscrubbed PDF and ZIP uploads are not an especially good balance.
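As an illustration of the "archive appended to a PDF" case mentioned above, the following Python sketch looks at whatever follows the last `%%EOF` marker and tests it against the well-known RAR and ZIP signatures. It is a hedged sketch only: `classify_trailing_data` is a hypothetical helper, not an existing Commons check, and an appended payload that itself contains the bytes `%%EOF` would defeat this simple version.

```python
PDF_EOF = b"%%EOF"
RAR_SIGNATURE = b"Rar!\x1a\x07"    # common prefix of RAR 4 and RAR 5 archives
ZIP_SIGNATURE = b"PK\x03\x04"      # ZIP local-file header


def classify_trailing_data(data):
    """Describe what follows the final %%EOF marker of a PDF.

    Returns None when nothing but whitespace follows the marker.
    """
    end = data.rfind(PDF_EOF)
    if end < 0:
        return "no %%EOF marker"
    tail = data[end + len(PDF_EOF):].strip()
    if not tail:
        return None
    if tail.startswith(RAR_SIGNATURE):
        return "rar"
    if tail.startswith(ZIP_SIGNATURE):
        return "zip"
    return "unknown"
```

Anything other than `None` would be a reason to flag the upload for review.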
On Sun, Nov 16, 2008 at 10:47 PM, Lars Aronsson lars@aronsson.se wrote:
Can someone explain why the Wikimedia Commons accepts uploads of printable PDF documents (e.g. brochures) but not the editable source version in Open Document Format (e.g. .ODT). This seems to violate the open source principle.
ODT and other ZIP-based types are currently disabled on the public sites until we've nailed down ZIP-based file security a bit better. (They're enabled on the private sites; we have a basic file type check to confirm that the file really thinks it's an ODF of the appropriate extension, but no checks yet to confirm there are no evil Java classes also sitting in the ZIP, etc.)
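To make the "basic file type check" concrete: an ODF package declares its own type in a `mimetype` entry, which the OpenDocument packaging rules expect to be the first entry in the ZIP. A check of that declaration against the file extension might look like the Python sketch below. This is an illustration under stated assumptions, not MediaWiki's actual code; the function name `claims_matching_odf_type`, the 256-byte limit and the short extension table are invented for the example.

```python
import io
import os
import zipfile

# Small illustrative subset of ODF extensions and their declared types.
ODF_TYPES = {
    ".odt": "application/vnd.oasis.opendocument.text",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
    ".odp": "application/vnd.oasis.opendocument.presentation",
    ".odg": "application/vnd.oasis.opendocument.graphics",
}


def claims_matching_odf_type(filename, data):
    """True if the package's own mimetype entry matches the file extension."""
    expected = ODF_TYPES.get(os.path.splitext(filename)[1].lower())
    if expected is None:
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            entries = archive.infolist()
            # The mimetype entry should come first and be tiny.
            if not entries or entries[0].filename != "mimetype":
                return False
            if entries[0].file_size > 256:   # hypothetical sanity limit
                return False
            declared = archive.read(entries[0]).decode("ascii", "replace").strip()
    except zipfile.BadZipFile:
        return False
    return declared == expected
```

As noted above, this only confirms what the file claims to be; it says nothing about what else is sitting in the archive.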
If someone would like to do some work on that, that'd be super -- we haven't really been prioritizing it yet as there's not a lot of call for ODF files outside the private working-group wikis.
There's an optional zip extension for PHP which should include support for listing out the ZIP file directory; however, since this isn't included in PHP by default, it might be nice to be able to read the directory independently without the extension for general MediaWiki installs. (It shouldn't be necessary to actually decompress anything for our purposes here -- we're mainly looking for subfiles not expected in an ODF, particularly Java classes that could be used for a session attack.)
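Reading the directory independently is not much code, since all entry names live in the central directory at the end of the file. The following is a rough Python sketch of the approach (the thread is about PHP and MediaWiki, so treat it as a stand-in for the idea rather than proposed code). `list_zip_names` and `unexpected_entries` are hypothetical names, the list of suspicious suffixes is an assumption, and the sketch ignores ZIP64, multi-disk archives and an end-of-directory signature forged inside the archive comment.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"   # end-of-central-directory record
CDFH_SIGNATURE = b"PK\x01\x02"   # central-directory file header
EOCD = struct.Struct("<4sHHHHIIH")            # 22 bytes
CDFH = struct.Struct("<4sHHHHHHIIIHHHHHII")   # 46 bytes


def list_zip_names(data):
    """Return entry names from the central directory, decompressing nothing."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD.size > len(data):
        raise ValueError("no end-of-central-directory record found")
    fields = EOCD.unpack_from(data, pos)
    total_entries, cd_offset = fields[4], fields[6]
    names = []
    offset = cd_offset
    for _ in range(total_entries):
        header = CDFH.unpack_from(data, offset)
        if header[0] != CDFH_SIGNATURE:
            raise ValueError("corrupt central directory")
        name_len, extra_len, comment_len = header[10], header[11], header[12]
        start = offset + CDFH.size
        names.append(data[start:start + name_len].decode("utf-8", "replace"))
        # Skip the variable-length name, extra field and comment.
        offset = start + name_len + extra_len + comment_len
    return names


def unexpected_entries(names):
    """Names that have no business in an ODF package (illustrative list)."""
    return [n for n in names if n.lower().endswith((".class", ".jar"))]
```

A stricter variant would whitelist the entries an ODF package is expected to contain instead of blacklisting suffixes.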
Mohamed Magdy wrote:
I just uploaded a hybrid PDF and it is working properly. You could use that until ODFs are allowed.
http://www.oooninja.com/2008/06/pdf-import-hybrid-odf-pdfs-extension-30.html
I'd tend to recommend against that; if the edit updating isn't transparent you're likely to have things get out of sync, or at least just be kind of confusing.
On the other hand, it is nice to be able to have a PDF form for people to download and print without needing to install the behemoth that is OpenOffice. :)
-- brion
wikitech-l@lists.wikimedia.org