I just tried uploading a 12MB TIFF (a scan directly from the Library of Congress) to Commons. It waited until the whole 12MB had uploaded, of course, to tell me it didn't want it.
1. Is there any reason TIFF is off the allowed media types list?
2. Is there any way for the software to say "no" earlier in the process of making a huge upload?
- d.
On 10/1/07, David Gerard dgerard@gmail.com wrote:
I just tried uploading a 12MB TIFF (a scan directly from the Library of Congress) to Commons. It waited until the whole 12MB had uploaded, of course, to tell me it didn't want it.
- Is there any reason TIFF is off the allowed media types list?
Tiff doesn't offer much which isn't supported equally or better by another format we accept.
Tiff is a grab-bag format... it can be a lot of different things, including things that we clearly don't want, and a lot of things that a lot of tools don't read.
What is the 12MB tiff actually of? If it's a photograph, it should probably be made into a jpeg. If it's a 1bpp scanned document it should probably be converted into a djvu.
- Is there any way for the software to say "no" earlier in the process of making a huge upload?
We really should have a good upload client which can not only help reject files early, but which could help users convert files, and walk users through the steps of providing all the right metadata. Commonist is almost that, but not quite. :)
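A minimal sketch of the "reject files early" part of such a client, checking the name and size locally before any bytes are sent. The allowed-extension set, the size limit and the function name are assumptions made up for illustration; they are not Commons' actual configuration, and this is not how Commonist works.

```python
import os

# Hypothetical policy values for illustration only.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "svg", "djvu", "ogg"}
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def preflight_problems(filename, size_bytes,
                       allowed=ALLOWED_EXTENSIONS, max_bytes=MAX_UPLOAD_BYTES):
    """Return a list of reasons to refuse the file before sending any bytes."""
    problems = []
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in allowed:
        problems.append(f"file type '.{ext}' is not on the allowed list")
    if size_bytes > max_bytes:
        problems.append(f"file is {size_bytes} bytes; the limit is {max_bytes}")
    return problems


# A 12MB TIFF is refused locally, before any of it crosses the network.
print(preflight_problems("Gifford_Pinchot_3c03915u.tif", 12 * 1024 * 1024))
```

A check like this is only advisory: it trusts the file name, so the server still has to validate whatever actually arrives.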
On 01/10/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
On 10/1/07, David Gerard dgerard@gmail.com wrote:
I just tried uploading a 12MB TIFF (a scan directly from the Library of Congress) to Commons. It waited until the whole 12MB had uploaded, of course, to tell me it didn't want it.
- Is there any reason TIFF is off the allowed media types list?
Tiff doesn't offer much which isn't supported equally or better by another format we accept. Tiff is a grab-bag format... it can be a lot of different things, including things that we clearly don't want, and a lot of things that a lot of tools don't read.
There is that ... as TIFF is a container format, looking inside the container requires a reasonable chunk of the file.
Their TIFFs appear to be uncompressed. I converted it to PNG, thus making a pixel-identical copy that was half the size :-)
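To illustrate the container point, here is a minimal sketch that reads a TIFF header and looks up the Compression tag in the first image directory. The tag number 259 and the value 1 for uncompressed data come from the TIFF 6.0 specification; the function name is invented for this example, and it only handles the simple case of a single SHORT value.

```python
import struct

COMPRESSION_TAG = 259  # TIFF 6.0 Compression tag; value 1 means uncompressed


def read_tiff_compression(data):
    """Return the Compression value of the first image directory, or None."""
    if data[:2] == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF: bad magic number")
    # The directory may sit anywhere in the file, even after all the pixel
    # data, so the first 8 bytes alone do not say what the container holds.
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry_offset = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack_from(endian + "HHI", data, entry_offset)
        if tag == COMPRESSION_TAG:
            (value,) = struct.unpack_from(endian + "H", data, entry_offset + 8)
            return value
    return None


# A tiny hand-built TIFF: header, then one directory entry saying Compression = 1.
sample = (b"II" + struct.pack("<HI", 42, 8) + struct.pack("<H", 1)
          + struct.pack("<HHIHH", COMPRESSION_TAG, 3, 1, 1, 0)
          + struct.pack("<I", 0))
print(read_tiff_compression(sample))
```

Because the directory offset is stored in the header rather than fixed, a checker may have to wait for a large part of the upload before it can tell what kind of TIFF it has been given.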
What is the 12MB tiff actually of? If it's a photograph, it should probably be made into a jpeg. If it's a 1bpp scanned document it should probably be converted into a djvu.
It's a photo:
http://commons.wikimedia.org/wiki/Image:Gifford_Pinchot_3c03915u.png
I converted it to PNG so as to have a pixel-identical copy of the LOC original. I'll be cropping and uploading a JPEG (for use in [[:en:Gifford Pinchot]], whose present image is State rather than Federal so of unknown copyright status).
(There are a LOT of nice high-resolution TIFFs on the LOC site, many of which will be US federal government public domain. Presumably saving as PNGs and making a nice JPEG for articles would be a useful thing to do.)
- Is there any way for the software to say "no" earlier in the process of making a huge upload?
We really should have a good upload client which can not only help reject files early, but which could help users convert files, and walk users through the steps of providing all the right metadata. Commonist is almost that, but not quite. :)
Commons will be great soon! %-D
(What's the US patent exposure from having an MPEG-to-Theora converter on Wikimedia servers? Would running one on the toolserver be safe enough?)
- d.
i get "Error creating thumbnail: Invalid thumbnail parameters" for that image but the full size one works.
not sure what's up with that.
mark
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 10/1/07, Wikinews Markie newsmarkie@googlemail.com wrote:
i get "Error creating thumbnail: Invalid thumbnail parameters" for that image but the full size one works.
not sure what's up with that.
This is because, to avoid system outages, PNGs over 12 megapixels are not thumbnailed. (PNGs are not thumbnailed incrementally and can cause essentially unbounded memory usage.)
Jpg and Djvu are fine however.
David is uploading a JPG version.
Also, because thumbnails in articles are always in the same format as the original, using a PNG version of this image in articles would be abusive to our users on slower net connections. :)
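The pixel-count limit described in this message can be checked from the first 24 bytes of a PNG, without decoding the image. The sketch below is an illustration only: it assumes "12 megapixels" means 12 million pixels, the function names and the wording of the message are invented, and it is not MediaWiki's actual thumbnailing code.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_THUMBNAIL_PIXELS = 12_000_000  # assumption: 12 megapixels = 12 million pixels


def png_dimensions(header):
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at the start of IHDR.
    return struct.unpack(">II", header[16:24])


def thumbnail_refusal(header, limit=MAX_THUMBNAIL_PIXELS):
    """Return None if the image may be thumbnailed, else a readable reason."""
    width, height = png_dimensions(header)
    pixels = width * height
    if pixels > limit:
        return (f"No thumbnail: this PNG is {width}x{height} ({pixels} pixels), "
                f"over the {limit}-pixel limit for PNG thumbnails")
    return None


# Hand-built header for a 4000x4000 image (16 million pixels).
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 4000, 4000)
print(thumbnail_refusal(header))
```

Returning a reason string rather than a bare failure makes it easy to show the reader why there is no thumbnail.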
On 01/10/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
On 10/1/07, Wikinews Markie newsmarkie@googlemail.com wrote:
i get "Error creating thumbnail: Invalid thumbnail parameters" for that image but the full size one works. not sure what's up with that.
This is because for system-outage-avoidance-reasons PNGs over 12 megapixels will not be thumbnailed. (PNGs are not thumbnailed incrementally and can cause basically unbounded memory usage) Jpg and Djvu are fine however.
Is there any easy way to make that error message look less drastic? "No thumbnail (PNGs over 12 megapixels on Wikimedia servers are not thumbnailed)" or something?
- d.
David Gerard wrote:
Is there any easy way to make that error message look less drastic? "No thumbnail (PNGs over 12 megapixels on Wikimedia servers are not thumbnailed)" or something?
- d.
That was my first suggestion when it was implemented: make the message more along the lines of "Image is too big" rather than "Wrong parameters", since the user is not (directly) supplying any parameters.
On 10/1/07, David Gerard dgerard@gmail.com wrote:
(There are a LOT of nice high-resolution TIFFs on the LOC site, many of which will be US federal government public domain. Presumably saving as PNGs and making a nice JPEG for articles would be a useful thing to do.)
The ability to crop and convert formats when thumbnailing, e.g.:
[[Image:Hawk eating prey.png|crop=900:300x400:900|format=jpg|300px]]
is a long-requested feature. See bug 7757, among others...
Crops are by far the most frequent modification needed for images. There are a lot of cases where we want to upload a lossless or near lossless 'source' version, but then find that we need to upload one or more crop versions for various purposes. :(
On 10/1/07, David Gerard dgerard@gmail.com wrote:
I just tried uploading a 12MB TIFF (a scan directly from the Library of Congress) to Commons. It waited until the whole 12MB had uploaded, of course, to tell me it didn't want it.
- Is there any reason TIFF is off the allowed media types list?
Probably nobody thought about it. Also thumbnailing might be a problem?
- Is there any way for the software to say "no" earlier in the process of making a huge upload?
We could do a check on upload by javascript. Similarly to the current overwrite-check.
- d.
On 10/1/07, David Gerard dgerard@gmail.com wrote:
I just tried uploading a 12MB TIFF (a scan directly from the Library of Congress) to Commons. It waited until the whole 12MB had uploaded, of course, to tell me it didn't want it.
- Is there any reason TIFF is off the allowed media types list?
- Is there any way for the software to say "no" earlier in the process of making a huge upload?
Tim Starling might know more about 1). See http://en.wikisource.org/wiki/User:Tim_Starling/ScanSet_TIFF_demo
On 10/1/07, Anthony wikimail@inbox.org wrote:
On 10/1/07, David Gerard dgerard@gmail.com wrote:
I just tried uploading a 12MB TIFF (a scan directly from the Library of Congress) to Commons. It waited until the whole 12MB had uploaded, of course, to tell me it didn't want it.
- Is there any reason TIFF is off the allowed media types list?
- Is there any way for the software to say "no" earlier in the process of making a huge upload?
Tim Starling might know more about 1). See http://en.wikisource.org/wiki/User:Tim_Starling/ScanSet_TIFF_demo
FWIW, The djvu is just a ton better for those files:
EB8A200.djvu   82,927 bytes
EB8A200.tif   295,912 bytes
An extra 200 KB per page is the difference between being able to comfortably read the pages on something less than uber-broadband and not.
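To put rough numbers on that, the sketch below works out the saving per page and an idealised transfer time for each file. The two file sizes are the ones quoted above; the 56 kbit/s link speed is an assumption chosen to stand in for a slow connection, and latency and protocol overhead are ignored.

```python
DJVU_BYTES = 82_927    # EB8A200.djvu, from the listing above
TIFF_BYTES = 295_912   # EB8A200.tif, from the listing above
LINK_BITS_PER_SECOND = 56_000  # assumption: a nominal 56 kbit/s dial-up link


def transfer_seconds(size_bytes, bits_per_second=LINK_BITS_PER_SECOND):
    """Idealised transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / bits_per_second


saved = TIFF_BYTES - DJVU_BYTES
print(f"saved per page: {saved} bytes")
print(f"size ratio: {TIFF_BYTES / DJVU_BYTES:.2f}x")
print(f"TIFF: {transfer_seconds(TIFF_BYTES):.1f} s, "
      f"DjVu: {transfer_seconds(DJVU_BYTES):.1f} s")
```

On those assumptions the TIFF page takes well over half a minute to arrive, while the DjVu page takes a fraction of that.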
IMO the plugins for djvu are much nicer than the tiff plugins out there, and since we already have had thumbnailing for it for a long time the need for plugins is pretty minimal.
Plus Djvu supports OCRed layers so you can copy and paste from imaged documents and such..
The rest of the functionality in the extension is spiffy enough.
Anthony wrote:
On 10/1/07, David Gerard dgerard@gmail.com wrote:
I just tried uploading a 12MB TIFF (a scan directly from the Library of Congress) to Commons. It waited until the whole 12MB had uploaded, of course, to tell me it didn't want it.
- Is there any reason TIFF is off the allowed media types list?
- Is there any way for the software to say "no" earlier in the process of making a huge upload?
Tim Starling might know more about 1). See http://en.wikisource.org/wiki/User:Tim_Starling/ScanSet_TIFF_demo
ScanSet is a one-off hack which presents a scan of the 1911 Encyclopedia Britannica, one page per file. It doesn't do on-the-fly conversion to PNG, we just converted the whole image set as a batch process.
Support for uploaded TIFF files would be a very similar problem to the OggHandler extension I recently wrote.
Gregory Maxwell wrote:
FWIW, The djvu is just a ton better for those files:
EB8A200.djvu   82,927 bytes
EB8A200.tif   295,912 bytes
Not to mention the fact that we already have DjVu support in MediaWiki. Does anyone have a suggestion for a TIFF to DjVu conversion method?
-- Tim Starling
Tim Starling wrote:
ScanSet is a one-off hack which presents a scan of the 1911 Encyclopedia Britannica, one page per file. It doesn't do on-the-fly conversion to PNG, we just converted the whole image set as a batch process.
Well, I can't see it. Why do I need QuickTime to view a TIFF?
Platonides wrote:
Tim Starling wrote:
ScanSet is a one-off hack which presents a scan of the 1911 Encyclopedia Britannica, one page per file. It doesn't do on-the-fly conversion to PNG, we just converted the whole image set as a batch process.
Well, I can't see it. Why do I need QuickTime to view a TIFF?
Most browsers don't support inline display of TIFF images.
Sensible behavior to support TIFF uploads would likely involve conversion to JPEG or PNG for inline display.
-- brion vibber (brion @ wikimedia.org)
On 10/2/07, Brion Vibber brion@wikimedia.org wrote:
Most browsers don't support inline display of TIFF images.
May I suggest that TIFF is not really a useful format in the context of web publishing? TIFF is a fairly advanced file format that, properly used, includes features such as layers and color space information for transformation between different media such as video, film and paper. The color precision of a TIFF can range from 1 bit per pixel up to 32 bpp, and more if additional layers, sometimes containing non-visual information, are added. These are all very useful features in some contexts, but I sincerely doubt their usefulness in a context such as the web and MediaWiki. JPGs, PNGs and SVGs should cover all the needs of this context. Support should be concentrated on those formats, making sure all their features are properly handled.
My two eurocents.
Ciao!
Manu
On 02/10/2007, Emanuele D'Arrigo manu3d@gmail.com wrote:
On 10/2/07, Brion Vibber brion@wikimedia.org wrote:
Most browsers don't support inline display of TIFF images.
May I suggest that TIFF is not really a useful format in the context of web-publishing?
Well, no, it's not. However, if the best copy of a document is a TIFF, it would be good to be able to store that on Commons, rather than a copy in another format missing something.
- d.
On 02/10/2007, Emanuele D'Arrigo manu3d@gmail.com wrote:
May I suggest that TIFF is not really a useful format in the context of web-publishing?
On 10/2/07, David Gerard dgerard@gmail.com wrote:
Well, no, it's not. However, if the best copy of a document is a TIFF, it would be good to be able to store that on Commons, rather than a copy in another format missing something.
Ok, but when is a TIFF ever better than a PNG in a web-publishing context? When is a PNG "losing" any information that is useful in such a context?
Don't get me wrong: I'd be ok with MW being capable of digesting a TIFF, i.e. converting it to a PNG. But I do not see the use for inline TIFF display and storage, and given the possibility for a TIFF to be uploaded uncompressed, I'd strongly advocate -against- its use and storage "as is".
Does this make sense?
Manu
On 02/10/2007, Emanuele D'Arrigo manu3d@gmail.com wrote:
Don't get me wrong: I'd be ok with MW being capable of digesting a TIFF, i.e. converting it to a PNG. But I do not see the use for inline TIFF display and storage, and given the possibility for a TIFF to be uploaded uncompressed, I'd strongly advocate -against- its use and storage "as is". Does this make sense?
Yes, it makes sense in context of the web, but I'm pointing out there's more to media storage than the web.
Same reason we have .XCF upload. It's the original version of the work.
- d.
Emanuele D'Arrigo wrote:
Ok, but when is a TIFF ever better than a PNG in a web-publishing context? When is a PNG "losing" any information that is useful in such a context?
Multiple pages? Metadata? Generally being the original source copy?
-- brion vibber (brion @ wikimedia.org)
On 10/2/07, Emanuele D'Arrigo manu3d@gmail.com wrote:
Ok, but when is a TIFF ever better than a PNG in a web-publishing context? When is a PNG "losing" any information that is useful in such a context?
The point is, you're right that it's not losing any information useful for actual display. The information it's losing is only useful for subsequent editing and use in other formats. But we *want* to enable subsequent editing and use in other formats, as well as simple Web display.
On 10/2/07, Simetrical Simetrical+wikilist@gmail.com wrote:
On 10/2/07, Emanuele D'Arrigo manu3d@gmail.com wrote:
Ok, but when is a TIFF ever better than a PNG in a web-publishing context? When is a PNG "losing" any information that is useful in such a context?
The point is, you're right that it's not losing any information useful for actual display. The information it's losing is only useful for subsequent editing and use in other formats. But we *want* to enable subsequent editing and use in other formats, as well as simple Web display.
Useful for subsequent editing?
Only if the tiff is high-dynamic-range or layered. For HDR images there are better formats we could support (Radiance .hdr, or OpenEXR), and for layered images I don't believe many applications will correctly handle layered tiffs, so tiff isn't a good format there either. :(
Still, I see the argument for tiff as a source format .. but right now we don't support a *lot* of other 'source formats'. If we come up with a good way to approach source files in general then we'd probably also resolve all the cases where there is a good argument to use tiff.
On 10/2/07, Gregory Maxwell gmaxwell@gmail.com wrote:
Useful for subsequent editing?
Only if the tiff is high-dynamic-range or layered. For HDR images there are better formats we could support (Radiance .hdr, or OpenEXR), and for layered images I don't believe many applications will correctly handle layered tiffs, so tiff isn't a good format there either. :(
I'm not recommending it as a format of choice for people who want to make their own images from scratch, or even saying that we should prioritize its support. I'm saying that it's not ideal to just upload everything as PNG or JPG just because that's the format we're going to use for display. In the case of an image that was originally TIFF, it might be fine in general to convert it to DjVu or XCF if you have a good converter, but it's not good in general to convert it to PNG/JPG/SVG as Emanuele was saying.
On 02/10/2007, Simetrical Simetrical+wikilist@gmail.com wrote:
On 10/2/07, Gregory Maxwell gmaxwell@gmail.com wrote:
Useful for subsequent editing? Only if the tiff is high-dynamic-range or layered. For HDR images there are better formats we could support (Radiance .hdr, or OpenEXR), and for layered images I don't believe many applications will correctly handle layered tiffs, so tiff isn't a good format there either. :(
I'm not recommending it as a format of choice for people who want to make their own images from scratch, or even saying that we should prioritize its support. I'm saying that it's not ideal to just upload everything as PNG or JPG just because that's the format we're going to use for display. In the case of an image that was originally TIFF, it might be fine in general to convert it to DjVu or XCF if you have a good converter, but it's not good in general to convert it to PNG/JPG/SVG as Emanuele was saying.
The actual bytes can be useful to have as well. I have uploaded original camera images specifically so that there's an example of what actually came out of a particular camera up on Commons.
- d.
David Gerard wrote:
I just tried uploading a 12MB TIFF (a scan directly from the Library of Congress) to Commons. It waited until the whole 12MB had uploaded, of course, to tell me it didn't want it.
[snip]
- Is there any way for the software to say "no" earlier in the process of making a huge upload?
That's a bit of a tough problem, though there are a few things that could be done.
We could fairly easily add an advisory JavaScript-level check for forbidden file extensions which would prevent the form from being submitted. That would help against not-allowed formats, but not against mislabeled or just too-big files.
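The mislabeled-file gap can be narrowed by comparing the extension with the file's leading bytes. The sketch below is in Python rather than JavaScript and is only an illustration: the table of signatures is a small hand-picked sample, the function names are invented, and a real checker would need a much fuller type database.

```python
import os

# Leading bytes of a few formats; an illustrative table, not a full MIME sniffer.
MAGIC_PREFIXES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "tif": (b"II*\x00", b"MM\x00*"),
}
EXTENSION_ALIASES = {"jpeg": "jpg", "tiff": "tif"}


def sniff_format(first_bytes):
    """Guess the format from the first few bytes, or return None if unknown."""
    for name, prefixes in MAGIC_PREFIXES.items():
        if first_bytes.startswith(prefixes):
            return name
    return None


def is_mislabeled(filename, first_bytes):
    """True when the extension disagrees with the sniffed content.

    Content that matches no known signature also counts as a mismatch.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    ext = EXTENSION_ALIASES.get(ext, ext)
    return sniff_format(first_bytes) != ext


# A TIFF renamed to .png gets past an extension check but not this one.
print(is_mislabeled("scan.png", b"II*\x00\x08\x00\x00\x00"))
```

Only the first handful of bytes is needed, so a test like this could run as soon as the upload starts arriving instead of after it completes.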
Per http://bugzilla.wikimedia.org/show_bug.cgi?id=10976 there may be possibilities for cutting off long uploads early, but it may require patching PHP, and I'm not sure how easy it'd be to provide good feedback.
HTTP file upload in HTML form submissions is simply pretty awkward to work with. :P
-- brion vibber (brion @ wikimedia.org)
Brion Vibber wrote:
Per http://bugzilla.wikimedia.org/show_bug.cgi?id=10976 there may be possibilities for cutting off long uploads early, but it may require patching PHP, and I'm not sure how easy it'd be to provide good feedback.
HTTP file upload in HTML form submissions is simply pretty awkward to work with. :P
It doesn't seem so difficult. Well, until someone uploads a jpeg as application/x-www-form-urlencoded :P
The problem lies in accessing it before PHP starts executing. What about posting file uploads to a dedicated CGI for early rejection (bad extension, file too big, non-matching MIME type...) and then passing the uploaded file on to PHP? That could also be useful for uploading directly to the upload servers, which has been suggested from time to time.
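The size part of such an early-rejection front end could look roughly like the sketch below, which refuses on the declared length when there is one and otherwise stops reading as soon as the limit is passed. It is a Python illustration, not a CGI for any particular server; the class and function names are invented, and whether a browser actually stops sending when the server gives up is outside its scope.

```python
import io


class UploadRejected(Exception):
    """Raised when an upload is refused before it has been fully read."""


def receive_upload(stream, declared_length, max_bytes, chunk_size=64 * 1024):
    """Read a request body in chunks, giving up as soon as it is too large."""
    # Cheapest check first: a declared size over the limit costs no reads at all.
    if declared_length is not None and declared_length > max_bytes:
        raise UploadRejected("declared size exceeds the limit; no bytes read")
    received = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # The declared size may be missing or wrong, so keep counting.
        if total > max_bytes:
            raise UploadRejected(f"aborted after {total} bytes")
        received.write(chunk)
    return received.getvalue()


# A body that fits within the limit is passed through unchanged.
body = receive_upload(io.BytesIO(b"x" * 100), declared_length=100, max_bytes=1000)
print(len(body))
```

The accepted bytes could then be handed on to the main application, as the message suggests.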
Hi,
The APC and upload_progress extensions can both report the progress of an upload, using a separate page which refreshes every N seconds to display the progress to the user.
I wonder if we can send some JavaScript to the client to tell it to stop uploading once a hard limit has been reached, maybe with something like a hidden iframe that performs the actual POST.
The problem with the extensions, though, is that I believe they'd be unsuitable for the WP environment: APC stores the upload progress data in itself (APC), and upload_progress uses files, I believe. WP would perhaps need a memcache backend for it.
Jared
On 10/2/07, Brion Vibber brion@wikimedia.org wrote:
HTTP file upload in HTML form submissions is simply pretty awkward to work with. :P
*cough* upload from URL *cough*
Of course, it won't help against malicious or stupid people, but it would be an option for "the willing" (not the coalition of...), and it would make copyvio detection easier when used.
Magnus
On 10/2/07, Magnus Manske magnusmanske@googlemail.com wrote:
On 10/2/07, Brion Vibber brion@wikimedia.org wrote:
HTTP file upload in HTML form submissions is simply pretty awkward to work with. :P
*cough* upload from URL *cough*
Of course, it won't help against malicious or stupid people, but it would be an option for "the willing" (not the coalition of...), and it would make copyvio detection easier when used.
It's also an easy way to make the upload process asynchronous for the client: the HTTP upload process could have a work queue and perform the transfers directly from the storage backend system.
On 10/2/07, Magnus Manske magnusmanske@googlemail.com wrote:
On 10/2/07, Brion Vibber brion@wikimedia.org wrote:
HTTP file upload in HTML form submissions is simply pretty awkward to work with. :P
*cough* upload from URL *cough*
aka $wgAllowCopyUploads, for those who didn't know it's already implemented.
Magnus Manske wrote:
On 10/2/07, Brion Vibber brion@wikimedia.org wrote:
HTTP file upload in HTML form submissions is simply pretty awkward to work with. :P
*cough* upload from URL *cough*
Wow, amazing how that won't help the situation *at all* when uploading from one's own computer. ;)
-- brion vibber (brion @ wikimedia.org)