So it's time to have this discussion again. At least, I think we're having it again, though I could not find previous threads on this list about the subject.
In short, scaled media is currently generated on the fly for any size and for any user. The resulting files are kept around forever or until we run perilously short of space, at which point we make some guesses about what we can toss and then do a mass purge. Last time we did so, we had the rotation bug going at the same time, which made for a real fine mess.
A little bit of crunching shows me that we have about 6 million images in use on the projects, and yet we manage to have around 130 million thumbnails. Just for fun I checked to see how many thumbs each image has, what sizes we are looking at, etc. Here are the results.
Some "standard" sizes are most popular, with between 200K and 640K media files having thumbs scaled to each of these widths: 75, 120, 150, 180, 200, 220, 320, 640, 800, 1024, and 1280 pixels
But there's plenty of "odd" sizes with lots of thumbs too. For example, over 65K files with width 181px, 20K with width 138px.
As an experiment and before having this data, I purged from ms5 (no longer in use for thumbs) 1/16 of the thumbs that were greater than 100px wide but not one of these widths: 120px, 200px, 220px, 250px, 320px, 640px, 800px. We got back over 300GB of space.
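The filter used for that experimental purge could be sketched roughly as below. This is only an illustration: it assumes thumb file names begin with a `<width>px-` prefix (for example `181px-Foo.jpg`), which the message does not show, and the names `KEEP_WIDTHS`, `thumb_width` and `is_purge_candidate` are hypothetical. It does not model how the 1/16 sample was chosen.

```python
import re

# Widths that the experimental purge left alone (taken from the message above).
KEEP_WIDTHS = {120, 200, 220, 250, 320, 640, 800}

# Assumed naming scheme: the thumb file name begins with "<width>px-".
WIDTH_PREFIX = re.compile(r"^(\d+)px-")


def thumb_width(path):
    """Return the width encoded in a thumb path, or None if it has no prefix."""
    name = path.rsplit("/", 1)[-1]
    match = WIDTH_PREFIX.match(name)
    return int(match.group(1)) if match else None


def is_purge_candidate(path):
    """True for thumbs wider than 100px whose width is not in KEEP_WIDTHS."""
    width = thumb_width(path)
    return width is not None and width > 100 and width not in KEEP_WIDTHS
```

Files whose names do not match the assumed prefix are never treated as candidates, so an unexpected naming scheme fails safe rather than deleting too much.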
The other thing about delivering any scaled version on demand is that we have some media files with several hundred different thumb sizes in there. Here are a few of the top offenders for your entertainment:
2514 wikipedia/commons/thumb/f/f9/Orange_and_cross_section.jpg
2285 wikipedia/commons/thumb/f/fb/Thrermal_grease.jpg
2218 wikipedia/commons/thumb/f/fc/Blue_sport.jpg
2071 wikipedia/commons/thumb/f/f3/Flag_of_Switzerland.svg
2062 wikipedia/commons/thumb/f/f2/Flag_of_Costa_Rica.svg
2034 wikipedia/commons/thumb/f/f8/Wiktionary-logo-en.svg
1915 wikipedia/commons/thumb/f/f6/VeulesLesRoses.JPG
1689 wikipedia/commons/thumb/f/fa/Wikibooks-logo.svg
1447 wikipedia/commons/thumb/f/fa/Wikiquote-logo.svg
1371 wikipedia/commons/thumb/f/f0/Mori_Uncanny_Valley.svg
1249 wikipedia/commons/thumb/f/f5/Grand_prismatic_spring.jpg
1246 wikipedia/commons/thumb/f/f3/Mature.jpg
1191 wikipedia/commons/thumb/f/f7/Kirchdorf_in_Tirol.JPG
1187 wikipedia/commons/thumb/f/f8/Camille_Cabral_pour_les_Trans.JPG
1143 wikipedia/commons/thumb/f/f7/Profanity.svg
1079 wikipedia/commons/thumb/f/f2/HSV_color_solid_cone.png
1040 wikipedia/commons/thumb/f/f2/Carmen_Electra.jpg
1032 wikipedia/commons/thumb/f/f1/Pink_eye.jpg
1001 wikipedia/commons/thumb/f/f6/USNS_Medgar_Evers_announcement.jpg
I'd comment on some of those but I'd be too snarky.
So there are some things we could change:
1. We could generate and keep only certain sizes, tossing the rest.
2. We could keep *nothing*, scaling all media as required.
3. We could have a cron job that was clever about tossing thumbs every day (not sure how easy it would be to be clever).
4. ??
In any of these cases, the squids will have copies of recently requested scaled media, so we won't be scaling the same file to the same size over and over in a short time frame.
What do folks think about how to proceed?
Ariel
On 31 August 2012 13:36, Ariel T. Glenn ariel@wikimedia.org wrote:
- We could generate and keep only certain sizes, tossing the rest.
- We could keep *nothing*, scaling all media as required.
- We could have a cron job that was clever about tossing thumbs every day (not sure how easy it would be to be clever).
- ??
In any of these cases, the squids will have copies of recently requested scaled media, so we won't be scaling the same file to the same size over and over in a short time frame.
To be obvious for #3:
* Do we know access times for these files? Can stuff be purged that hasn't been accessed in x time? What values of x would be good?
* More generally: what's the tradeoff between generating a thumbnail afresh and keeping an old copy around until it's needed? Just how CPU-stressed is the thumbnailer?
- d.
On Fri, Aug 31, 2012 at 7:36 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
So there are some things we could change:
- We could generate and keep only certain sizes, tossing the rest.
Heck yes. Generate some standard sizes at upload time and let the browser scale if a funny size is demanded. Modern browsers scale photos nicely, not like the nearest-neighbor ugliness from 2002.
This'll simplify our thumbnail-serving architecture, remove some DoS vectors, and, if we pick the next size up, make things look better when zooming or on high-density screens.
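To make "pick the next size up" concrete, here is a minimal sketch. The list of standard widths is hypothetical (the thread has not settled on one), and `width_to_serve` is an illustrative name, not an existing MediaWiki function.

```python
from bisect import bisect_left

# Hypothetical set of pre-rendered widths; the thread does not settle on one.
STANDARD_WIDTHS = [120, 220, 320, 640, 800, 1024, 1280]


def width_to_serve(requested, original_width):
    """Pick the smallest standard width >= requested, capped at the original."""
    if requested >= original_width:
        return original_width
    index = bisect_left(STANDARD_WIDTHS, requested)
    if index == len(STANDARD_WIDTHS):
        # Larger than every standard size: fall back to the original file.
        return original_width
    return min(STANDARD_WIDTHS[index], original_width)
```

With these assumed widths, a request for 181px on a large original would be answered with the 220px rendition, and the browser would scale it down.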
Downside: diagrams and charts done as PNGs or JPGs might not look as sharp at non-standard sizes.
We should also start considering serving SVGs directly to supporting browsers, so they always look nice at any size -- and at any zoom level. (Downside of this: this means we have to start thinking about size and rendering efficiency in SVGs -- don't use a 6 megabyte super-detailed map for something that's going to be shown at 200px most of the time!)
-- brion
- We could generate and keep only certain sizes, tossing the rest.
Heck yes. Generate some standard sizes at upload time and let the browser scale if a funny size is demanded. Modern browsers scale photos nicely, not like the nearest-neighbor ugliness from 2002.
+1 I was very surprised to learn any thumbnail sizes could be generated. We should standardise on tiny, small, medium, high and original resolutions. 5 sizes seems more than enough.
This'll simplify our thumbnail-serving architecture, remove some D
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 31 August 2012 18:21, Jon Robson jdlrobson@gmail.com wrote:
+1 I was very surprised to learn any thumbnail sizes could be generated. We should standardise on tiny, small, medium, high and original resolutions. 5 sizes seems more than enough.
I would suggest checking against the corpus before saying "x should be enough for anyone". There are also occasional but valid uses for non-standard sizes.
- d.
I would suggest checking against the corpus before saying "x should be enough for anyone". There are also occasional but valid uses for non-standard sizes.
Perhaps generate the standard thumbnail sizes at upload time, and then generate and cache thumbnails of non-standard sizes until they haven't been accessed for a few days.
This doesn't take up much space, much like just deleting all the generated thumbnails, and it saves time down the road because the most common thumbnails are already generated.
I wouldn't rely too much on the Squids. They do an excellent job, but thinking of those of us who use Mediawiki outside the foundation, I would much rather see an intelligent thumbnail generating and caching scheme that doesn't rely on Squid being present.
Thank you, Derric Atzrott
On Fri, Aug 31, 2012 at 1:48 PM, Derric Atzrott datzrott@alizeepathology.com wrote:
Perhaps generate the standard thumbnail sizes at upload time
I believe the status quo is no thumbs (of any size) are generated at upload time. They are all just done on demand.
-Jeremy
On Fri, Aug 31, 2012 at 1:22 PM, Jeremy Baron jeremy@tuxmachine.com wrote:
On Fri, Aug 31, 2012 at 1:48 PM, Derric Atzrott datzrott@alizeepathology.com wrote:
Perhaps generate the standard thumbnail sizes at upload time
I believe the status quo is no thumbs (of any size) are generated at upload time. They are all just done on demand.
That's correct in theory. In practice (for uploads using Special:Upload at least) the user is redirected to the file description page after a successful upload, which contains a thumb of the file of a given size (800px?), which is then immediately generated on demand :) . Special:UploadWizard has similar behavior: the success page contains 200px(?) thumbs of all the images the user uploaded.
Roan
Ariel wrote:
But there's plenty of "odd" sizes with lots of thumbs too. For example, over 65K files with width 181px
181px images are typically made when 180px images are "bad", with the squid/scaler refusing to deliver the latest version. Adding 1px there forces a regeneration of the file, solving the issue. Going up to 65K seems too much, though.
On 31/08/12 22:35, Roan Kattouw wrote:
On Fri, Aug 31, 2012 at 1:22 PM, Jeremy Baron jeremy@tuxmachine.com wrote:
I believe the status quo is no thumbs (of any size) are generated at upload time. They are all just done on demand.
That's correct in theory. In practice (for uploads using Special:Upload at least) the user is redirected to the file description page after a successful upload, which contains a thumb of the file of a given size (800px?), which is then immediately generated on demand :) . Special:UploadWizard has similar behavior: the success page contains 200px(?) thumbs of all the images the user uploaded.
Roan
Plus another tiny one at the file history section.
I remember seeing a change about rounding the requested image size some months ago, so the "request any size" may not be true any longer.
On 01/09/12 03:48, Derric Atzrott wrote:
I wouldn't rely too much on the Squids. They do an excellent job, but thinking of those of us who use Mediawiki outside the foundation, I would much rather see an intelligent thumbnail generating and caching scheme that doesn't rely on Squid being present.
We've been relying on Squid being present for years. Squid does a good job of LRU caching, it's fast and the size is ample. It's persistent, storing thumbnails on disk (some servers have flash). It would take a lot of work to do something as good as it in the backend.
If we wanted a greater cold cache capacity, then we could add more image scalers.
Video transcoding is an exception, but it needs to be handled separately anyway, because you can't run an hour-long transcoding job with no concurrency control, no error handling and no user feedback (progress bars etc.).
-- Tim Starling
On Fri, Aug 31, 2012 at 10:26 AM, David Gerard dgerard@gmail.com wrote:
On 31 August 2012 18:21, Jon Robson jdlrobson@gmail.com wrote:
+1 I was very surprised to learn any thumbnail sizes could be generated. We should standardise on tiny, small, medium, high and original resolutions. 5 sizes seems more than enough.
I would suggest checking against the corpus before saying "x should be enough for anyone". There are also occasional but valid uses for non-standard sizes.
- d.
True, but if we were to institute a cleanup program to rationalize sizes to a limited set, even if particular users object on a page here or there and we leave it be, the vast bulk of the work is easy and gets us the size usage win.
The questions to me are what sizes to standardize towards, and how can we tell where things are used at what size?
On Fri, Aug 31, 2012 at 10:21 AM, Jon Robson jdlrobson@gmail.com wrote:
- We could generate and keep only certain sizes, tossing the rest.
Heck yes. Generate some standard sizes at upload time and let the browser scale if a funny size is demanded. Modern browsers scale photos nicely, not like the nearest-neighbor ugliness from 2002.
+1 I was very surprised to learn any thumbnail sizes could be generated. We should standardise on tiny, small, medium, high and original resolutions. 5 sizes seems more than enough.
Don't forget that Mediawiki is also more than just Wikipedia. I've worked with Mediawiki sites that were more graphics-focused and more detailed in their layout demands than Wikipedia, and seen far more than five sizes in play. The diversity of sizes can be surprising, but isn't necessarily illogical. For example, one might specify sizes so that 2 or 3 or 4 images neatly span a 670px main column on a site formatted for 800px displays.
Such things happen in the larger universe of Mediawiki sites, even though they might be frowned upon on Wikipedia, etc. Personally, I've found Mediawiki's flexibility to generate any size on demand to be very useful. And I would agree with Isarra that even though browsers are much better than they were at scaling, it still results in a loss of detail if both the server and browser are asked to do scaling.
So, my strong preference would be for solutions that don't assume a few sizes are good enough for everyone. At the very least, Mediawiki should be configurable to preserve the current behavior as an option (i.e. allowing any requested size to be produced by the server).
I would also note that the standard image description page currently provides direct links to images scaled to "Other resolutions". This means that a typical large image will come in at least 5 sizes (320px, 640px, 800px, 1024px, 1280px) plus the typical "thumb" size (220px on most Wikipedias, I believe), as well as the full-size resolution.
That said, I agree that finding a way to expire old thumbs, or rarely accessed thumbs, is definitely a good idea.
-Robert Rohde
On 31/08/2012 08:57, Brion Vibber wrote:
Heck yes. Generate some standard sizes at upload time and let the browser scale if a funny size is demanded. Modern browsers scale photos nicely, not like the nearest-neighbor ugliness from 2002.
As a graphist, I must say this does not seem like a good idea. Only rendering certain sizes and having the browser then scale the weird ones will still result in fuzzy images, because no matter how good the renderer, every time a bitmap image is scaled down, sharpness is lost. This is part of why there is so much emphasis placed on using vectors even in a static environment - with those, the first scale down is also avoided, and there is a very visible difference in clarity even there. But while only rendering certain sizes and then having the browser scale those would defeat that purpose, having to scale down bitmaps twice would look even worse, regardless of subject.
On 31 August 2012 19:45, Isarra Yos zhorishna@gmail.com wrote:
On 31/08/2012 08:57, Brion Vibber wrote:
Heck yes. Generate some standard sizes at upload time and let the browser scale if a funny size is demanded. Modern browsers scale photos nicely, not like the nearest-neighbor ugliness from 2002.
As a graphist, I must say this does not seem like a good idea. Only rendering certain sizes and having the browser then scale the weird ones will still result in fuzzy images, because no matter how good the renderer, every time a bitmap image is scaled down, sharpness is lost. This is part of why there is so much emphasis placed on using vectors even in a static environment - with those, the first scale down is also avoided, and there is a very visible difference in clarity even there. But while only rendering certain sizes and then having the browser scale those would defeat that purpose, having to scale down bitmaps twice would look even worse, regardless of subject.
-- Isarra
A possible scenario where human intervention is always needed is generating icons from SVG files that draw flags or... icons.
A 16x16 pixel USA flag rendered from an SVG by some naive rescaling (mipmapping?) will look worse than wrong. Perhaps you still want to have this icon generated from, or inspired by, the SVG file. In video games this sort of problem is sometimes solved by having all the scaled versions precalculated in a single file: mipmaps.
http://en.wikipedia.org/wiki/Mipmap
Modern video games use other, more advanced techniques, but the beauty of mipmaps is that they can be edited by an artist (perhaps the artist can edit the 16x16 pixel version so that it still makes sense).
If we make a cron job, could we also have it purge all SVG thumbnails older than say 5 years?
Ryan Kaldari
How hard/easy is it to determine when a thumb file has last been accessed (by the squids or any means for that matter)?
If easy, why not have some process delete the thumbs that have not been accessed for (squid expiration time + 1 day)? That ensures thumbs live for as long as needed (or until purged), without adding to the scaler's workload. Basically let the thumb files expire as they do on the squids.
I imagine the first run would be a mammoth job, but subsequent runs shouldn't be stressing at all.
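A minimal sketch of such an expiry pass, assuming the thumbs live on a filesystem that records access times (mounts using `noatime` would defeat it, and requests answered from the squids never touch the backend file at all). The function names and the idle threshold are illustrative only.

```python
import os
import time


def expired_thumbs(root, max_idle_seconds, now=None):
    """Yield paths under root whose last access is older than max_idle_seconds."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > max_idle_seconds:
                yield path


def purge(root, max_idle_seconds):
    """Delete expired thumbs and return how many files were removed."""
    removed = 0
    # Materialise the list first so files are not deleted mid-walk.
    for path in list(expired_thumbs(root, max_idle_seconds)):
        os.remove(path)
        removed += 1
    return removed
```

A daily cron job could call `purge()` with a threshold of the squid expiration time plus one day, as suggested above.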
Another idea I've played with was development of a LRU filesystem. Probably a FUSE module. You would mount it at thumbs/ and unused files would periodically disappear.
Another idea I've played with was development of a LRU filesystem. Probably a FUSE module. You would mount it at thumbs/ and unused files would periodically disappear.
We don't mount these filesystems anymore. We use an object store called Swift. FUSE also doesn't exist outside of Linux, right? So, this likely wouldn't be terribly useful as a core feature.
- Ryan
Maybe we could have "large","medium","small" etc as aliases for standard/popular sizes to encourage using less of the non-standard ones?
On Fri, Aug 31, 2012 at 3:52 PM, Daniel Zahn dzahn@wikimedia.org wrote:
Maybe we could have "large","medium","small" etc as aliases for standard/popular sizes to encourage using less of the non-standard ones?
I kinda like this. It would also be nice if simply including an image defaulted to some sane size, even without using explicit "|thumb".
In fact, we should think about just redoing how images get included in the first place maybe. :P
I'd kind of like to see something like this:
{{#media:Foobar.jpg}} <- default to a nice size, displayed in some nice-looking way suitable to the output. Sane framing and positioning typical for most usages.
{{#media:Foobar.jpg|caption=Hello this is my caption about [[stuff]]. Enjoy!}} <- caption should probably be an explicitly named parameter
* Consider having *no size option* at all. :)
* Definitely don't have "left" "right" or "center" options.
* Consider making it easy to collect multiple related photos together, like <gallery>.
Or maybe we should just use <gallery> more aggressively and make it a billion times prettier...
For the more icon-like uses, maybe an explicit inline-media function:
{{#inline-media:Foobar.svg|24x24px}}
Anyway.... this needs more thought. But for a lot of images, we don't really *need* to be manually specifying every detail of their layout. It feels like it would be nicer to say "stuff these photos, with these captions, into this section of the article" and let the wiki deal with laying them out.
Note that in mobile/tablet contexts it's also very handy to be able to extract just the photos and provide them for separate browsing; this has influenced my thinking on this for sure.
-- brion
On Fri, Aug 31, 2012 at 5:52 PM, Brion Vibber brion@pobox.com wrote:
Note that in mobile/tablet contexts it's also very handy to be able to extract just the photos and provide them for separate browsing; this has influenced my thinking on this for sure.
^ in particular, distinguishing between "editorial" photo/diagram content and icon-like uses of images would be a huge help. We don't want to stick template icons in an image gallery of photos on an article.
-- brion
Please don't. The current syntax is nice, concise, consistent and not overflowing with special characters. The proposed one is verbose and "looks technical". But defaulting to thumb seems like a good idea to me (but we ought to make some usage stats first :) ).
On Fri, Aug 31, 2012 at 6:17 PM, Bartosz Dziewoński matma.rex@gmail.com wrote:
Please don't. The current syntax is nice, concise, consistent and not overflowing with special characters. The proposed one is verbose and "looks technical".
The current syntax is actually hard to machine-parse, with lots of language-specific overrides and weird options that combine in non-obvious ways. Not to mention that it overrides the simple link syntax... I wish when we'd renamed Image: to File: that we'd left the magic behavior only on the Image: alias so that links would be rationalized. :(
Keep in mind also that in the glorious unicorn-filled future most people who aren't doing low-level template work will rarely see the markup. They'll just push a button to insert an image.
But defaulting to thumb seems like a good idea to me (but we ought to make some usage stats first :) ).
Yeah... changing existing behavior gets scary. :)
-- brion
On Fri, Aug 31, 2012 at 4:38 PM, Brion Vibber brion@pobox.com wrote:
Keep in mind also that in the glorious unicorn-filled future most people who aren't doing low-level template work will rarely see the markup. They'll just push a button to insert an image.
Is that really a rainbow coming out of the Unicorn, Brion? It seems like an unlikely place to...... Nevermind.
But defaulting to thumb seems like a good idea to me (but we ought to make some usage stats first :) ).
Yeah... changing existing behavior gets scary. :)
Usage stats win. So - how can we get those?
George Herbert wrote:
But defaulting to thumb seems like a good idea to me (but we ought to make some usage stats first :) ).
Yeah... changing existing behavior gets scary. :)
Usage stats win. So - how can we get those?
How much consideration is (or should be) given to people who hotlink images from Wikimedia Commons or another Wikimedia wiki?
MZMcBride
On 1 September 2012 00:38, Brion Vibber brion@pobox.com wrote:
The current syntax is actually hard to machine-parse, with lots of language-specific overrides and weird options that combine in non-obvious ways. Not to mention that it overrides the simple link syntax... I wish when we'd renamed Image: to File: that we'd left the magic behavior only on the Image: alias so that links would be rationalized. :(
PROBLEM: Thumbnails are taking up a lot of disk space.
SOLUTION: Completely revamp image syntax in a backwards-incompatible manner.
There's something excessive there ...
Is it really the case that the server can't get atime of the images? If it can, then we *know* which thumbs may as well be trashed and regenerated as and when someone cares. If it can't, why not?
- d.
On Fri, Aug 31, 2012 at 9:31 PM, David Gerard dgerard@gmail.com wrote:
On 1 September 2012 00:38, Brion Vibber brion@pobox.com wrote:
The current syntax is actually hard to machine-parse, with lots of language-specific overrides and weird options that combine in non-obvious ways. Not to mention that it overrides the simple link syntax... I wish when we'd renamed Image: to File: that we'd left the magic behavior only on the Image: alias so that links would be rationalized. :(
PROBLEM: Thumbnails are taking up a lot of disk space.
SOLUTION: Completely revamp image syntax in a backwards-incompatible manner.
There's something excessive there ...
Nah, we're just years behind on modernizing image handling. Such improvements need to be done at some point regardless of this particular question, but it would help with it in certain ways.
Is it really the case that the server can't get atime of the images?
If it can, then we *know* which thumbs may as well be trashed and regenerated as and when someone cares. If it can't, why not?
I'd tend to think that image scaler CPU time is more precious than disk space used by thumbs; in theory we don't actually need to "store" thumbs if we just cache them and have a suitably large cache. A caching HTTP proxy should have some sort of LRU-or-other system to discard old things that aren't being used; my main worry would be about whether the system can survive the load of a cache being cleared (say due to downtime, upgrades, incompatible storage formats for upgrades, or whatever).
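The "cache instead of store" idea amounts to an LRU cache with a size budget. The toy in-memory version below only illustrates the eviction behaviour; it is not how Squid is implemented, and the class name and byte budget are hypothetical.

```python
from collections import OrderedDict


class LRUThumbCache:
    """Keep thumbnails up to a byte budget, evicting the least recently used."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        data = self._items.get(key)
        if data is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key, data):
        if key in self._items:
            self.used_bytes -= len(self._items.pop(key))
        self._items[key] = data
        self.used_bytes += len(data)
        # Evict from the cold end until the budget is respected again.
        while self.used_bytes > self.max_bytes and self._items:
            _old_key, old_data = self._items.popitem(last=False)
            self.used_bytes -= len(old_data)
```

A miss in `get()` is the point where the scaler would have to regenerate the thumb, which is why the cold-cache case is the one to worry about.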
Of course if you don't have to scale anything on demand on the server side, suddenly the entire problem disappears. Just something to think about.
-- brion
On Aug 31, 2012 11:52 PM, "Brion Vibber" brion@pobox.com wrote:
- Definitely don't have "left" "right" or "center" options.
Can you elaborate on that? The positioning of images can make a big difference to how a page looks. Do you really think you can automate it in a way that makes pages always look good? It's also useful to be able to know where an image is going to be displayed so you can say things like "as can be seen in the image to the right".
Getting images to work well on phones and tablets probably requires more user control, not less. It would be useful to be able to specify whether an image is vital to the article and should always be displayed or if it is just there to look nice and can be skipped if there isn't much screen space. (Sensible defaults are a must, of course.)
On Sun, Sep 2, 2012 at 8:25 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
On Aug 31, 2012 11:52 PM, "Brion Vibber" brion@pobox.com wrote:
- Definitely don't have "left" "right" or "center" options.
Can you elaborate on that? The positioning of images can make a big difference to how a page looks. Do you really think you can automate it in a way that makes pages always look good?
Looking at say https://en.wikipedia.org/wiki/San_Francisco the positioning of most photos to left or right floats seems fairly random; where both left and right are used it seems to be a manual hack to keep images from stacking on top of other images or tables, based on typical screen sizes.
Would an automatic gallery layout look as good and be as usable? Honestly, it might; I don't see much that's meaningful about the way these images are laid out that would be lost by a different layout.
Would it look the same exactly? No, but who cares?
Should it lay out in right and left alignment? Maybe not -- maybe it should use horizontal space and avoid floats? Maybe it should use a dedicated right-side gutter (left on RTL)?
Maybe we should at least think about it.
It's also useful to be able to know where an image is going to be displayed so you can say things like "as can be seen in the image to the right".
No space to left or right on mobile; safer not to rely on such positioning being consistently relatable.
Consider also a hyperlink instead of a vague direction when referencing something. :)
Getting images to work well on phones and tablets probably requires more user control, not less. It would be useful to be able to specify whether an image is vital to the article and should always be displayed or if it is just there to look nice and can be skipped if there isn't much screen space. (Sensible defaults are a must, of course.)
Indeed, distinguishing between different types of things can help -- and I think would help far more than any manual positioning in the majority of cases that aren't icons or otherwise explicitly inline in text or a table.
Note that tables, infoboxes, etc have the same issues with positioning, floating, referencing, and whatnot. And like panoramic images, they sometimes don't fit on small screens well; that's another thing to think about.
-- brion
On Mon, Sep 3, 2012 at 5:45 AM, Brion Vibber brion@pobox.com wrote:
On Sun, Sep 2, 2012 at 8:25 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
On Aug 31, 2012 11:52 PM, "Brion Vibber" brion@pobox.com wrote:
- Definitely don't have "left" "right" or "center" options.
Can you elaborate on that? The positioning of images can make a big difference to how a page looks. Do you really think you can automate it in a way that makes pages always look good?
Looking at say https://en.wikipedia.org/wiki/San_Francisco the positioning of most photos to left or right floats seems fairly random; where both left and right are used it seems to be a manual hack to keep images from stacking on top of other images or tables, based on typical screen sizes.
Would an automatic gallery layout look as good and be as usable? Honestly, it might; I don't see much that's meaningful about the way these images are laid out that would be lost by a different layout.
Would it look the same exactly? No, but who cares?
Should it lay out in right and left alignment? Maybe not -- maybe it should use horizontal space and avoid floats? Maybe it should use a dedicated right-side gutter (left on RTL)?
Maybe we should at least think about it.
It's also useful to be able to know where an image is going to be displayed so you can say things like "as can be seen in the image to the right".
No space to left or right on mobile; safer not to rely on such positioning being consistently relatable.
Consider also a hyperlink instead of a vague direction when referencing something. :)
Getting images to work well on phones and tablets probably requires more user control, not less. It would be useful to be able to specify whether an image is vital to the article and should always be displayed or if it is just there to look nice and can be skipped if there isn't much screen space. (Sensible defaults are a must, of course.)
Indeed, distinguishing between different types of things can help -- and I think would help far more than any manual positioning in the majority of cases that aren't icons or otherwise explicitly inline in text or a table.
Note that tables, infoboxes, etc have the same issues with positioning, floating, referencing, and whatnot. And like panoramic images, they sometimes don't fit on small screens well; that's another thing to think about.
-- brion _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Illustrating the problem of manual right/left aligned thumbnails and elements by using slightly different CSS:
http://toolserver.org/~magnus/redefined/?page=San%20Francisco
Magnus
On 03/09/2012 05:33, Magnus Manske wrote:
Illustrating the problem of manual right/left aligned thumbnails and elements by using slightly different CSS:
http://toolserver.org/~magnus/redefined/?page=San%20Francisco
Magnus
That looks like as much a problem with the general design there as with the use of images on the page itself, though.
On 3 September 2012 17:10, Isarra Yos zhorishna@gmail.com wrote:
On 03/09/2012 05:33, Magnus Manske wrote:
Illustrating the problem of manual right/left aligned thumbnails and elements by using slightly different CSS:
http://toolserver.org/~magnus/redefined/?page=San%20Francisco
Magnus
That looks like as much a problem with the general design there as with the use of images on the page itself, though.
I agree. The skin not wrapping text around the images isn't caused by users having a choice of what side of the page to put the image on. It's caused by the skin being badly designed...
Also wondering if there are any thumbnails that are larger than their actual images, and if yes to get rid of them.
On Fri, Aug 31, 2012 at 4:25 PM, Daniel Zahn dzahn@wikimedia.org wrote:
Also wondering if there are any thumbnails that are larger than their actual images, and if yes to get rid of them.
For raster image formats, we don't generate thumbs larger than the original -- we just use the original image and let the browser stretch it to the requested size.
For vector image formats (SVG, possibly PDF) we can and do generate thumbnails larger than the canonical width and height, so that tiny SVG files can be scaled up. But a hard limit is enforced, I think 2048px or something.
-- brion
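The two rules above (raster thumbs are never upscaled, vector thumbs are capped) can be sketched roughly as follows. The 2048px cap is taken from the "I think 2048px or something" remark and is an assumption, as are the function name and the choice to treat "equal to the original" as "serve the original".

```python
MAX_VECTOR_WIDTH = 2048  # assumption: the message only says "2048px or something"


def thumb_width(requested, original_width, is_vector):
    """Return the width to render, or None to serve the original file as-is."""
    if is_vector:
        # Vector sources may be scaled up, but only to a hard limit.
        return min(requested, MAX_VECTOR_WIDTH)
    if requested >= original_width:
        # Raster: no upscaled thumb; the browser stretches the original.
        return None
    return requested
```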
On 31/08/12 22:36, Ariel T. Glenn wrote:
So there are some things we could change:
1. We could generate and keep only certain sizes, tossing the rest.
2. We could keep *nothing*, scaling all media as required.
3. We could have a cron job that was clever about tossing thumbs every day (not sure how easy it would be to be clever).
4. ??
I'll go for option 4. You can't delete the images from the backend while they are still in Squid, because then they would not be purged when the image is updated or action=purge is requested. In fact, that is one of only two reasons for the existence of the backend thumbnail store on Wikimedia. The thumbnail backend could be replaced by a text file that stores a list of thumbnail filenames which were sent to Squid within a window equivalent to the expiry time sent in the Cache-Control header.
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
-- Tim Starling
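A rough sketch of the "list of thumbnail filenames sent to Squid within the expiry window" idea, kept in memory rather than in a text file for brevity. The class `SentThumbLog`, its methods, and the example paths used with it are hypothetical.

```python
import time


class SentThumbLog:
    """Remembers which thumbs were sent to the cache, per original file."""

    def __init__(self, expiry_seconds):
        # Should match the lifetime sent in the Cache-Control header.
        self.expiry_seconds = expiry_seconds
        self._sent = {}  # original -> {thumb path: time last sent}

    def record(self, original, thumb_path, now=None):
        now = time.time() if now is None else now
        self._sent.setdefault(original, {})[thumb_path] = now

    def to_purge(self, original, now=None):
        """Return thumbs that may still be cached, and forget them."""
        now = time.time() if now is None else now
        entries = self._sent.pop(original, {})
        return sorted(p for p, sent in entries.items()
                      if now - sent < self.expiry_seconds)
```

On image update or action=purge, only the paths returned by `to_purge` would need a PURGE; anything older has already expired from the cache.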
On 03/09/12 02:59, Tim Starling wrote:
I'll go for option 4. You can't delete the images from the backend while they are still in Squid, because then they would not be purged when the image is updated or action=purge is requested. In fact, that is one of only two reasons for the existence of the backend thumbnail store on Wikimedia. The thumbnail backend could be replaced by a text file that stores a list of thumbnail filenames which were sent to Squid within a window equivalent to the expiry time sent in the Cache-Control header.
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
-- Tim Starling
The second one seems easy to fix. The first one should IMHO be fixed in squid/varnish by allowing wildcard purges (ie. PURGE /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0)
A wiki with such setup could then disable the on-disk storage.
On Tue, Sep 4, 2012 at 3:11 PM, Platonides Platonides@gmail.com wrote:
On 03/09/12 02:59, Tim Starling wrote:
I'll go for option 4. You can't delete the images from the backend while they are still in Squid, because then they would not be purged when the image is updated or action=purge is requested. In fact, that is one of only two reasons for the existence of the backend thumbnail store on Wikimedia. The thumbnail backend could be replaced by a text file that stores a list of thumbnail filenames which were sent to Squid within a window equivalent to the expiry time sent in the Cache-Control header.
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
-- Tim Starling
The second one seems easy to fix. The first one should IMHO be fixed in squid/varnish by allowing wildcard purges (ie. PURGE /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0)
fast.ly implements group purge for varnish like this via a proxy daemon that watches backend responses for a "tag" response header (i.e. all resolutions of Tim_starling.jpg would be tagged that) and builds an in-memory hash of tags->objects which can be purged on. I've been told they'd probably open source the code for us if we want it, and it is interesting (especially to deal with the fact that we don't purge articles at all of their possible url's) albeit with its own challenges. If we implemented a backend system to track thumbnails that exist for a given orig, we may be able to remove our dependency on swift container listings to purge images, paving the way for a second class of thumbnails that are only cached.
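The tag-to-objects hash described above might look something like this in miniature. This is only a guess at the shape of the idea, not the actual proxy daemon; `TagIndex` and its methods are invented names.

```python
from collections import defaultdict


class TagIndex:
    """In-memory map from a tag (e.g. the original's name) to cached URLs."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def observe(self, url, tag):
        # Called for each backend response carrying a "tag" response header.
        self._by_tag[tag].add(url)

    def purge(self, tag):
        """Return every URL recorded under `tag` and drop the tag."""
        return sorted(self._by_tag.pop(tag, set()))
```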
A wiki with such setup could then disable the on-disk storage.
I think this is entirely doable, but scaling the imagescalers to support cache failures at wmf scale would be a waste, except perhaps for non-standard sizes that aren't widely used. I like Brion's thoughts on revamping image handling, and would like to see semi-permanent (in swift) storage of a standardized set of thumbnail resolutions but we could still support additional resolutions. Browser scaling is also at least worth experimenting with. Instances where browser scaling would be bad are likely instances where the image is already subpar if viewed on a high-dpi / retina display.
On Wed, Sep 5, 2012 at 12:35 PM, Asher Feldman afeldman@wikimedia.org wrote:
Browser scaling is also at least worth experimenting with. Instances where browser scaling would be bad are likely instances where the image is already subpar if viewed on a high-dpi / retina display.
Other instances where browser scaling is bad are:
- PoS browsers that don't render SVGs (how old are these by now?)
- Even modern browsers have subpar SVG rendering at 1x, PNG looks better
- Some media types are "scaled" in unusual ways (SVGs, but also video stills, PDF pages, ...)
- Some original images are really friggin' large (20-30 megapixels sometimes), so at least some downscaling is needed there
- Mobile clients will want to minimize the amount of data transferred
Roan
On Wed, Sep 5, 2012 at 2:00 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
On Wed, Sep 5, 2012 at 12:35 PM, Asher Feldman afeldman@wikimedia.org wrote:
Browser scaling is also at least worth experimenting with. Instances where browser scaling would be bad are likely instances where the image is already subpar if viewed on a high-dpi / retina display.
Other instances where browser scaling is bad are:
- PoS browsers that don't render SVGs (how old are these by now?)
IE up through 8, and Android stock browser through 2.3. Neither are dead yet, so we still gotta deal with rasterization for them.
* Even modern browsers have subpar SVG rendering at 1x, PNG looks better
Examples? Sounds like bugs need to be filed with some of those browsers. :)
- Some media types are "scaled" in unusual ways (SVGs, but also video stills, PDF pages, ...)
- Some original images are really friggin' large (20-30 megapixels sometimes), so at least some downscaling is needed there
You'd absolutely want to do server-side downscaling to the base sizes in the appropriate file formats -- we wouldn't try to download multi-megapixel originals just to make a tiny thumbnail, and some formats require conversion to a format the browser can read.
For an example, if we were to standardize on sizes (we wouldn't use these actual sizes because they do NOT fit our usage):
- 32px
- 64px
- 128px
- 256px
- 512px
- 1024px
- 2048px
Then somebody requesting a 400px image might get the next size up, the 512px image delivered and scaled down in the browser. (On a high-resolution display, you might fetch the 1024px image.)
In reality we'd want sizes that fit most common usage, and perhaps make future markup & visual editor widgets promote use of standard sizes to minimize the cases where you end up with something that's not an exact fit.
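The "next size up" selection described above can be sketched as below, using the example (explicitly not real) size list from this message. The function name and the `device_pixel_ratio` parameter are illustrative assumptions.

```python
from bisect import bisect_left

# Example sizes from the message above; not proposed real values.
STANDARD_WIDTHS = [32, 64, 128, 256, 512, 1024, 2048]


def pick_width(requested_css_px, device_pixel_ratio=1.0, widths=STANDARD_WIDTHS):
    """Return the smallest standard width that covers the request."""
    needed = requested_css_px * device_pixel_ratio
    i = bisect_left(widths, needed)
    # Requests beyond the largest size fall back to the largest size.
    return widths[min(i, len(widths) - 1)]
```

With this list, a 400px request gets the 512px file, and the same request at a 2.0 pixel ratio gets the 1024px file; the browser then scales down to fit.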
- Mobile clients will want to minimize the amount of data transferred
This is a good reason for picking appropriate default sizes that would fit with actual common usage.
Note that with SVG, SVG originals can be either much smaller or much larger than a rasterized image -- in many cases we have SVGs that are much more detailed than they need to be. So serving of SVG doesn't guarantee a bandwidth save, though it can in well-designed cases.
Mobile also has the case that many (possibly even most in some markets) devices have a greater than 1.0 device-to-CSS pixel ratio, so loading 1.5X or 2.0X versions of raster images may be something we want in many cases. In theory you could make a switch -- just as we have a 'disable images' switch, we could make it a three-way control 'no images - low-resolution images - high-resolution images'.
[And just to screw with people, Windows 8 / Windows RT is going with 1.4x and 1.8x scaling factors instead of the 1.5x and 2.0x that Android and iOS -- and Windows Phone 7 -- use. Fun huh! We're probably not going to have exact scaled versions of them, they'll get the 1.5 or 2.0 and scale it down a little probably.]
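A tiny sketch of the "they'll get the 1.5 or 2.0 and scale it down a little" behaviour, assuming we only keep 1.0x, 1.5x and 2.0x renditions. The names here are hypothetical.

```python
AVAILABLE_DENSITIES = (1.0, 1.5, 2.0)  # assumption: the only renditions we keep


def pick_density(device_ratio, available=AVAILABLE_DENSITIES):
    """Return the smallest available density that is at least the device's ratio."""
    for density in sorted(available):
        if density >= device_ratio:
            return density
    # Denser than anything we have: serve the best we've got.
    return max(available)
```

So a 1.4x device would be served the 1.5x rendition and a 1.8x device the 2.0x one.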
-- brion
To revive this old thread...
On Sep 5, 2012, at 9:35 PM, Asher Feldman afeldman@wikimedia.org wrote:
On Tue, Sep 4, 2012 at 3:11 PM, Platonides Platonides@gmail.com wrote:
On 03/09/12 02:59, Tim Starling wrote:
I'll go for option 4. You can't delete the images from the backend while they are still in Squid, because then they would not be purged when the image is updated or action=purge is requested. In fact, that is one of only two reasons for the existence of the backend thumbnail store on Wikimedia. The thumbnail backend could be replaced by a text file that stores a list of thumbnail filenames which were sent to Squid within a window equivalent to the expiry time sent in the Cache-Control header. -- Tim Starling
The second one seems easy to fix. The first one should IMHO be fixed in squid/varnish by allowing wildcard purges (ie. PURGE /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0)
fast.ly implements group purge for varnish like this via a proxy daemon that watches backend responses for a "tag" response header (i.e. all resolutions of Tim_starling.jpg would be tagged that) and builds an in-memory hash of tags->objects which can be purged on. I've been told they'd probably open source the code for us if we want it, and it is interesting (especially to deal with the fact that we don't purge articles at all of their possible url's) albeit with its own challenges. If we implemented a backend system to track thumbnails that exist for a given orig, we may be able to remove our dependency on swift container listings to purge images, paving the way for a second class of thumbnails that are only cached.
How about this idea:
Just "purge all images with this prefix" doesn't really work in Squid or Varnish, because they don't store their cache database in a format that makes it cheap to determine which objects would match that. Varnish could do it with their "bans", but each ban is kept around for a long time, and with the tens, sometimes hundreds of purges a second we do, this would quickly add up to a massive ban list.
But... Varnish allows you to customize how it hashes objects into its object hash table (vcl_hash). What we could do, is hash thumbnails to the same hash key as their original. Because of our current URL structure, that's pretty much a matter of stripping off the thumbnail postfix. Then the original and all its associated thumbnails end up at the same hash key in the hash table, and only a single purge for the original would nuke them all out of the cache.
This relies on Varnish having an efficient implementation for multiple objects at a single hash key. It probably does, since it implements Vary processing this way. We would essentially be doing the same, Vary-ing on the thumbnail size. But I'll check the implementation to be sure.
Of course this won't work for Squid, but I'm pretty close to being able to replace Squid by Varnish entirely for upload.
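To illustrate the "strip off the thumbnail postfix" step outside of VCL, here is a Python sketch of the key derivation. It assumes the path layout seen in this thread (/wikipedia/commons/thumb/5/5c/Name/<thumb file>) and nothing else; the function name is made up, and the real logic would live in vcl_hash.

```python
def cache_hash_key(path):
    """Map a thumbnail path to the path of its original; leave other paths alone."""
    parts = path.split("/")
    # Expected thumb layout: ['', site, project, 'thumb', x, xy, name, thumb_file]
    if len(parts) == 8 and parts[3] == "thumb":
        return "/".join(parts[:3] + parts[4:7])
    return path
```

All sizes of one image then share a key with the original, so one purge of that key covers them all.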
On Oct 24, 2012, at 11:36 AM, Mark Bergsma mark@wikimedia.org wrote:
How about this idea:
Just "purge all images with this prefix" doesn't really work in Squid or Varnish, because they don't store their cache database in a format that makes it cheap to determine which objects would match that. Varnish could do it with their "bans", but each ban is kept around for a long time, and with the tens, sometimes hundreds of purges a second we do, this would quickly add up to a massive ban list.
But... Varnish allows you to customize how it hashes objects into its object hash table (vcl_hash). What we could do, is hash thumbnails to the same hash key as their original. Because of our current URL structure, that's pretty much a matter of stripping off the thumbnail postfix. Then the original and all its associated thumbnails end up at the same hash key in the hash table, and only a single purge for the original would nuke them all out of the cache.
This relies on Varnish having an efficient implementation for multiple objects at a single hash key. It probably does, since it implements Vary processing this way. We would essentially be doing the same, Vary-ing on the thumbnail size. But I'll check the implementation to be sure.
I checked, and Varnish stores all variant objects in a linked list per hash table entry. So once it looks up the hash entry for the URL of the original, it'll have to do a linear search for the right thumbnail size, matching each against a Vary header string. If we do this, we'll need to restrict the number of variants (thumb sizes) so we don't get hundreds/thousands on a single hash key.
Here's a little proof of concept to demonstrate how it could work:
https://gerrit.wikimedia.org/r/#/c/29805/2
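Separately from the linked change, the cost concern above (a linear search over variants at one hash key, hence a cap on thumb sizes) can be modelled like this. It is a plain Python model, not Varnish internals; `HashEntry` and `max_variants` are invented for illustration.

```python
class HashEntry:
    """One hash table entry holding every size variant of a single image."""

    def __init__(self, max_variants):
        self.max_variants = max_variants
        self.variants = []  # list of (size, body); searched linearly

    def lookup(self, size):
        # Cost grows with the number of variants stored at this key.
        for variant_size, body in self.variants:
            if variant_size == size:
                return body
        return None

    def store(self, size, body):
        """Store a variant; refuse new sizes once the cap is reached."""
        is_new = self.lookup(size) is None
        if is_new and len(self.variants) >= self.max_variants:
            return False
        self.variants = [(s, b) for s, b in self.variants if s != size]
        self.variants.append((size, body))
        return True
```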
On Sun, Sep 2, 2012 at 5:59 PM, Tim Starling tstarling@wikimedia.org wrote:
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
My understanding is that thumb.php already streamed the thumbnail back to the 404 handler via HTTP and has done so for at least the past two years or so.
Roan
I just wanted to clarify something... is there any protection in place in the thumbnail generator to prevent denial of service attacks? For instance if someone wanted to they could run a script which uploaded photos then fired off requests for thumbnails of it of size 20px,21px,22px...1024px
I'm guessing the servers wouldn't like that. This is why I'd be keen to limit the sizes.
May I suggest someone analyses the sizes currently used on Wikipedia and we limit to those as an initial step and then review the less frequently used ones and standardise on some sizes?
On Sep 5, 2012 9:15 AM, "Roan Kattouw" roan.kattouw@gmail.com wrote:
On Sun, Sep 2, 2012 at 5:59 PM, Tim Starling tstarling@wikimedia.org wrote:
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
My understanding is that thumb.php already streamed the thumbnail back to the 404 handler via HTTP and has done so for at least the past two years or so.
Roan
On Wed, Sep 5, 2012 at 6:40 PM, Jon Robson jdlrobson@gmail.com wrote:
I just wanted to clarify something... is there any protection in place in the thumbnail generator to prevent denial of service attacks? For instance if someone wanted to they could run a script which uploaded photos then fired off requests for thumbnails of it of size 20px,21px,22px...1024px
I'm guessing the servers wouldn't like that. This is why I'd be keen to limit the sizes.
The ability to request an image of whatever size I need is one of my most favorite MediaWiki features. It's very nice to save the extra step of resizing it after downloading it. It makes Commons images all the more easily reusable.
It's quite nice to also be able to have thumbnails of whatever size you want on Wikipedia, overriding the typical size settings.
I'd be fine if we can throttle such requests to prevent DOS and maybe other technical measures to make the feature less abused. But would be sad to see it eliminated.
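One common way to throttle such requests is a token bucket per client; a minimal sketch follows. Nothing here reflects existing MediaWiki code, and the class name, rate and burst parameters are assumptions.

```python
import time


class TokenBucket:
    """Allows short bursts, then limits requests to `rate_per_sec` on average."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)
        self.updated = None

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            # Refill in proportion to elapsed time, never above the burst size.
            elapsed = now - self.updated
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A scaler front end could keep one bucket per client and apply it only to sizes that are not already cached, so ordinary page views are unaffected.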
While it's also nice to hotlink to the images (or via InstantCommons), some expiry on thumbnails might be acceptable.
Cheers, Katie
May I suggest someone analyses the sizes currently used on Wikipedia and we limit to those as an initial step and then review the less frequently used ones and standardise on some sizes?
On Sep 5, 2012 9:15 AM, "Roan Kattouw" roan.kattouw@gmail.com wrote:
On Sun, Sep 2, 2012 at 5:59 PM, Tim Starling tstarling@wikimedia.org wrote:
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
My understanding is that thumb.php already streamed the thumbnail back to the 404 handler via HTTP and has done so for at least the past two years or so.
Roan
Is there a bug open for this yet? If not there probably should be... (apologies if there is.. scanning through I cannot see one)
In terms of supporting non-standard sizes - there is no reason why, to get an obscure size (e.g. 224px), you couldn't fetch, for example, the 240px image and resize it with CSS...
On Tue, Sep 18, 2012 at 7:01 PM, aude aude.wiki@gmail.com wrote:
On Wed, Sep 5, 2012 at 6:40 PM, Jon Robson jdlrobson@gmail.com wrote:
I just wanted to clarify something... is there any protection in place in the thumbnail generator to prevent denial of service attacks? For instance if someone wanted to they could run a script which uploaded photos then fired off requests for thumbnails of it of size 20px,21px,22px...1024px
I'm guessing the servers wouldn't like that. This is why I'd be keen to limit the sizes.
The ability to request an image of whatever size I need is one of my most favorite MediaWiki features. It's very nice to save the extra step of resizing it after downloading it. It makes Commons images all the more easily reusable.
It's quite nice to also be able to have thumbnails of whatever size you want on Wikipedia, overriding the typical size settings.
I'd be fine if we can throttle such requests to prevent DOS and maybe other technical measures to make the feature less abused. But would be sad to see it eliminated.
While it's also nice to hotlink to the images (or via InstantCommons), some expiry on thumbnails might be acceptable.
Cheers, Katie
May I suggest someone analyses the sizes currently used on Wikipedia and we limit to those as an initial step and then review the less frequently used ones and standardise on some sizes?
On Sep 5, 2012 9:15 AM, "Roan Kattouw" roan.kattouw@gmail.com wrote:
On Sun, Sep 2, 2012 at 5:59 PM, Tim Starling tstarling@wikimedia.org wrote:
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
My understanding is that thumb.php already streamed the thumbnail back to the 404 handler via HTTP and has done so for at least the past two years or so.
Roan
-- Board member, Wikimedia District of Columbia http://wikimediadc.org @wikimediadc / @wikimania2012
On Wed, Sep 19, 2012 at 7:02 AM, Jon Robson jdlrobson@gmail.com wrote:
Is there a bug open for this yet? If not there probably should be... (apologies if there is.. scanning through I cannot see one)
In terms of supporting non-standard sizes - there is no reason why, to get an obscure size (e.g. 224px), you couldn't fetch, for example, the 240px image and resize it with CSS...
1) That adds an unnecessary extra step to reuse images.
2) Not every reuse case involves CSS.
3) Although it's not that super complicated to do in CSS, not every reuser knows CSS.
4) Some browsers do a poor job at rescaling images, although other browsers have improved in this area.
If anything, I think in the download button / dialog in Commons, we should have an option to allow the user to choose an image of any size to download, in addition to the preset choices. :) The thumbnails can be temporary I suppose, and hope no one uses them to hotlink. (my humble opinion!)
Cheers, Katie
On Tue, Sep 18, 2012 at 7:01 PM, aude aude.wiki@gmail.com wrote:
On Wed, Sep 5, 2012 at 6:40 PM, Jon Robson jdlrobson@gmail.com wrote:
I just wanted to clarify something... is there any protection in place in the thumbnail generator to prevent denial of service attacks? For instance if someone wanted to they could run a script which uploaded photos then fired off requests for thumbnails of it of size 20px,21px,22px...1024px
I'm guessing the servers wouldn't like that. This is why I'd be keen to limit the sizes.
The ability to request an image of whatever size I need is one of my most favorite MediaWiki features. It's very nice to save the extra step of resizing it after downloading it. It makes Commons images all the more easily reusable.
It's quite nice to also be able to have thumbnails of whatever size you want on Wikipedia, overriding the typical size settings.
I'd be fine if we can throttle such requests to prevent DOS and maybe other technical measures to make the feature less abused. But would be sad to see it eliminated.
While it's also nice to hotlink to the images (or via InstantCommons), some expiry on thumbnails might be acceptable.
Cheers, Katie
May I suggest someone analyses the sizes currently used on Wikipedia and we limit to those as an initial step and then review the less frequently used ones and standardise on some sizes?
On Sep 5, 2012 9:15 AM, "Roan Kattouw" roan.kattouw@gmail.com wrote:
On Sun, Sep 2, 2012 at 5:59 PM, Tim Starling tstarling@wikimedia.org wrote:
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
My understanding is that thumb.php already streamed the thumbnail back to the 404 handler via HTTP and has done so for at least the past two years or so.
Roan
-- Board member, Wikimedia District of Columbia http://wikimediadc.org @wikimediadc / @wikimania2012
-- Jon Robson http://jonrobson.me.uk @rakugojon
On 19 September 2012 09:20, aude aude.wiki@gmail.com wrote:
If anything, I think in the download button / dialog in Commons, we should have an option to allow the user to choose an image of any size to download, in addition to the preset choices. :) The thumbnails can be temporary I suppose, and hope no one uses them to hotlink. (my humble opinion!)
Arbitrary-sized thumbnails are a much-used feature, and I don't think this feature should be removed.
The Commons reuse guide [1] notes that hotlinking thumbnails is allowed, but it's a terrible idea and you should either store the image locally or use InstantCommons (which works wonderfully).
As I noted, this thread was started with Ariel warning space was getting low on the image server. Removing much-used functionality when you could just remove unused images still strikes me as a weird response.
- d.
On 19 September 2012 09:25, David Gerard dgerard@gmail.com wrote:
The Commons reuse guide [1] notes that hotlinking thumbnails is allowed, but it's a terrible idea and you should either store the image locally or use InstantCommons (which works wonderfully).
[1] https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia...
- d.
Maybe I'm doing it wrong, but it seems the way to request thumbnails has changed, at least for SVGs.
For example, for http://upload.wikimedia.org/wikipedia/commons/3/3e/Flag_of_New_Zealand.svg
http://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Flag_of_New_Zealand...
used to work, but no longer.
Now, adding .png
http://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Flag_of_New_Zealand...
works.
Is this expected? Can I rely on this going forward?
Thanks, Eric
On Wed, Sep 19, 2012 at 1:29 AM, David Gerard dgerard@gmail.com wrote:
On 19 September 2012 09:25, David Gerard dgerard@gmail.com wrote:
The Commons reuse guide [1] notes that hotlinking thumbnails is allowed, but it's a terrible idea and you should either store the image locally or use InstantCommons (which works wonderfully).
[1] https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia...
- d.
On Wed, Sep 19, 2012 at 1:20 AM, aude aude.wiki@gmail.com wrote:
On Wed, Sep 19, 2012 at 7:02 AM, Jon Robson jdlrobson@gmail.com wrote:
In terms of supporting non-standard files - there is no reason why, to get an obscure size such as 224px, you couldn't fetch, for example, the 240px image and resize it with CSS...
- That adds an unnecessary extra step to reuse images.
Not necessarily -- if you simply plop the image into an <img src="..." width="..." height="..."> -- and that's what you should usually be doing anyway -- then the browser will resize an image that isn't actually the requested size.
That should actually be what you already get when you request a thumbnail larger than the size of the original.
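A rough sketch of that approach: round the requested width up to the nearest pre-rendered size, then let the width/height attributes do the final scaling in the browser. The list of widths is the set of popular sizes from Ariel's numbers; treating them as the only sizes served is an assumption made for this example, and `pick_standard_width` and `img_tag` are hypothetical helpers, not MediaWiki functions.

```python
from bisect import bisect_left
from html import escape

# Popular widths from Ariel's stats. Assumption for this sketch: these are
# the only sizes the server would pre-render.
STANDARD_WIDTHS = [75, 120, 150, 180, 200, 220, 320, 640, 800, 1024, 1280]


def pick_standard_width(requested, widths=STANDARD_WIDTHS):
    """Return the smallest standard width >= requested.

    Returns None when no standard width is large enough, in which case the
    caller would fall back to the original file.
    """
    i = bisect_left(widths, requested)
    return widths[i] if i < len(widths) else None


def img_tag(thumb_url, width, height):
    """Build an <img> tag whose width/height make the browser rescale."""
    return '<img src="{}" width="{}" height="{}">'.format(
        escape(thumb_url, quote=True), width, height
    )
```

For a 224px request this picks the 320px thumb and leaves the last step of downscaling to the browser. Rounding up rather than down is a choice made here so the browser never has to upscale.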
- Some browsers do a poor job at rescaling images, although other browsers have improved in this area.
Still true, though most of em are pretty good these days.
If anything, I think in the download button / dialog in Commons, we should have an option to allow user to choose image of any size to download, in addition to the preset choices. :) The thumbnails can be temporary I suppose, and hope no one uses them to hotlink. (my humble opinion!)
I'd usually be content with manually sizing to my perfect dimensions from the original source, probably, but that is a nice shortcut. :)
-- brion
Ariel T. Glenn wrote:
So it's time to have this discussion again. At least, I think we're having it again, though I could not find previous threads on this list about the subject.
In short, scaled media is currently generated on the fly for any size and for any user. The resulting files are kept around forever or until we run perilously short of space, at which point we make some guesses about what we can toss and then do a mass purge. Last time we did so, we had the rotation bug going at the same time, which made for a real fine mess.
A little bit of crunching shows me that we have about 6 million images in use on the projects, and yet we manage to have around 130 million thumbnails. Just for fun I checked to see how many thumbs each image has, what sizes we are looking at, etc. Here's the results.
Only really tangentially related, but I remember thinking when reading this thread: are there any pages (on wikitech.wikimedia.org or elsewhere) that document Wikimedia's current media infrastructure? It's always been a bit of a mystery to me.
MZMcBride