If we make a cron job, could we also have it purge all SVG thumbnails older than, say, 5 years?
Ryan Kaldari
On Aug 31, 2012, at 5:36 AM, "Ariel T. Glenn" ariel@wikimedia.org wrote:
So it's time to have this discussion again. At least, I think we're having it again, though I could not find previous threads on this list about the subject.
In short, scaled media is currently generated on the fly for any size and for any user. The resulting files are kept around forever or until we run perilously short of space, at which point we make some guesses about what we can toss and then do a mass purge. Last time we did so, we had the rotation bug going at the same time, which made for a real fine mess.
A little bit of crunching shows me that we have about 6 million images in use on the projects, and yet we manage to have around 130 million thumbnails. Just for fun I checked to see how many thumbs each image has, what sizes we are looking at, etc. Here are the results.
Some "standard" sizes are most popular, with between 200K and 640K media files having thumbs scaled to each of these widths: 75, 120, 150, 180, 200, 220, 320, 640, 800, 1024, and 1280 pixels.
But there are plenty of "odd" sizes with lots of thumbs too. For example, over 65K files have a thumb of width 181px, and 20K have one of width 138px.
As an experiment, and before having this data, I purged from ms5 (no longer in use for thumbs) 1/16 of the thumbs that were greater than 100px wide but not one of these widths: 120px, 200px, 220px, 250px, 320px, 640px, 800px. We got back over 300GB of space.
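A minimal sketch of the selection rule used in that experiment, in Python. It assumes thumb file names begin with the width, as in "181px-Foo.jpg"; the function names are hypothetical, and how the 1/16 sample was chosen is not shown here.

```python
import re

# Widths that were kept in the experiment described above.
KEEP_WIDTHS = {120, 200, 220, 250, 320, 640, 800}

# Assumption: thumb file names start with "<width>px-", e.g. "181px-Foo.jpg".
WIDTH_RE = re.compile(r"^(\d+)px-")


def thumb_width(filename):
    """Return the pixel width encoded in a thumb file name, or None."""
    match = WIDTH_RE.match(filename)
    return int(match.group(1)) if match else None


def is_purge_candidate(filename, min_width=100, keep=KEEP_WIDTHS):
    """True for thumbs wider than min_width whose width is not in keep."""
    width = thumb_width(filename)
    return width is not None and width > min_width and width not in keep
```

With this rule a 181px thumb is a candidate, while 120px (kept width) and 75px (not wider than 100px) thumbs are left alone.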
The other thing about delivering any scaled version on demand is that we have some media files with several hundred different thumb sizes in there. Here are a few of the top offenders for your entertainment:
2514 wikipedia/commons/thumb/f/f9/Orange_and_cross_section.jpg
2285 wikipedia/commons/thumb/f/fb/Thrermal_grease.jpg
2218 wikipedia/commons/thumb/f/fc/Blue_sport.jpg
2071 wikipedia/commons/thumb/f/f3/Flag_of_Switzerland.svg
2062 wikipedia/commons/thumb/f/f2/Flag_of_Costa_Rica.svg
2034 wikipedia/commons/thumb/f/f8/Wiktionary-logo-en.svg
1915 wikipedia/commons/thumb/f/f6/VeulesLesRoses.JPG
1689 wikipedia/commons/thumb/f/fa/Wikibooks-logo.svg
1447 wikipedia/commons/thumb/f/fa/Wikiquote-logo.svg
1371 wikipedia/commons/thumb/f/f0/Mori_Uncanny_Valley.svg
1249 wikipedia/commons/thumb/f/f5/Grand_prismatic_spring.jpg
1246 wikipedia/commons/thumb/f/f3/Mature.jpg
1191 wikipedia/commons/thumb/f/f7/Kirchdorf_in_Tirol.JPG
1187 wikipedia/commons/thumb/f/f8/Camille_Cabral_pour_les_Trans.JPG
1143 wikipedia/commons/thumb/f/f7/Profanity.svg
1079 wikipedia/commons/thumb/f/f2/HSV_color_solid_cone.png
1040 wikipedia/commons/thumb/f/f2/Carmen_Electra.jpg
1032 wikipedia/commons/thumb/f/f1/Pink_eye.jpg
1001 wikipedia/commons/thumb/f/f6/USNS_Medgar_Evers_announcement.jpg
I'd comment on some of those but I'd be too snarky.
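For anyone who wants to reproduce a count like this, here is a rough sketch in Python. It assumes each original has its own thumb directory and that the input is a flat listing of thumb paths such as ".../thumb/f/f9/Orange_and_cross_section.jpg/181px-Orange_and_cross_section.jpg". The function names are hypothetical and this is not necessarily how the numbers above were produced.

```python
from collections import Counter
import posixpath


def count_thumbs(thumb_paths):
    """Count thumbs per original, keyed by the directory that holds them."""
    return Counter(posixpath.dirname(path) for path in thumb_paths)


def top_offenders(thumb_paths, n=20):
    """Return the n (directory, thumb count) pairs with the most thumbs."""
    return count_thumbs(thumb_paths).most_common(n)
```

Feeding it the output of a recursive file listing of the thumb tree would give a list sorted by thumb count, highest first.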
So there are some things we could change:
1. We could generate and keep only certain sizes, tossing the rest.
2. We could keep *nothing*, scaling all media as required.
3. We could have a cron job that was clever about tossing thumbs every day (not sure how easy it would be to be clever).
4. ??
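One possible reading of "clever" for the cron job option, as a sketch only: keep thumbs at the popular widths listed earlier, and toss any other width that has not been requested for a while. The 30-day threshold, the function name, and the availability of a last-access timestamp are all assumptions; the storage may not track access times at all.

```python
import time

# The "standard" widths listed earlier in this mail.
STANDARD_WIDTHS = {75, 120, 150, 180, 200, 220, 320, 640, 800, 1024, 1280}
SECONDS_PER_DAY = 86400


def should_toss(width, last_access_ts, now=None, max_idle_days=30):
    """Toss non-standard widths that have been idle longer than max_idle_days.

    last_access_ts is a Unix timestamp; where it comes from is left open.
    """
    now = time.time() if now is None else now
    idle_days = (now - last_access_ts) / SECONDS_PER_DAY
    return width not in STANDARD_WIDTHS and idle_days > max_idle_days
```

A daily job could walk the thumb tree, call this for each file, and delete the ones it flags; an age cutoff per file type would be a variation on the same check.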
In any of these cases, the squids will have copies of recently requested scaled media, so we won't be scaling the same file to the same size over and over in a short time frame.
What do folks think about how to proceed?
Ariel
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l