On Fri, 31 Aug 2012 05:36:18 -0700, Ariel T. Glenn <ariel(a)wikimedia.org> wrote:
So it's time to have this discussion again. At least, I think we're
having it again, though I could not find previous threads on this list
about the subject.
In short, scaled media is currently generated on the fly for any size
and for any user. The resulting files are kept around forever or until
we run perilously short of space, at which point we make some guesses
about what we can toss and then do a mass purge. Last time we did so, we
had the rotation bug going at the same time, which made for a real fine
A little bit of crunching shows me that we have about 6 million images
in use on the projects, and yet we manage to have around 130 million
thumbnails. Just for fun I checked to see how many thumbs each image
has, what sizes we are looking at, etc. Here are the results.
Some "standard" sizes are most popular, with between 200K and 640K media
files having thumbs scaled to each of these widths:
75, 120, 150, 180, 200, 220, 320, 640, 800, 1024, and 1280 pixels
But there's plenty of "odd" sizes with lots of thumbs too. For example,
over 65K files with width 181px, 20K with width 138px.
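For anyone who wants to reproduce that kind of per-width tally, here is a rough sketch in Python. It assumes the width is encoded in each thumb's file name as "<N>px-" (for example "220px-Example.jpg"); the regex and the function names are illustrative only, not the script that produced the numbers above.

```python
import re
from collections import Counter

# Assumption: the width shows up as "<N>px-" at the start of the name or
# right after a "-" prefix, e.g. "220px-Example.jpg" or
# "page1-220px-Example.pdf.jpg".
WIDTH_RE = re.compile(r"(?:^|-)(\d+)px-")


def thumb_width(name):
    """Return the pixel width encoded in a thumb file name, or None."""
    match = WIDTH_RE.search(name)
    return int(match.group(1)) if match else None


def width_histogram(names):
    """Count how many thumbs exist at each width."""
    counts = Counter()
    for name in names:
        width = thumb_width(name)
        if width is not None:
            counts[width] += 1
    return counts
```

Feeding it a listing of thumb names and calling most_common() on the result would surface both the "standard" widths and the odd ones.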
As an experiment and before having this data, I purged from ms5 (no
longer in use for thumbs) 1/16 of the thumbs that were greater than
100px wide but not one of these widths:
120px, 200px, 220px, 250px, 320px, 640px, 800px
We got back over 300GB of space.
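The selection rule for that experiment (wider than 100px and not one of the listed widths) could be written roughly as below. This is only a sketch of the filter, under the same assumption that the width appears in the file name as "<N>px-". The names are hypothetical, and it does not model how the 1/16 sample was chosen.

```python
import re

# Widths kept in the experiment described above.
KEEP_WIDTHS = {120, 200, 220, 250, 320, 640, 800}
MIN_WIDTH = 100

# Assumption: width is encoded in the thumb name as "<N>px-".
WIDTH_RE = re.compile(r"(?:^|-)(\d+)px-")


def is_purge_candidate(name):
    """True if the thumb is wider than MIN_WIDTH and not a kept width."""
    match = WIDTH_RE.search(name)
    if match is None:
        return False  # unrecognized name, leave it alone
    width = int(match.group(1))
    return width > MIN_WIDTH and width not in KEEP_WIDTHS


def purge_candidates(names):
    """Return the thumb names that the rule would delete, in input order."""
    return [name for name in names if is_purge_candidate(name)]
```

Note that with this rule, popular widths such as 1024px and 1280px count as purge candidates too, since they are not in the kept set.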
The other thing about delivering any scaled version on demand is that we
have some media files with several hundred different thumb sizes in
there. Here are a few of the top offenders for your entertainment:
I'd comment on some of those but I'd be too snarky.
So there are some things we could change:
1. We could generate and keep only certain sizes, tossing the rest.
2. We could keep *nothing*, scaling all media as required.
3. We could have a cron job that was clever about tossing thumbs every
day (not sure how easy it would be to be clever).
In any of these cases, the squids will have copies of recently requested
scaled media, so we won't be scaling the same file to the same size over
and over in a short time frame.
What do folks think about how to proceed?
Another idea I've played with is developing an LRU filesystem,
probably as a FUSE module. You would mount it at thumbs/ and unused
files would periodically disappear.
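To make the policy concrete: this is not a FUSE module, just a batch approximation of the same least-recently-used idea that a periodic job could run. It assumes the filesystem records access times (mounts with noatime would not, and relatime only updates them coarsely), and the function names are made up for illustration.

```python
import os


def scan(root):
    """Yield (path, last_access_time, size_in_bytes) for each file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            st = os.stat(path)
            yield path, st.st_atime, st.st_size


def select_evictions(entries, bytes_to_free):
    """Pick the least recently accessed files until bytes_to_free is covered."""
    chosen = []
    freed = 0
    # Oldest access time first.
    for path, _atime, size in sorted(entries, key=lambda entry: entry[1]):
        if freed >= bytes_to_free:
            break
        chosen.append(path)
        freed += size
    return chosen
```

Something like select_evictions(scan("thumbs/"), target) would give the list to delete; the actual removal is left out on purpose so the selection can be checked first.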
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]