On Oct 24, 2012, at 11:36 AM, Mark Bergsma mark@wikimedia.org wrote:
How about this idea:
Just "purge all images with this prefix" doesn't really work in Squid or Varnish, because they don't store their cache database in a format that makes it cheap to determine which objects would match that. Varnish could do it with their "bans", but each ban is kept around for a long time, and with the tens, sometimes hundreds of purges a second we do, this would quickly add up to a massive ban list.
But... Varnish allows you to customize how it hashes objects into its object hash table (vcl_hash). What we could do is hash thumbnails to the same hash key as their original. Because of our current URL structure, that's pretty much a matter of stripping off the thumbnail postfix. Then the original and all its associated thumbnails end up at the same hash key in the hash table, and a single purge for the original would nuke them all out of the cache.
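To make the "strip off the thumbnail postfix" idea concrete, here's a rough sketch in Python (the real thing would live in vcl_hash; the regex and URL layout here are my assumptions based on the usual /thumb/.../NNNpx-Name.jpg structure, not the actual patch):

```python
import re

# Hypothetical sketch: Wikimedia thumbnail URLs look roughly like
#   /wikipedia/commons/thumb/a/a1/Example.jpg/200px-Example.jpg
# while the original is
#   /wikipedia/commons/a/a1/Example.jpg
# Hashing both to the original's URL means one purge covers all sizes.
THUMB_RE = re.compile(r'^(/[^/]+/[^/]+)/thumb(/.+?)/[^/]+$')

def hash_key(url):
    """Cache hash key for a URL: thumbnails collapse onto their
    original's URL; everything else hashes as-is."""
    m = THUMB_RE.match(url)
    if m:
        return m.group(1) + m.group(2)
    return url
```

The thumbnail size component that gets stripped here is what would become the Vary-like discriminator between variants at that key.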
This relies on Varnish handling multiple objects at a single hash key efficiently. It probably does, since it implements Vary processing this way. We would essentially be doing the same, Vary-ing on the thumbnail size. But I'll check the implementation to be sure.
I checked, and Varnish stores all variant objects in a linked list per hash table entry. So once it looks up the hash entry for the URL of the original, it'll have to do a linear search for the right thumbnail size, matching each against a Vary header string. If we do this, we'll need to restrict the number of variants (thumb sizes) so we don't get hundreds/thousands on a single hash key.
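As a toy model of that behaviour (illustrative only, not Varnish's actual data structures or names): each hash entry holds a list of variants that is searched linearly on lookup, and purging the entry's key drops every variant at once.

```python
# Toy model of a cache hash table where each entry holds a list of
# variants (here: thumbnail sizes), similar in spirit to how Varnish
# keeps Vary'd objects under one hash entry.

class Cache:
    def __init__(self):
        self.table = {}  # hash key -> list of (variant, body)

    def store(self, key, variant, body):
        self.table.setdefault(key, []).append((variant, body))

    def lookup(self, key, variant):
        # Linear search over the variant list, like the walk over
        # Vary'd objects at one hash entry described above.
        for v, body in self.table.get(key, []):
            if v == variant:
                return body
        return None

    def purge(self, key):
        # A single purge on the original's key removes every variant.
        self.table.pop(key, None)

cache = Cache()
key = "/wikipedia/commons/a/a1/Example.jpg"
for size in ("120px", "200px", "800px"):
    cache.store(key, size, b"thumb bytes for " + size.encode())

cache.purge(key)  # one purge, all three thumbnail sizes gone
```

The linear scan is also why the number of variants per key has to stay bounded: lookup cost grows with the length of that list.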
Here's a little proof of concept to demonstrate how it could work:
https://gerrit.wikimedia.org/r/#/c/29805/2