On Oct 24, 2012, at 11:36 AM, Mark Bergsma <mark(a)wikimedia.org> wrote:
How about this idea:
Just "purge all images with this prefix" doesn't really work in Squid or
Varnish, because they don't store their cache database in a format that makes it cheap
to determine which objects match that prefix. Varnish could do it with its
"bans", but each ban is kept around for a long time, and with the tens,
sometimes hundreds, of purges per second we do, this would quickly add up to a
massive ban list.
But... Varnish allows you to customize how it hashes objects into its object hash table
(vcl_hash). What we could do is hash thumbnails to the same hash key as their original.
Because of our current URL structure, that's pretty much a matter of stripping off the
thumbnail postfix. Then the original and all its associated thumbnails end up at the same
hash key in the hash table, and only a single purge for the original would nuke them all
out of the cache.
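As a rough sketch of the key normalization (in Python rather than VCL, and with hypothetical URL shapes, not taken from our actual config), the idea is just: if the URL looks like a thumbnail, strip the thumbnail postfix and the /thumb/ path component so it hashes to the original's key:

```python
import re

# Assumed Wikimedia-style paths (illustrative only):
#   original:  /wikipedia/commons/a/ab/Foo.jpg
#   thumbnail: /wikipedia/commons/thumb/a/ab/Foo.jpg/120px-Foo.jpg
THUMB_RE = re.compile(r"^(/[^/]+/[^/]+)/thumb(/./../[^/]+)/[^/]+$")

def hash_key(url: str) -> str:
    """Map a thumbnail URL onto its original's hash key; pass others through."""
    m = THUMB_RE.match(url)
    if m:
        # Drop the "/thumb" segment and the trailing "<size>px-..." component.
        return m.group(1) + m.group(2)
    return url
```

The real version would live in vcl_hash, where the rewritten string is fed to the hasher while req.url itself stays untouched for backend fetches.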
This relies on Varnish having an efficient implementation for multiple objects at a
single hash key. It probably does, since it implements Vary processing this way. We would
essentially be doing the same, Vary-ing on the thumbnail size. But I'll check the
implementation to be sure.
I checked, and Varnish stores all variant objects in a linked list per hash table entry.
So once it looks up the hash entry for the URL of the original, it'll have to do a
linear search for the right thumbnail size, matching each against a Vary header string. If
we do this, we'll need to restrict the number of variants (thumb sizes) so we
don't get hundreds/thousands on a single hash key.
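A toy model of that structure (not Varnish's actual C code) shows both properties at once: lookup is a linear scan over the variants at one key, and a single purge on the shared key drops the original together with every thumbnail:

```python
from collections import defaultdict

class ObjectHash:
    """Toy model of Varnish's object hash: one bucket per hash key,
    holding a list of variants searched linearly. The variant string
    stands in for the Vary header match; here it's the thumbnail size."""

    def __init__(self):
        self.buckets = defaultdict(list)  # hash key -> [(variant, body), ...]

    def insert(self, key, variant, body):
        self.buckets[key].append((variant, body))

    def lookup(self, key, variant):
        # Linear search: cost grows with the number of thumb sizes per key,
        # which is why the variant count needs to be bounded.
        for v, body in self.buckets[key]:
            if v == variant:
                return body
        return None

    def purge(self, key):
        # One purge on the original's key nukes all variants at once.
        self.buckets.pop(key, None)
```

With thumbnails hashed under the original's key, purging "/a/ab/Foo.jpg" once invalidates every cached size of that image.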
Here's a little proof of concept to demonstrate how it could work:
https://gerrit.wikimedia.org/r/#/c/29805/2
--
Mark Bergsma <mark(a)wikimedia.org>
Lead Operations Architect
Wikimedia Foundation