To revive this old thread...
On Sep 5, 2012, at 9:35 PM, Asher Feldman <afeldman(a)wikimedia.org> wrote:
On Tue, Sep 4, 2012 at 3:11 PM, Platonides <Platonides(a)gmail.com> wrote:
> On 03/09/12 02:59, Tim Starling wrote:
>> I'll go for option 4. You can't delete the images from the backend
>> while they are still in Squid, because then they would not be purged
>> when the image is updated or action=purge is requested. In fact, that
>> is one of only two reasons for the existence of the backend thumbnail
>> store on Wikimedia. The thumbnail backend could be replaced by a text
>> file that stores a list of thumbnail filenames which were sent to
>> Squid within a window equivalent to the expiry time sent in the
>> Cache-Control header.
>> -- Tim Starling
>
> The second one seems easy to fix. The first one should IMHO be fixed in
> squid/varnish by allowing wildcard purges (i.e. PURGE
> /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0)
fast.ly implements group purge for Varnish like this, via a proxy daemon
that watches backend responses for a "tag" response header (e.g. all
resolutions of Tim_starling.jpg would carry the same tag) and builds an
in-memory hash of tags->objects which can be purged on. I've been told
they'd probably open source the code for us if we want it, and it is
interesting (especially to deal with the fact that we don't purge articles
at all of their possible URLs), albeit with its own challenges. If we
implemented a backend system to track the thumbnails that exist for a given
original, we may be able to remove our dependency on Swift container listings
to purge images, paving the way for a second class of thumbnails that are
only cached.
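
To make the tags->objects idea concrete, here is a rough Python sketch of
such an index. It is not fast.ly's code, which I haven't seen; the class
name, the method names and the use of the file name as the tag are all
made up for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory map of tag -> cached URLs (hypothetical, for illustration)."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def observe(self, url, tag):
        # Called for every backend response; `tag` stands in for the value of
        # the "tag" response header, or None when the header is absent.
        if tag:
            self._by_tag[tag].add(url)

    def purge(self, tag):
        # Forget the tag and return every URL that has to be purged for it.
        return sorted(self._by_tag.pop(tag, set()))


index = TagIndex()
index.observe("/thumb/5/5c/Tim_starling.jpg/120px-Tim_starling.jpg", "Tim_starling.jpg")
index.observe("/thumb/5/5c/Tim_starling.jpg/800px-Tim_starling.jpg", "Tim_starling.jpg")
to_purge = index.purge("Tim_starling.jpg")  # both thumbnail URLs
```

The daemon would then send one ordinary PURGE per returned URL, so the
cache itself needs no wildcard support.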
How about this idea:
A plain "purge all images with this prefix" doesn't really work in Squid or
Varnish, because neither stores its cache database in a format that makes it cheap
to determine which objects match a prefix. Varnish could do it with its
"bans", but each ban is kept around for a long time, and at the tens,
sometimes hundreds, of purges a second that we do, this would quickly add up to a massive ban
list.
But... Varnish allows you to customize how it hashes objects into its object hash table
(vcl_hash). What we could do is hash thumbnails to the same hash key as their original.
Because of our current URL structure, that's pretty much a matter of stripping off the
thumbnail suffix. Then the original and all its associated thumbnails end up at the same
hash key in the hash table, and a single purge for the original would nuke them all
out of the cache.
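
As a rough illustration of the key derivation (in Python rather than VCL,
and not the actual config), the sketch below maps a thumbnail URL to its
original's URL and uses that as the bucket key in a toy cache. The URL
layout is an assumption modelled on the PURGE example quoted above; under
that assumption the "thumb" path component has to go as well as the last
segment. hash_key and ToyCache are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed thumbnail layout, modelled on the PURGE example in this thread:
#   /<site>/<project>/thumb/<h>/<hh>/<name>/<thumbnail file>
THUMB_RE = re.compile(
    r"^(?P<prefix>/[^/]+/[^/]+)/thumb(?P<path>/[0-9a-f]/[0-9a-f]{2}/[^/]+)/[^/]+$"
)


def hash_key(url):
    """Return the original's URL for a thumbnail URL, else the URL unchanged."""
    m = THUMB_RE.match(url)
    if m is None:
        return url
    return m.group("prefix") + m.group("path")


class ToyCache:
    """Toy cache: one bucket per hash key, several objects per bucket."""

    def __init__(self):
        self._buckets = defaultdict(dict)

    def store(self, url, body):
        self._buckets[hash_key(url)][url] = body

    def lookup(self, url):
        return self._buckets.get(hash_key(url), {}).get(url)

    def purge(self, url):
        # Drop the whole bucket; return how many objects went with it.
        return len(self._buckets.pop(hash_key(url), {}))


cache = ToyCache()
cache.store("/wikipedia/commons/5/5c/Tim_starling.jpg", b"original")
cache.store("/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/200px-Tim_starling.jpg", b"200px")
removed = cache.purge("/wikipedia/commons/5/5c/Tim_starling.jpg")
```

Lookups still distinguish the objects inside a bucket by full URL; only
the purge operates on the whole bucket.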
This relies on Varnish having an efficient implementation for multiple objects at a single
hash key. It probably does, since it implements Vary processing this way. We would
essentially be doing the same, Vary-ing on the thumbnail size. But I'll check the
implementation to be sure.
Of course this won't work for Squid, but I'm pretty close to being able to replace
Squid with Varnish entirely for upload.
--
Mark Bergsma <mark(a)wikimedia.org>
Lead Operations Architect
Wikimedia Foundation