On Tue, Sep 4, 2012 at 3:11 PM, Platonides <Platonides@gmail.com> wrote:
On 03/09/12 02:59, Tim Starling wrote:
I'll go for option 4. You can't delete the images from the backend while they are still in Squid, because then they would not be purged when the image is updated or action=purge is requested. In fact, that is one of only two reasons for the existence of the backend thumbnail store on Wikimedia. The thumbnail backend could be replaced by a text file that stores a list of thumbnail filenames which were sent to Squid within a window equivalent to the expiry time sent in the Cache-Control header.
The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler that sends thumbnails directly by HTTP. Maybe the Swift one does that already.
-- Tim Starling
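For what it's worth, the "text file" idea above could be sketched roughly like this. It is only an illustration: the class name, the in-memory dict standing in for the file, and the max_age parameter (meant to equal the Cache-Control expiry) are all assumptions of mine, not anything that exists in MediaWiki.

```python
import time
from collections import defaultdict


class SentThumbnailLog:
    """Hypothetical record of thumbnail URLs handed to the cache recently.

    Only entries younger than max_age (assumed equal to the Cache-Control
    expiry) can still be in the cache, so only those need purging.
    """

    def __init__(self, max_age, clock=time.time):
        self.max_age = max_age
        self.clock = clock
        # original file name -> {thumbnail URL: time it was last sent}
        self._sent = defaultdict(dict)

    def record(self, original, thumb_url):
        """Note that thumb_url (a rendering of original) was sent to the cache."""
        self._sent[original][thumb_url] = self.clock()

    def urls_to_purge(self, original):
        """Return the URLs that may still be cached, and forget the original."""
        cutoff = self.clock() - self.max_age
        entries = self._sent.pop(original, {})
        return sorted(url for url, sent_at in entries.items() if sent_at >= cutoff)
```

On image update or action=purge, the caller would send one PURGE per URL returned by urls_to_purge(); anything older than the window has already expired from the cache on its own.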
The second one seems easy to fix. The first one should IMHO be fixed in squid/varnish by allowing wildcard purges (e.g. PURGE /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0)
fast.ly implements group purge for varnish like this via a proxy daemon that watches backend responses for a "tag" response header (e.g. all resolutions of Tim_starling.jpg would carry the same tag) and builds an in-memory hash of tags -> objects that can be purged by tag. I've been told they'd probably open source the code for us if we want it. It is interesting, especially given that we don't purge articles at all of their possible URLs, though it comes with its own challenges. If we implemented a backend system to track the thumbnails that exist for a given original, we might be able to remove our dependency on Swift container listings to purge images, paving the way for a second class of thumbnails that are only cached.
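To make the tag idea concrete, here is a minimal sketch of such an index. This is not fast.ly's code, which I haven't seen; the header name X-Purge-Tag, the class, and the send_purge callback are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical header name; the real daemon may use something else entirely.
TAG_HEADER = "X-Purge-Tag"


class TagIndex:
    """In-memory map of tag -> cached URLs, built from backend responses."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def observe(self, url, headers):
        """Record the tag (if any) that a backend response carried for url."""
        tag = headers.get(TAG_HEADER)
        if tag:
            self._by_tag[tag].add(url)

    def purge(self, tag, send_purge):
        """Call send_purge(url) for every URL seen with this tag, then forget it."""
        urls = sorted(self._by_tag.pop(tag, set()))
        for url in urls:
            send_purge(url)
        return urls
```

Here send_purge would be whatever issues the actual PURGE request to the cache. Since the index lives in memory, a daemon restart loses it, which is one of the challenges with this approach.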
A wiki with such a setup could then disable the on-disk storage.
I think this is entirely doable, but scaling the image scalers to support cache failures at WMF scale would be a waste, except perhaps for non-standard sizes that aren't widely used. I like Brion's thoughts on revamping image handling, and would like to see semi-permanent (in Swift) storage of a standardized set of thumbnail resolutions, while still supporting additional resolutions. Browser scaling is also at least worth experimenting with. Instances where browser scaling would look bad are likely instances where the image is already subpar when viewed on a high-dpi / retina display.