On Tue, Sep 4, 2012 at 3:11 PM, Platonides <Platonides(a)gmail.com> wrote:
On 03/09/12 02:59, Tim Starling wrote:
I'll go for option 4. You can't delete the images from the backend
while they are still in Squid, because then they would not be purged
when the image is updated or action=purge is requested. In fact, that
is one of only two reasons for the existence of the backend thumbnail
store on Wikimedia. The thumbnail backend could be replaced by a text
file that stores a list of thumbnail filenames which were sent to
Squid within a window equivalent to the expiry time sent in the
Cache-Control header.
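As a rough illustration of the list-of-filenames idea above, here is a minimal in-memory sketch. It assumes entries are recorded in time order and that the window equals the Cache-Control expiry time. The class and method names are hypothetical and not part of MediaWiki or Squid.

```python
import time
from collections import deque


class SentThumbnailLog:
    """Remembers thumbnails sent to the cache within the last `ttl` seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        # (timestamp, original_name, thumb_name), oldest entry first.
        self._entries = deque()

    def record(self, original_name, thumb_name, now=None):
        """Note that a thumbnail was just sent to the cache."""
        now = time.time() if now is None else now
        self._entries.append((now, original_name, thumb_name))

    def _expire(self, now):
        # Anything at least `ttl` old has already expired from the cache,
        # so it no longer needs to be purged explicitly.
        while self._entries and now - self._entries[0][0] >= self.ttl:
            self._entries.popleft()

    def thumbs_to_purge(self, original_name, now=None):
        """Return the thumbnails of `original_name` that may still be cached."""
        now = time.time() if now is None else now
        self._expire(now)
        return sorted({t for _, o, t in self._entries if o == original_name})
```

On an image update or action=purge, the caller would send one purge per name returned by `thumbs_to_purge`. A real replacement would also have to persist the list across restarts, which this sketch does not attempt.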
The other reason for the existence of the backend thumbnail store is
to transport images from the thumbnail scalers to the 404 handler. For
that purpose, the image only needs to exist in the backend for a few
seconds. It could be replaced by a better 404 handler that sends
thumbnails directly by HTTP. Maybe the Swift one does that already.
-- Tim Starling
The second one seems easy to fix. The first one should IMHO be fixed in
Squid/Varnish by allowing wildcard purges (e.g. PURGE
/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0).
fast.ly implements group purge for Varnish like this via a proxy daemon
that watches backend responses for a "tag" response header (e.g. all
resolutions of Tim_starling.jpg would carry the same tag) and builds an
in-memory hash of tags->objects that purges can be issued against. I've
been told they'd probably open source the code for us if we want it. It
is interesting (especially since we don't purge articles at all of
their possible URLs), albeit with its own challenges. If we implemented
a backend system to track which thumbnails exist for a given original,
we may be able to remove our dependency on Swift container listings to
purge images, paving the way for a second class of thumbnails that are
only cached.
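To make the tags->objects idea concrete, here is a minimal sketch of such an index. It is not fast.ly's code, which I haven't seen. The header name `X-Purge-Tag` and the class are assumptions for illustration, with the tag taken to be the original's filename.

```python
from collections import defaultdict


class TagIndex:
    """In-memory map from a purge tag to the cached URLs carrying that tag."""

    def __init__(self):
        self._urls_by_tag = defaultdict(set)

    def observe_response(self, url, headers):
        """Record a backend response; untagged responses are ignored."""
        tag = headers.get("X-Purge-Tag")  # hypothetical header name
        if tag is not None:
            self._urls_by_tag[tag].add(url)

    def purge(self, tag):
        """Forget the tag and return every URL that should be purged for it."""
        return sorted(self._urls_by_tag.pop(tag, set()))
```

The caller would issue one PURGE per URL returned by `purge("Tim_starling.jpg")`. Since the index only lives in memory, a restart of the daemon loses it, which is one of the challenges mentioned above.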
A wiki with such a setup could then disable the on-disk storage.
I think this is entirely doable, but scaling the imagescalers to handle
every cache miss at WMF scale would be a waste, except perhaps for
non-standard sizes that aren't widely used. I like Brion's thoughts on
revamping image handling, and would like to see semi-permanent (in Swift)
storage of a standardized set of thumbnail resolutions, while still
supporting additional resolutions. Browser scaling is also at least worth
experimenting with. Instances where browser scaling would look bad are
likely instances where the image is already subpar when viewed on a
high-DPI / retina display.
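A small sketch of how the two classes of thumbnails might be told apart, and how a standard size could be picked for browser scaling. The width list and function names are made up for illustration. No standard set has been agreed on.

```python
# Hypothetical standard widths in pixels; the real set is undecided.
STANDARD_WIDTHS = (120, 220, 320, 640, 800, 1024)


def storage_class(requested_width):
    """Standard sizes are kept in Swift; other sizes live only in the cache."""
    return "stored" if requested_width in STANDARD_WIDTHS else "cache-only"


def browser_scaling_source(requested_width):
    """Smallest standard width the browser could scale down to the request.

    Returns None when the request is larger than every standard width.
    """
    candidates = [w for w in STANDARD_WIDTHS if w >= requested_width]
    return min(candidates) if candidates else None
```

For example, a 250px request would be "cache-only" if rendered server-side, or could be served from the 320px standard thumbnail and scaled by the browser.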