I am trying to figure out how thumbnail retrieval & caching works right
now - with Swift, and the frontline & secondary ("frontend" and
"backend") Varnishes. (I am working on the caching-related bit of the
performance guidelines, and want to understand and help push forward on
https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
.) I looked for docs but didn't find anything that had been updated this
year.
Here's how I think it works, assuming you are a MediaWiki developer
who's written, e.g., a page that includes a thumbnail of an image:
First, your code must get the metadata about the image, which might come
from the local database, or memcached, or Commons. Then, you need to get
a thumbnail of the image at the dimensions your page requires. Rather
than creating the thumbnail immediately on demand by parsing the
filename and dimensions, Wikimedia's MediaWiki is configured to use the
"404 handler" (see [[Manual:Thumb_handler.php]]). Your page first
receives a URL indicating the eventual location of the thumbnail, then
the browser asks for that URL. If the thumbnail hasn't been created yet,
the web server initially gets an internal 404 error; the 404 handler
then kicks off the thumbnailer to create the thumbnail, and the response
gets sent to the client.
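To check my understanding of the 404-handler flow, here is a minimal
sketch in Python. Everything in it is an assumption for illustration:
the URL layout, the ThumbStore class (an in-memory stand-in for the
thumbnail store), and the render() placeholder are hypothetical, not
the actual thumb_handler.php logic.

```python
import re

# Hypothetical thumbnail URL layout, e.g. /thumb/Example.jpg/120px-Example.jpg
# (an assumption for this sketch; the real layout may differ).
THUMB_RE = re.compile(r"^/thumb/(?P<name>[^/]+)/(?P<width>\d+)px-(?P=name)$")


def parse_thumb_url(path):
    """Extract (file name, width) from a thumbnail path, or None if it doesn't match."""
    match = THUMB_RE.match(path)
    if match is None:
        return None
    return match.group("name"), int(match.group("width"))


class ThumbStore:
    """In-memory stand-in for the thumbnail store (hypothetical)."""

    def __init__(self):
        self.objects = {}  # path -> thumbnail bytes
        self.renders = 0   # how many times the thumbnailer ran

    def render(self, name, width):
        # Placeholder for the real thumbnailer.
        self.renders += 1
        return f"<{width}px thumbnail of {name}>".encode()


def handle_request(store, path):
    """Return (status, body), rendering the thumbnail only on a miss."""
    if path in store.objects:
        return 200, store.objects[path]
    # Miss: this is the "internal 404" that triggers the handler.
    parsed = parse_thumb_url(path)
    if parsed is None:
        return 404, b""  # not a thumbnail URL, so a genuine 404
    body = store.render(*parsed)
    store.objects[path] = body  # stored so the next request is a hit
    return 200, body
```

With this sketch, requesting the same thumbnail URL twice should run
the thumbnailer only once; the second request is served from the store.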
As it is sent to the client, each thumbnail is saved in a Swift store
and cached in our frontline and secondary Varnishes.
(The Varnish caches cache entire HTTP responses, including thumbnails of
images, frequently-requested pages, ResourceLoader modules, and similar
items that can be retrieved by URL. A weighted-random load balancer
(LVS) distributes web requests to the frontline Varnishes, which keep
these responses in memory. If a frontline Varnish doesn't have a
response cached, it passes the request to the secondary Varnishes via
hash-based load balancing (on the hash of the URL). The secondary
Varnishes hold more responses, storing them on disk. Every URL is on at
most one secondary Varnish.)
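Here is how I picture the two-tier lookup, as a rough Python sketch.
The names (Cache, pick_backend, fetch) are hypothetical, the
load balancer is simulated with an unweighted random choice, and the
hashing is plain modulo over a SHA-256 digest; I am not claiming this
is the scheme the Varnishes actually use.

```python
import hashlib
import random


class Cache:
    """One cache node holding URL -> response (hypothetical stand-in)."""

    def __init__(self, name):
        self.name = name
        self.responses = {}


def pick_backend(url, backends):
    """Map a URL to exactly one secondary cache by hashing the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


def fetch(url, frontends, backends, origin):
    """Look up a URL: frontline cache, then secondary cache, then origin."""
    # Stand-in for LVS: any frontline cache may receive the request.
    frontend = random.choice(frontends)
    if url in frontend.responses:
        return frontend.responses[url]
    # Frontline miss: the URL hash selects a single secondary cache.
    backend = pick_backend(url, backends)
    if url not in backend.responses:
        backend.responses[url] = origin(url)  # secondary miss: ask the origin
    frontend.responses[url] = backend.responses[url]
    return frontend.responses[url]
```

In this model a popular URL can end up in several frontline caches
(whichever ones happened to receive requests for it) but in only one
secondary cache, since the hash of the URL always selects the same one.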
So, at the end of this whole process, any given thumbnail is in:
* the Swift thumbnail store (and will persist until the canonical image
changes, or is deleted, or we run out of space and flush Swift)
* the frontline and secondary Varnishes (and will persist until the
canonical image changes, or is deleted, or we restart the frontline
Varnishes, or we evict data from the hard disks of the secondary
Varnishes)
Is this right?
--
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation