On May 13, 2014 7:13 PM, "Sumana Harihareswara" sumanah@wikimedia.org wrote:
I am trying to figure out how thumbnail retrieval & caching works right now - with Swift, and the frontline & secondary ("frontend" and "backend") Varnishes. (I am working on the caching-related bit of the performance guidelines, and want to understand and help push forward on https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache .) I looked for docs but didn't find anything that had been updated this year.
Here's how I think it works, assuming you are a MediaWiki developer who's written, e.g., a page that includes a thumbnail of an image:
First, your code must get the metadata about the image, which might come from the local database, or memcached, or Commons. Then, you need to get a thumbnail of the image at the dimensions your page requires. Rather than create the thumbnail immediately on demand via parsing the filename and dimensions, Wikimedia's MediaWiki is configured to use the "404 handler" (see [[Manual:Thumb_handler.php]]). Your page first receives a URL indicating the eventual location of the thumbnail, then the browser asks for that URL. If the thumbnail hasn't been created yet, the web server initially gets an internal 404 error; the 404 handler then kicks off the thumbnailer to create the thumbnail, and the response gets sent to the client.
As it is sent to the client, each thumbnail is stored in Swift and cached in our frontline and secondary Varnish caches.
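The 404-handler flow described above can be sketched roughly as follows. This is a toy model, not MediaWiki's actual code: `ThumbStore`, `fetch`, and `fake_render` are hypothetical names, and a plain dictionary stands in for Swift.

```python
from typing import Callable, Dict, Tuple

# (original file name, width in pixels); a hypothetical key layout.
ThumbKey = Tuple[str, int]


class ThumbStore:
    """Toy stand-in for the thumbnail store plus its 404 handler."""

    def __init__(self, render: Callable[[str, int], bytes]) -> None:
        self._objects: Dict[ThumbKey, bytes] = {}
        self._render = render
        self.render_count = 0  # how many times the thumbnailer ran

    def fetch(self, name: str, width: int) -> bytes:
        key = (name, width)
        if key in self._objects:
            # The thumbnail already exists: serve it directly.
            return self._objects[key]
        # Miss (the "internal 404"): render, store, then respond.
        self.render_count += 1
        data = self._render(name, width)
        self._objects[key] = data
        return data


def fake_render(name: str, width: int) -> bytes:
    """Placeholder thumbnailer; a real one would scale the image."""
    return f"{name}@{width}px".encode("utf-8")
```

Under these assumptions, calling `fetch("Foo.jpg", 220)` twice on the same store runs the thumbnailer only once; the second call is served from the store.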
(The Varnish caches cache entire HTTP responses, including thumbnails of images, frequently-requested pages, ResourceLoader modules, and similar items that can be retrieved by URL. The frontline Varnishes keep these in memory. (A weighted-random load balancer (LVS) distributes web requests to the frontline Varnishes.) But if a frontline Varnish doesn't have a response cached, it passes the request to the secondary Varnishes via hash-based load balancing (on the hash of the URL). The secondary Varnishes hold more responses, storing them on disk. Every URL is on at most one secondary Varnish.)
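To make the two-tier lookup concrete, here is a minimal sketch under some loud assumptions: dictionaries stand in for the Varnish instances, the frontline pick is uniform-random rather than weighted, and the secondary is chosen by a simple hash-modulo of the URL (the real setup may well use a different hashing scheme). `TwoTierCache` and its methods are hypothetical names.

```python
import hashlib
import random
from typing import Callable, Dict, List


class TwoTierCache:
    """Toy model of frontline (memory) and secondary (disk) caches."""

    def __init__(self, n_front: int, n_back: int,
                 origin: Callable[[str], bytes]) -> None:
        self.front: List[Dict[str, bytes]] = [{} for _ in range(n_front)]
        self.back: List[Dict[str, bytes]] = [{} for _ in range(n_back)]
        self.origin = origin
        self.origin_hits = 0  # requests that reached the origin

    def backend_index(self, url: str) -> int:
        # Hash of the URL decides the secondary, so each URL maps to one.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.back)

    def get(self, url: str) -> bytes:
        # Assumption: uniform random choice instead of weighted-random.
        front = random.choice(self.front)
        if url in front:
            return front[url]
        back = self.back[self.backend_index(url)]
        if url not in back:
            self.origin_hits += 1
            back[url] = self.origin(url)
        # Fill the frontline cache on the way back to the client.
        front[url] = back[url]
        return front[url]
```

In this model a URL can end up on several frontline caches but on at most one secondary, which matches the description above.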
So, at the end of this whole process, any given thumbnail is in:
- the Swift thumbnail store (and will persist until the canonical image changes, or is deleted, or we run out of space and flush Swift)
- the frontline and secondary Varnishes (and will persist until the canonical image changes, or is deleted, or we restart the frontline Varnishes, or we evict data from the hard disks of the secondary Varnishes)
Is this right?
--
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
That is mostly correct afaik. The Varnish setup also includes different caches in different locations (so during invalidation failures you can have correct data in, say, the US but not Europe, which confuses bug reporters considerably).
Removal of a thumb from storage can also happen by doing ?action=purge on the image description page. I believe Varnish caches only store for a max of 30 days (not 100% sure on that). Swift stores forever.
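As a rough illustration of expiry versus explicit purging, here is a toy cache with a maximum lifetime and a `purge` method. The 30-day cap is taken from the unverified guess above and is an assumption, as are the names `TtlCache` and `MAX_TTL_SECONDS`; this does not model how ?action=purge or Varnish actually work internally.

```python
from typing import Dict, Optional, Tuple

# Assumption: 30-day cap, as guessed above and not verified.
MAX_TTL_SECONDS = 30 * 24 * 60 * 60


class TtlCache:
    """Toy cache whose entries expire after max_ttl or on purge."""

    def __init__(self, max_ttl: int = MAX_TTL_SECONDS) -> None:
        self.max_ttl = max_ttl
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, url: str, body: bytes, now: float) -> None:
        # Store the body together with its expiry time.
        self._entries[url] = (now + self.max_ttl, body)

    def get(self, url: str, now: float) -> Optional[bytes]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if now >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[url]
            return None
        return body

    def purge(self, url: str) -> bool:
        # Explicit invalidation; True if something was removed.
        return self._entries.pop(url, None) is not None
```

A store with no expiry (the "Swift stores forever" case) would correspond to never taking the expired branch, so only an explicit purge removes the entry.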
I'm not sure if it's in scope of what you're trying to document, but HTCP purging is also an important aspect of how our Varnish cache works, and a part that historically has exploded several times.
--bawolff