On May 13, 2014 7:13 PM, "Sumana Harihareswara" <sumanah(a)wikimedia.org>
wrote:
I am trying to figure out how thumbnail retrieval & caching works right
now - with Swift, and the frontline & secondary ("frontend" and
"backend") Varnishes. (I am working on the caching-related bit of the
performance guidelines, and want to understand and help push forward on
https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
.) I looked for docs but didn't find anything that had been updated
this year.
Here's how I think it works, assuming you are a MediaWiki developer
who's written, e.g., a page that includes a thumbnail of an image:
First, your code must get the metadata about the image, which might come
from the local database, or memcached, or Commons. Then, you need to get
a thumbnail of the image at the dimensions your page requires. Rather
than create the thumbnail immediately on demand via parsing the filename
and dimensions, Wikimedia's MediaWiki is configured to use the "404
handler" (see [[Manual:Thumb_handler.php]]). Your page first receives a
URL indicating the eventual location of the thumbnail, then the browser
asks for that URL. If it hasn't been created yet, the web server
initially gets an internal 404 error; the 404 handler then kicks off the
thumbnailer to create the thumbnail, and the response gets sent to the
client.
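
To make the flow above concrete, here is a minimal sketch of the
"generate on miss" idea in Python. It is not how thumb_handler.php is
actually implemented; the URL layout, the ThumbStore class, and the
render() placeholder are all hypothetical and exist only for
illustration.

```python
import re

# Hypothetical URL layout, for illustration only: /thumb/<width>px-<name>
THUMB_PATH = re.compile(r"^/thumb/(?P<width>\d+)px-(?P<name>[^/]+)$")


class ThumbStore:
    """Stand-in for the thumbnail store (Swift in the description above)."""

    def __init__(self):
        self.objects = {}   # path -> thumbnail bytes
        self.renders = 0    # how many times the thumbnailer ran

    def render(self, name, width):
        # Placeholder for the real thumbnailer; returns fake bytes.
        self.renders += 1
        return f"{name} scaled to {width}px".encode()


def handle_request(store, path):
    """Serve a thumbnail, generating it only when the lookup misses."""
    if path in store.objects:
        return 200, store.objects[path]

    # Internal "404": the object does not exist yet, so parse the
    # requested name and width out of the URL and build it now.
    match = THUMB_PATH.match(path)
    if match is None:
        return 404, b""   # not a thumbnail URL; a real 404

    body = store.render(match.group("name"), int(match.group("width")))
    store.objects[path] = body   # keep it for the next request
    return 200, body
```

In this sketch the first request for a given URL runs the thumbnailer
once, and later requests for the same URL are served from the store
without rendering again.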
As it is sent to the client, each thumbnail is stored in Swift and
cached in our frontline and secondary Varnish caches.
(The Varnish caches cache entire HTTP responses, including thumbnails of
images, frequently-requested pages, ResourceLoader modules, and similar
items that can be retrieved by URL. The frontline Varnishes keep these
in memory. (A weighted-random load balancer (LVS) distributes web
requests to the front-end Varnishes.) But if a frontline Varnish doesn't
have a response cached, it passes the request to the secondary Varnishes
via hash-based load balancing (on the hash of the URL). The secondary
Varnishes hold more responses, storing them on disk. Every URL is on at
most one secondary Varnish.)
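
The two-tier lookup described above can be sketched roughly as follows.
This is a toy model in Python, not Varnish or LVS configuration: the
Frontend and Backend classes, pick_backend(), and the origin callback
are hypothetical names, the load balancer is reduced to an unweighted
random choice, and the hash routing is a plain modulo over the URL
hash, which may well differ from what the real setup does.

```python
import hashlib
import random


class Frontend:
    """Frontline cache: keeps responses in memory."""

    def __init__(self, name):
        self.name = name
        self.memory = {}


class Backend:
    """Secondary cache: holds more responses, on disk in the real setup."""

    def __init__(self, name):
        self.name = name
        self.disk = {}


def pick_backend(backends, url):
    # Hash-based routing: the same URL always maps to the same backend,
    # so each URL lives on at most one secondary cache.
    digest = hashlib.sha256(url.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def fetch(frontends, backends, url, origin, rng=random):
    # Stand-in for the load balancer: any frontend may get the request.
    frontend = rng.choice(frontends)
    if url in frontend.memory:
        return frontend.memory[url]

    # Frontend miss: ask the one backend responsible for this URL.
    backend = pick_backend(backends, url)
    if url not in backend.disk:
        backend.disk[url] = origin(url)   # backend miss: go to the origin

    frontend.memory[url] = backend.disk[url]
    return frontend.memory[url]
```

One consequence visible in the sketch: a response can sit in the memory
of several frontends at once, but only ever on one backend.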
So, at the end of this whole process, any given thumbnail is in:
* the Swift thumbnail store (and will persist until the canonical image
changes, or is deleted, or we run out of space and flush Swift)
* the frontline and secondary Varnishes (and will persist until the
canonical image changes, or is deleted, or we restart the frontline
Varnishes or we evict data from the hard disks of the secondary Varnishes)
Is this right?
--
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
That is mostly correct, AFAIK. The Varnish setup also includes separate
caches in different locations (so during invalidation failures you can have
correct data in, say, the USA but not in Europe, which confuses bug reporters
considerably).
A thumbnail can also be removed from storage by doing ?action=purge on the
image description page. I believe the Varnish caches only store objects for
a max of 30 days (not 100% sure on that). Swift stores them forever.
I'm not sure if it's in scope of what you're trying to document, but HTCP
purging is also an important aspect of how our Varnish cache works, and a
part that historically has exploded several times.
--bawolff