I am trying to figure out how thumbnail retrieval & caching works right now - with Swift, and the frontline & secondary ("frontend" and "backend") Varnishes. (I am working on the caching-related bit of the performance guidelines, and want to understand and help push forward on https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache .) I looked for docs but didn't find anything that had been updated this year.
Here's how I think it works, assuming you are a MediaWiki developer who's written, e.g., a page that includes a thumbnail of an image:
First, your code must get the metadata about the image, which might come from the local database, or memcached, or Commons. Then, you need to get a thumbnail of the image at the dimensions your page requires. Rather than creating the thumbnail immediately when the page is rendered, Wikimedia's MediaWiki is configured to use the "404 handler" (see [[Manual:Thumb_handler.php]]). Your page first receives a URL indicating the eventual location of the thumbnail, and then the browser requests that URL. If the thumbnail hasn't been created yet, the web server initially gets an internal 404 error; the 404 handler then kicks off the thumbnailer, which parses the filename and dimensions from the URL and creates the thumbnail, and the response is sent to the client.
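A minimal Python sketch of the 404-handler idea described above: serve the thumbnail if it already exists, otherwise parse the name and width out of the URL, render it, store it, and return it. The URL layout (`.../Foo.jpg/220px-Foo.jpg`) is an assumption for illustration, and `thumb_store`, `render_thumbnail`, and `handle_thumb_request` are hypothetical names, not MediaWiki's actual thumb_handler.php code.

```python
import re

# Assumed thumbnail URL layout, e.g. ".../thumb/a/ab/Foo.jpg/220px-Foo.jpg".
THUMB_RE = re.compile(r"/(?P<name>[^/]+)/(?P<width>\d+)px-[^/]+$")

# Hypothetical stand-in for the thumbnail storage backend.
thumb_store: dict[str, bytes] = {}


def render_thumbnail(name: str, width: int) -> bytes:
    """Hypothetical scaler: returns fake image bytes instead of a real image."""
    return f"{name}@{width}px".encode()


def handle_thumb_request(path: str) -> tuple[int, bytes]:
    """Serve a thumbnail, creating it on a miss (the '404 handler' idea)."""
    if path in thumb_store:
        return 200, thumb_store[path]
    # Miss: the file doesn't exist yet, so instead of answering 404 we
    # parse the original name and requested width from the URL.
    match = THUMB_RE.search(path)
    if match is None:
        return 404, b"not a thumbnail URL"
    data = render_thumbnail(match["name"], int(match["width"]))
    thumb_store[path] = data  # later requests are served from storage
    return 200, data
```

The first request for a given thumbnail URL pays the rendering cost; every later request for the same URL is a plain lookup.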
As it is sent to the client, each thumbnail is stored in Swift and cached in our frontline and secondary Varnishes.
(The Varnish caches store entire HTTP responses: thumbnails of images, frequently requested pages, ResourceLoader modules, and similar items that can be retrieved by URL. A weighted-random load balancer (LVS) distributes web requests to the frontline Varnishes, which keep responses in memory. If a frontline Varnish doesn't have a response cached, it passes the request to a secondary Varnish chosen by hash-based load balancing on the URL. The secondary Varnishes hold more responses and store them on disk. Every URL is on at most one secondary Varnish.)
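The two load-balancing steps above can be sketched in Python as follows. The host names and weights are hypothetical, and plain modulo hashing is a simplification: real Varnish uses its own director implementations, and I'm not certain which one Wikimedia configured.

```python
import hashlib
import random

# Hypothetical frontend pool with LVS-style weights.
FRONTENDS = {"fe1": 1.0, "fe2": 1.0, "fe3": 2.0}
# Hypothetical backend pool.
BACKENDS = ["be1", "be2", "be3", "be4"]


def pick_frontend(rng: random.Random) -> str:
    """Weighted random choice of a frontend Varnish, as LVS would do."""
    names = list(FRONTENDS)
    weights = [FRONTENDS[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def pick_backend(url: str) -> str:
    """Hash the URL so every frontend sends a given URL to the same backend."""
    digest = hashlib.sha256(url.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]
```

Because `pick_backend` depends only on the URL, a miss for the same thumbnail lands on the same backend no matter which frontend received it, which is what lets each URL live on at most one secondary Varnish. One caveat with plain modulo hashing: changing the size of `BACKENDS` remaps most URLs, which is why consistent hashing is a common alternative.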
So, at the end of this whole process, any given thumbnail is in:
* the Swift thumbnail store (and will persist until the canonical image changes, or is deleted, or we run out of space and flush Swift)
* the frontline and secondary Varnishes (and will persist until the canonical image changes, or is deleted, or we restart the frontline Varnishes, or we evict data from the hard disks of the secondary Varnishes)
Is this right?
On May 13, 2014 7:13 PM, "Sumana Harihareswara" sumanah@wikimedia.org wrote:
-- Sumana Harihareswara Senior Technical Writer Wikimedia Foundation
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
That is mostly correct, AFAIK. The Varnish setup also includes separate caches in different locations (so during invalidation failures you can have correct data in, say, the USA but not Europe, which confuses bug reporters considerably).
A thumbnail can also be removed from storage by doing ?action=purge on the image description page. I believe the Varnish caches only store objects for a maximum of 30 days (not 100% sure on that). Swift stores them forever.
I'm not sure if it's in scope of what you're trying to document, but HTCP purging is also an important aspect of how our Varnish cache works, and a part that has historically exploded several times.
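To illustrate how a failed purge can leave correct data in one location but stale data in another, here is a toy Python model. It assumes purges are fire-and-forget broadcasts with no acknowledgement, which is my understanding of how HTCP purges over UDP multicast behave in this setup. `CacheSite` and `broadcast_purge` are hypothetical names, and this does not implement the HTCP wire format.

```python
from dataclasses import dataclass, field


@dataclass
class CacheSite:
    """One datacenter's Varnish layer, reduced to a URL -> body mapping."""
    name: str
    objects: dict[str, bytes] = field(default_factory=dict)

    def purge(self, url: str) -> None:
        self.objects.pop(url, None)


def broadcast_purge(sites: list[CacheSite], urls: list[str], delivered: set[str]) -> None:
    """Purge every URL at every site whose name is in `delivered`.

    Purges are modelled as fire-and-forget: a site whose packet is lost
    never learns about the purge and keeps serving its old copy.
    """
    for site in sites:
        if site.name not in delivered:
            continue  # packet lost: this site keeps the stale object
        for url in urls:
            site.purge(url)
```

For example, purging `/a` with `delivered={"us"}` clears the US copy while the European copy stays stale until it expires or is purged again.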
--bawolff
On 05/13/2014 06:13 PM, Sumana Harihareswara wrote:
My understanding is that the image scaling/storage infrastructure is basically only used for user images (upload.wikimedia.org). Code images generally go on http://bits.wikimedia.org/ and don't use Swift (though sometimes code may refer to a user image).
Matt Flaschen
On Tue, May 13, 2014 at 4:13 PM, Sumana Harihareswara sumanah@wikimedia.org wrote:
I was supposed to document this stuff when I first started with the Foundation. Unfortunately I never really got it done. I've got some notes and, possibly most helpfully, a diagram that I redrew in OmniGraffle based on a diagram that Faidon drew on the wall at the office for me one day last fall. I've had this sitting around on my local hard drive for months without uploading it anywhere, so I just threw it up on mw.o [0].
The diagram shows the major components that you described in your summary. Traffic from the internet for http://upload.wikimedia.org/.../some_thumb_url.png hits a frontend LVS, which routes it to a frontend Varnish server. If the URL is not cached locally by that Varnish instance, it computes a hash of the URL to select the backend Varnish instance that may have the content. If the backend Varnish doesn't have the content, it requests the thumbnail from the Swift cluster. This request passes through an LVS that selects a frontend Swift server. The frontend Swift server handles the request by asking the backend Swift cluster for the desired image.

If the image isn't found in the backend cluster, the frontend Swift server makes a request to an image scaler server to have it created. The image scalers run thumb.php from mediawiki/core.git to fetch the original image from Swift (which goes back through the same LVS -> Swift frontend -> Swift backend path that the thumb request came down). Once the original image is on the image scaler, it runs it through the MIME-type-appropriate scaling software to produce a thumbnail image. I don't remember whether at this point the image is stored in Swift by the image scaler via thumb.php's internal logic or whether that is handled by the frontend Swift server when it gets the response. In either case, the newly created thumbnail ends up stored in the Swift cluster and is returned as the image scaler's HTTP response to the frontend Swift server handling the original request.

The frontend Swift server in turn returns the thumbnail image to the backend Varnish server, which caches it locally and then returns the image to the frontend Varnish. Finally, the frontend Varnish caches the image response in local memory and returns the image to the original requestor.
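The request path above can be modelled as a chain of lookups, nearest tier first, with each tier filled on the way back. This Python sketch is a simplification under stated assumptions: each tier (frontend Varnish, backend Varnish, Swift) is a plain dict, `scale` stands in for the image scaler, the LVS hops are omitted, and the scaler's own fetch of the original from Swift is not shown. `fetch_thumbnail` is a hypothetical name.

```python
from typing import Callable


def fetch_thumbnail(
    url: str,
    tiers: list[tuple[str, dict[str, bytes]]],
    scale: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """Look a thumbnail up tier by tier, nearest first.

    `tiers` is ordered e.g. [frontend Varnish, backend Varnish, Swift].
    On a hit, every tier in front of the hit is filled on the way back,
    mirroring how each Varnish caches the response it passes along.
    Returns (name of the tier or "scaler" that produced it, body).
    """
    for depth, (name, store) in enumerate(tiers):
        if url in store:
            body = store[url]
            for _, nearer in tiers[:depth]:
                nearer[url] = body
            return name, body
    # Miss everywhere: the scaler renders the thumbnail, and every tier
    # (Swift included) ends up storing it.
    body = scale(url)
    for _, store in tiers:
        store[url] = body
    return "scaler", body
```

With empty tiers, the first call returns `("scaler", ...)`; if the frontend copy is then dropped, the next call is answered by the backend tier and the frontend is refilled, without calling the scaler again.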
The next time this exact thumbnail is requested, it may be found in the frontend Varnish if the LVS routes to the same Varnish and the thumbnail hasn't been evicted from the in-memory cache by time or by the need to store something newer. The image will stay in the backend Varnish cache until it ages out based on the response headers or it is evicted to make room for newer content. In the worst case the thumbnail will be found in the Swift cluster, where 3 copies of the thumbnail file are stored indefinitely. The only way that the thumbnail will be removed from Swift is when a new version of the source image is uploaded or the image is deleted, and a purge request is sent out from the wiki.
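Eviction "by time or the need to store something newer" behaves roughly like a least-recently-used (LRU) cache with a time-to-live. The Python class below is a toy model of that behaviour for the frontend's in-memory cache, not Varnish's actual storage code; `LruTtlCache` and any capacity or TTL values are hypothetical. The injectable `clock` lets a test control time.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Callable


class LruTtlCache:
    """Memory cache that drops entries when they expire or when it is full."""

    def __init__(self, capacity: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        # key -> (expiry time, body); order tracks recency of use
        self._items: OrderedDict[str, tuple[float, bytes]] = OrderedDict()

    def get(self, key: str) -> bytes | None:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires, value = entry
        if self.clock() >= expires:
            del self._items[key]  # aged out by time
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = (self.clock() + self.ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```

A frequently requested thumbnail keeps getting moved to the "recent" end, so it survives; a rarely requested one drifts to the old end and is evicted first when space runs out, or disappears once its TTL passes.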
[0]: https://www.mediawiki.org/wiki/File:Thumbnail-stack.svg
[1]: https://wikitech.wikimedia.org/wiki/Swift/Dev_Notes#Removing_NFS_from_the_sc...
Bryan
Bryan's made some updates to the docs! Here's what Bryan just said to me (forwarded with permission):
I put the core of my narrative on wikitech [0] along with the diagram. The page I put that on could use a little more attention to ensure that references to pmtpa are gone. It might be good to remove the discussion of Ceph as well, since that migration (Swift to Ceph) was cancelled and isn't likely to start again for a while. I haven't heard anything mentioned about it in the last couple of Core quarterly reviews.
Bryan
Thanks to everyone who clarified this. I hope this helps move https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache forward!