On Tue, May 13, 2014 at 4:13 PM, Sumana Harihareswara sumanah@wikimedia.org wrote:
I am trying to figure out how thumbnail retrieval & caching works right now - with Swift, and the frontline & secondary ("frontend" and "backend") Varnishes. (I am working on the caching-related bit of the performance guidelines, and want to understand and help push forward on https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache .) I looked for docs but didn't find anything that had been updated this year.
I was supposed to document this stuff when I first started with the Foundation. Unfortunately I never really got it done. I've got some notes and, possibly most helpfully, a diagram that I redrew in OmniGraffle based on one that Faidon drew on the wall at the office for me one day last fall. I've had this sitting around on my local hard drive for months without uploading it anywhere, so I just threw it up on mw.o [0].
The diagram shows the major components that you described in your summary. Traffic from the internet for http://upload.wikimedia.org/.../some_thumb_url.png hits a frontend LVS, which routes it to a frontend Varnish server. If the URL is not cached locally by that Varnish instance, it computes a hash of the URL to select the backend Varnish instance that may have the content. If the backend Varnish doesn't have the content, it requests the thumbnail from the Swift cluster. This request passes through an LVS that selects a frontend Swift server, which handles the request by asking the backend Swift cluster for the desired image.
If the image isn't found in the backend cluster, the frontend Swift server makes a request to an image scaler server to have it created. The image scalers run thumb.php from mediawiki/core.git to fetch the original image from Swift (which follows the same LVS -> Swift frontend -> Swift backend path that the thumb request came down). Once the original image is on the image scaler, the scaler runs it through the scaling software appropriate for its MIME type to produce a thumbnail image. I don't remember whether at this point the image is stored in Swift by the image scaler via thumb.php's internal logic or whether that is handled by the frontend Swift server when it gets the response. In either case, the newly created thumbnail ends up stored in the Swift cluster and is returned as the image scaler's HTTP response to the frontend Swift server handling the original request.
The frontend Swift server in turn returns the thumbnail image to the backend Varnish server, which caches it locally and then returns the image to the frontend Varnish. Finally, the frontend Varnish caches the image response in local memory and returns the image to the original requester.
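In case it helps to see the request path as code, here is a rough Python sketch of the tiers described above. It is only an illustration, not how Varnish or Swift are actually configured: the server counts, the SHA-256-modulo backend selection, and all of the names (Tier, scale, pick_backend, request) are made up for the example. It also collapses the Swift frontend and backend into one tier and has that tier store the new thumbnail, which is just one of the two possibilities mentioned above.

```python
import hashlib


class Tier:
    """A cache tier: a dict in front of a function to call on a miss."""

    def __init__(self, name, fetch_on_miss):
        self.name = name
        self.store = {}
        self.fetch_on_miss = fetch_on_miss

    def get(self, url, trace):
        if url in self.store:
            trace.append(f"{self.name}: hit")
            return self.store[url]
        trace.append(f"{self.name}: miss")
        body = self.fetch_on_miss(url, trace)
        self.store[url] = body  # cache the response on the way back up
        return body


def scale(url, trace):
    """Stand-in for the image scaler (thumb.php); returns fake bytes."""
    trace.append("scaler: render")
    return f"thumb-bytes-for:{url}".encode()


# Swift frontend + backend collapsed into one tier (assumption).
swift = Tier("swift", scale)

# Hypothetical pool of four backend Varnishes, each falling back to Swift.
backends = [Tier(f"backend-varnish-{i}", swift.get) for i in range(4)]


def pick_backend(url, trace):
    """Hash the URL to choose a backend Varnish (hash choice is assumed)."""
    digest = hashlib.sha256(url.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index].get(url, trace)


# Hypothetical pool of two frontend Varnishes; LVS picks one per request.
frontends = [Tier(f"frontend-varnish-{i}", pick_backend) for i in range(2)]


def request(url, frontend_index):
    """Simulate LVS routing a request to one frontend Varnish."""
    trace = []
    body = frontends[frontend_index].get(url, trace)
    return body, trace
```

Calling request() twice for the same URL on the same frontend index yields a four-step trace (frontend miss, backend miss, Swift miss, scaler render) and then a single frontend hit. Sending the same URL to the other frontend misses there but hits the backend Varnish, because the URL hash selects the same backend regardless of which frontend asked.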
The next time this exact thumbnail is requested, it may be found in the frontend Varnish if the LVS routes to the same Varnish and it hasn't been evicted from the in-memory cache, either by age or by the need to store something newer. The image will stay in the backend Varnish cache until it ages out based on the response headers or is evicted to make room for newer content. In the worst case the thumbnail will be found in the Swift cluster, where 3 copies of the thumbnail file are stored indefinitely. The only way the thumbnail will be removed from Swift is when a new version of the source image is uploaded, or the source image is deleted, and a purge request is sent out from the wiki.
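The three ways a cached thumbnail can disappear (aging out, being pushed out by newer content, and an explicit purge) can be illustrated with a small Python sketch. This is not Varnish's or Swift's real eviction logic: the least-recently-used policy, the single fixed TTL, and the TtlLruCache name are all assumptions made for the example, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from collections import OrderedDict


class TtlLruCache:
    """Toy cache with a fixed TTL, LRU eviction, and explicit purge."""

    def __init__(self, max_items, ttl_seconds):
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self.items = OrderedDict()  # url -> (expires_at, body)

    def get(self, url, now):
        entry = self.items.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if now >= expires_at:
            # Aged out: drop the entry and report a miss.
            del self.items[url]
            return None
        self.items.move_to_end(url)  # mark as most recently used
        return body

    def put(self, url, body, now):
        self.items[url] = (now + self.ttl_seconds, body)
        self.items.move_to_end(url)
        while len(self.items) > self.max_items:
            # Make room for newer content by dropping the oldest entry.
            self.items.popitem(last=False)

    def purge(self, url):
        """Explicit removal, as after a re-upload or deletion on the wiki."""
        self.items.pop(url, None)
```

In these terms a Varnish tier would use a bounded max_items with a TTL taken from the response headers, while Swift behaves more like an unbounded store with no TTL, where purge() is the only way out.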
[0]: https://www.mediawiki.org/wiki/File:Thumbnail-stack.svg
[1]: https://wikitech.wikimedia.org/wiki/Swift/Dev_Notes#Removing_NFS_from_the_sc...
Bryan