This is similar to what I proposed to Ori. For multi-page files (PDF, DjVu, TIFF) we'd prerender base thumbnails and use them for downscaling in thumb.php on demand. The base thumbnails would be capped at a reasonable size (e.g. not 10000px wide), since there isn't much of a use case for massive thumbnails versus just viewing the original. This would also apply to single-page TIFFs, where one reference thumbnail of reasonable size would be used for downscaling on demand. The reference files could be created on upload, before the file could even appear at places like Special:NewImages. Resizing the reference files would actually be reasonable to do in thumb.php. File purges could exempt the reference thumbnails themselves (or, if they didn't, regeneration of the references would at least be pool countered, as TMH does). The reference thumbnails should also be in Swift even if we move to CDN-only thumbnail storage. Disk space is cheap enough for this.
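To make the shape of that concrete, here is a minimal sketch of the two halves of the idea, written in Python with Pillow purely for brevity (the real on-demand half would live in thumb.php); the paths, sizes and function names are illustrative assumptions, and page extraction for PDF/DjVu is glossed over:

from pathlib import Path
from PIL import Image

REFERENCE_WIDTH = 2048                          # assumed cap; nowhere near 10000px
REFERENCE_DIR = Path("/srv/thumbs/reference")   # illustrative storage location

def prerender_reference(original: Path, page: int, key: str) -> Path:
    """Run once at upload time (per page for PDF/DjVu/TIFF), before the file
    even shows up at places like Special:NewImages."""
    ref = REFERENCE_DIR / f"{key}-page{page}-{REFERENCE_WIDTH}px.jpg"
    with Image.open(original) as img:           # page selection glossed over here
        img.thumbnail((REFERENCE_WIDTH, REFERENCE_WIDTH))  # downscale only, keep aspect
        img.convert("RGB").save(ref, quality=90)
    return ref

def downscale_on_demand(key: str, page: int, width: int) -> Image.Image:
    """The cheap part left to thumb.php: resize from the reference thumbnail,
    never from the (possibly huge) original."""
    ref = REFERENCE_DIR / f"{key}-page{page}-{REFERENCE_WIDTH}px.jpg"
    with Image.open(ref) as img:
        height = max(1, round(img.height * width / img.width))
        return img.resize((width, height))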
On Thu, Apr 24, 2014 at 8:57 AM, Gabriel Wicke gwicke@wikimedia.org wrote:
On 04/24/2014 06:00 AM, Gilles Dubuc wrote:
Instead of each image scaler server generating a thumbnail immediately when a new size is requested, the following would happen in the script handling the thumbnail generation request:
It might be helpful to consider this as a fairly generic request limiting / load shedding problem. There are likely simpler and more robust solutions to this using plain Varnish or Nginx, where you basically limit the number of backend connections, and let other requests wait.
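For concreteness, here's roughly what that back-pressure looks like, sketched in Python rather than actual Varnish/Nginx configuration; the cap and timeout are made-up numbers, and in a real deployment the limit would sit in the proxy's connection handling rather than in application code:

import threading

MAX_CONCURRENT_SCALES = 8           # assumed per-server cap on expensive renders
_slots = threading.BoundedSemaphore(MAX_CONCURRENT_SCALES)

def handle_scale_request(render, timeout: float = 5.0):
    """Run render() only if a slot frees up within `timeout`; otherwise shed
    the request (e.g. answer 503) instead of piling more work on the scalers."""
    if not _slots.acquire(timeout=timeout):
        return None                 # shed: too many renders already in flight
    try:
        return render()
    finally:
        _slots.release()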
Rate limiting without client keys is very limited though. It really only works around the root cause, which is that we allow clients to kick off very expensive operations in real time.
A possible way to address the root cause might be to generate screen-sized thumbnails in a standard size ('xxl') in a background process after upload, and then scale all on-demand thumbnails from those. If the base thumb is not yet generated, a placeholder can be displayed and no immediate scaling happens. With the expensive operation of extracting reasonably-sized base thumbs from large originals now happening in a background job, rate limiting becomes easier and won't directly affect the generation of thumbnails of existing images. Creating small thumbs from the smaller base thumb will also be faster than starting from a larger original, and should still yield good quality for typical thumb sizes if the 'xxl' thumb size is large enough.
The disadvantage for multi-page documents would be that we'd create a lot of screen-sized thumbs, some of which might not actually be used. Storage space is relatively cheap though, at least cheaper than service downtime or degraded user experience from normal thumb scale requests being slow.
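A rough sketch of the request path described above (hypothetical names, not MediaWiki's actual job or storage APIs): serve from the pre-generated 'xxl' base thumb when it exists, otherwise queue its generation in the background and answer with a placeholder:

from pathlib import Path

XXL_WIDTH = 2560                                    # assumed screen-sized base width
PLACEHOLDER = Path("/srv/thumbs/placeholder.png")   # illustrative path

def base_thumb_path(key: str) -> Path:
    return Path(f"/srv/thumbs/base/{key}-{XXL_WIDTH}px.jpg")

def serve_thumb(key: str, width: int, enqueue_job, scale_from) -> Path:
    """enqueue_job and scale_from stand in for the job queue and the scaler."""
    base = base_thumb_path(key)
    if base.exists():
        # Cheap, bounded work: downscale from the xxl base, never from the original.
        return scale_from(base, width)
    # The expensive extraction from the large original only happens in a
    # background job, where it can be rate limited without slowing down
    # thumbnail requests for images that already have a base thumb.
    enqueue_job("generateBaseThumb", key=key, width=XXL_WIDTH)
    return PLACEHOLDER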
Gabriel