Thanks, Gilles, this is great research!
The lack of effectiveness of the pre-rendering and the negative impact of Varnish misses seem like really important discoveries, and can inform our plans for improving Media Viewer performance over time.
If 17% of image requests in Media Viewer are indeed Varnish misses, it would make sense to investigate a practical solution for this issue. We still get user feedback that some images take a long time to load, and the ~8-second average load time for the 95th percentile (1) seems to confirm that performance can be sluggish for many of our users.
Do we know what it would take to increase the expiry value in Varnish and/or increase Varnish capacity? What kind of costs would we be looking at to improve performance a notch? Is this largely a matter of getting more machines to keep more images in Varnish?
As discussed, I would also be happy to reach out to our friends at Flickr to ask how they, as a large dedicated photo-sharing site, address these issues, and to see whether our metrics match theirs. It would be good to compare notes with them anyway, if they’re interested (we could even arrange a quick visit to their office, which is only a couple of blocks from us, while you’re in town).
Thanks again for this invaluable work on an important issue — it is much appreciated!
Fabrice
(1) https://docs.google.com/a/wikimedia.org/presentation/d/19coD7h2guEpiuCjR17f0TogguPUvMghHFrA_u0C_yEk/edit#slide=id.g53b8901a1_8_45
On Jan 9, 2015, at 1:46 AM, Gilles Dubuc gilles@wikimedia.org wrote:
Hi everyone,
I recently looked very closely at client-gathered statistics about image serving performance from within Media Viewer. More specifically, I looked at the effect of thumbnail pre-rendering at upload time (which has been live for a few months) and thumbnail chaining (which was live for a few weeks and has now been turned off). The main question I was looking to answer is whether either of those techniques improved performance as experienced by users.
Chaining, when combined with pre-rendering, had no noticeable effect on performance experienced by viewers. This is logical: with pre-rendering, the thumbnail generation gains from chaining only happen at upload time, so clients requesting the image later aren't affected. As for the effect on image scaler load, it was so small that it couldn't be measured. Chaining is probably still useful for people requesting non-standard thumbnail sizes, which I'm not measuring since I've only been looking at Media Viewer. But if that's the only use case chaining will serve, addressing the community concerns over JPG sharpening in order to redeploy chaining now seems much lower priority to me.
The big discovery in my research is that we set out to do pre-rendering based on a wrong assumption. When looking at performance statistics earlier last year, we clearly saw that Varnish misses performed a lot worse than Varnish hits (well, duh), so we set out to pre-render the thumbnail sizes Media Viewer needs in order to drastically reduce the number of Varnish misses. The reduction didn't happen.
The wrong assumption was that each Varnish miss is a case where the requested thumbnail has to be generated on the fly by the backend. The data I've just analyzed shows that this is very rare for the thumbnail sizes Media Viewer currently uses. The vast majority of Varnish misses merely pull from Swift a thumbnail that was already rendered at some earlier point and simply hasn't been requested for a while. That Swift pull plus Varnish re-add, not the need to generate the thumbnail with ImageMagick, is what makes the majority of Varnish misses perform worse than hits. The bottom line is that thumbnail pre-rendering provided insignificant performance gains for this set of sizes. Infrequently requested thumbnails are the main problem, not the fact that they are rendered on the fly the first time they are requested.
It seems like the only way to increase image serving performance in our current setup is to increase the expiry value in Varnish and/or increase Varnish capacity. Right now 17% of image requests in Media Viewer are Varnish misses, and 99.5% of those are pulling an existing thumbnail from Swift. Varnish misses are twice as slow as hits on average.
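As a back-of-envelope check on these figures: if 17% of requests miss Varnish and 99.5% of those misses are Swift pulls, only about 0.085% of Media Viewer image requests involve rendering a thumbnail on the fly. The Python sketch below reproduces that arithmetic and also estimates how much the mean load time could drop if every request were a Varnish hit. It assumes "twice as slow on average" can be treated as a single uniform 2x ratio applied to every miss, which is a simplification; the function name `breakdown` is illustrative only.

```python
# Back-of-envelope arithmetic using the figures quoted in the email.
# Assumption: "misses are twice as slow as hits on average" is modelled
# as one uniform miss/hit time ratio applied to every miss.

MISS_RATE = 0.17      # share of Media Viewer image requests that miss Varnish
SWIFT_SHARE = 0.995   # share of misses served from an existing thumbnail in Swift
MISS_SLOWDOWN = 2.0   # average miss time relative to average hit time


def breakdown(miss_rate, swift_share, slowdown):
    """Return (swift_pulls, on_the_fly, mean_time, saving_if_all_hits).

    swift_pulls and on_the_fly are fractions of *all* requests.
    mean_time is expressed in units of the average hit time.
    saving_if_all_hits is the relative drop in mean time if every request hit.
    """
    swift_pulls = miss_rate * swift_share
    on_the_fly = miss_rate * (1 - swift_share)
    mean_time = (1 - miss_rate) * 1.0 + miss_rate * slowdown
    saving_if_all_hits = 1 - 1.0 / mean_time
    return swift_pulls, on_the_fly, mean_time, saving_if_all_hits


if __name__ == "__main__":
    swift, fly, mean, saving = breakdown(MISS_RATE, SWIFT_SHARE, MISS_SLOWDOWN)
    print(f"Swift pulls: {swift:.3%} of requests")
    print(f"On-the-fly renders: {fly:.3%} of requests")
    print(f"Mean load time: {mean:.2f}x the average hit time")
    print(f"Saving if every request hit Varnish: {saving:.1%}")
```

With the quoted numbers this gives a mean of 1.17 hit-times and a ceiling of roughly 14.5% on the mean-time saving, under the uniform-ratio assumption. It says nothing about tail latency such as the 95th percentile, where misses may weigh more heavily.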
I plan to disable pre-rendering next week in order to confirm these findings and determine for certain what percentage of image requests, at the sizes Media Viewer currently uses, actually benefit from pre-rendering.
If you want to dig into the data, the relevant tables on the analytics DB are MultimediaViewerNetworkPerformance* and more specifically the event_varnish*, event_timestamp and event_lastModified columns.
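For anyone digging into those tables, here is a minimal Python sketch of the kind of classification described above: splitting requests into Varnish hits, misses served from an existing thumbnail in Swift, and misses where the thumbnail was rendered on the fly. It is not necessarily how the figures above were computed. It assumes each row has already been reduced to a single hit/miss flag and two Unix timestamps (the real event_varnish* columns and timestamp formats are not described here), and it treats a thumbnail whose last-modified time falls within a hypothetical 60-second window before the request as freshly rendered. `ImageRequest`, `classify`, `summarize` and `FRESH_RENDER_WINDOW` are all illustrative names.

```python
from dataclasses import dataclass
from typing import Sequence


# Hypothetical, simplified row shape. The real MultimediaViewerNetworkPerformance*
# tables spread Varnish status over several event_varnish* columns.
@dataclass
class ImageRequest:
    varnish_hit: bool     # True if Varnish served the thumbnail (assumed pre-computed)
    timestamp: float      # request time, Unix seconds (assumed format)
    last_modified: float  # thumbnail last-modified time, Unix seconds (assumed format)


# Hypothetical cutoff: a thumbnail modified within this many seconds before
# the request is treated as rendered on the fly for that request.
FRESH_RENDER_WINDOW = 60.0


def classify(req: ImageRequest) -> str:
    """Label a request as 'hit', 'miss_rendered' or 'miss_swift'."""
    if req.varnish_hit:
        return "hit"
    # A recent last-modified time suggests the backend just generated it;
    # an old one suggests an existing thumbnail pulled back from Swift.
    if req.timestamp - req.last_modified <= FRESH_RENDER_WINDOW:
        return "miss_rendered"
    return "miss_swift"


def summarize(requests: Sequence[ImageRequest]):
    """Return (counts, miss_rate, swift_share_of_misses)."""
    counts = {"hit": 0, "miss_rendered": 0, "miss_swift": 0}
    for req in requests:
        counts[classify(req)] += 1
    misses = counts["miss_rendered"] + counts["miss_swift"]
    total = len(requests)
    miss_rate = misses / total if total else 0.0
    swift_share = counts["miss_swift"] / misses if misses else 0.0
    return counts, miss_rate, swift_share


if __name__ == "__main__":
    sample = [
        ImageRequest(True, 1000.0, 0.0),     # Varnish hit
        ImageRequest(False, 1000.0, 100.0),  # miss, thumbnail old: Swift pull
        ImageRequest(False, 1000.0, 990.0),  # miss, thumbnail fresh: rendered
        ImageRequest(True, 1000.0, 500.0),   # Varnish hit
    ]
    print(summarize(sample))
```

Here `miss_rate` and `swift_share` correspond to the 17% and 99.5% figures quoted above. The window size is the main knob to sanity-check against real data: too wide a window would count recently rendered but cached-then-evicted thumbnails as on-the-fly renders.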
Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia