​Thanks for digging into this, Gilles! It's great that we are becoming more data-driven on this.
IMO the next step should be to differentiate between Varnish delay and network delay. event_response should tell us the time between the first and last response byte, so Swift delay would show up in event_request and network delay in event_response (at least I think so; that should be verified).
A quick first attempt:
mysql:research@analytics-store.eqiad.wmnet [log]> select sum(event_request)/sum(event_total) request,
    sum(event_response)/sum(event_total) response,
    count(*) count
  from MultimediaViewerNetworkPerformance_10774577
  where event_type = 'image'
    and event_response is not null
    and event_XCache not like '%hit%'
    and event_total > 5000
    and date(timestamp(timestamp)) = current_date()\G
*************************** 1. row ***************************
request: 0.0800
response: 0.7390
count: 536
This suggests that most of the time (about 74% for these slow cache-miss image requests) is network delay between Varnish and the browser, so it is more useful to think about CDNs than about caching strategies.
(Also, I wonder what the remaining ~18% is. That seems a bit high for DNS lookups and TCP handshakes.)
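For reference, the unaccounted share follows directly from the two ratios in the query output. A minimal sketch of the arithmetic, assuming event_request and event_response do not overlap and are both included in event_total (which, as noted, still needs verifying); the function name is just for illustration:

```python
# Shares of event_total reported by the query above (536 rows).
request_share = 0.0800   # sum(event_request) / sum(event_total)
response_share = 0.7390  # sum(event_response) / sum(event_total)


def unaccounted_share(request: float, response: float) -> float:
    """Fraction of total time covered by neither the request nor the response phase.

    Assumes the two phases do not overlap and are both part of the total.
    """
    return round(1.0 - request - response, 4)


remainder = unaccounted_share(request_share, response_share)
print(f"unaccounted: {remainder:.1%}")
```

Since these are ratios of sums rather than averages of per-request ratios, a few very slow requests can dominate the result.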