It seems like the browser will not always pick up/respect the Cache-Control directive for the browser cache (I don't know why, could be specific to my machine/OS X and I've wasted many hours already trying to figure it out). I've found a workaround, which is using Last-Modified/If-Modified-Since (which will trigger the 304 mechanism) in addition to Cache-Control: https://gerrit.wikimedia.org/r/131425 It's probably worth having that in general anyway, for older browsers.

On Sat, May 3, 2014 at 3:31 PM, Gilles Dubuc <gilles@wikimedia.org> wrote:
I think I might have found a reason for the lack of browser caching of these API calls. Since the expiry isn't "forever", the browser will request the content a second time, looking for a 304 response. When it hits a 304 it will read the body of the response from its cache. Then, after getting a 304, the browser stops hitting the web server altogether for that URL and reads the entire response from its cache.

The issue is that neither Varnish nor PHP on vagrant ever return a 304, it's always a 200 response. As a result, the browser cache is never leveraged for those URLs. I verified this by returning a 304 manually on my vagrant vm.


On Sat, May 3, 2014 at 12:22 PM, Gilles Dubuc <gilles@wikimedia.org> wrote:
I've just found out that Varnish caching of these API calls works, but not browser caching. Which explains the discrepancy you saw on our graphs that didn't lower as much as the servers did: https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/566 People are just hitting Varnish instead of the API servers now.


On Sat, May 3, 2014 at 11:18 AM, Gilles Dubuc <gilles@wikimedia.org> wrote:
I vaguely remember some NavTiming/EventLogging work from the Multimedia team, is this correct?

Yes, we've been using the Resource Timing API as well as gathering HTTP headers to determine varnish hits and misses. You can see the global graphs here: http://multimedia-metrics.wmflabs.org/dashboards/mmv#overall_network_performance-graphs-tab and we also have the same graphs on per-wiki dashboards listed here: https://www.mediawiki.org/wiki/Multimedia/Metrics

"imagemiss" is the graph that's the most interesting to you, it tracks varnish misses on thumbnail requests. On the left-hand size of that graph, if you turn off everything except "imagemiss (size)" it shows you that the misses have been declining, while imagehits (varnish hits) have been steady.

Gergo manually rendered a ratio graph a couple of days ago, that shows how much the network effect of all people using Media Viewer has had an impact on the ratio of Varnish misses: http://ur1.ca/h8sa3 We might make an equivalent permanent graph on our dashboard.

I wonder whether there is something wrong with our logging

I don't think that these caching optimizations have been backported:

https://gerrit.wikimedia.org/r/#/c/127459/
https://gerrit.wikimedia.org/r/#/c/127438/

Which means that they've only been deployed to most wikipedias on Thursday.
 
Maybe it wasn't that visible on the graph yesterday, but userinfo looks like it's dropping: https://www.dropbox.com/s/bq7be6m8i0rlbzh/Screenshot%202014-05-03%2010.15.13.png

Also, keep in mind that the Resource Timing data is sampled, server data isn't. The trends are likely to have the same general direction, but slope steepness might not match because of the sampling.

On Fri, May 2, 2014 at 10:04 PM, Gergo Tisza <gtisza@wikimedia.org> wrote:
On Fri, May 2, 2014 at 2:38 AM, Gilles Dubuc <gilles@wikimedia.org> wrote:
More spiky, yet quite stable, but my understanding is that Media Viewer is far from being the only consumer of that API call. Not sure how we could differentiate the effect of Media Viewer from the rest of the traffic for this one.

I stupidly named the JS class that gets user information UserInfo, but we are actually using the users API:
The big drop is because we don't request it anymore for languages where it won't actually make a difference to the translation. (That and caching.) Curently the only big user is plwiki; the other one will be ruwiki. The largest languages won't use it. (This depends on the translations so it might change at any time without any MediaViewer code/config change, but that is unlikely to happen.)

Confirmed this manually; our client-side stats don't show much difference in the number of users API requests though, I wonder whether there is something wrong with our logging:

This one is the odd bird compared to the other ones, as it's noticeably growing, but the scale shows us that it's called a lot less than the others. The effect of the caching launch on the 24th is counter-intuitive: there are more invocations and they're more spiky afterwards. Might be worth double-checking that caching was done right for that one.

I confirmed manually that filerepoinfo is cached both in Varnish and the user's browser. We might be seeing usage from some other source - since MediaViewer was deployed to frwiki with the normal deploy train, any number of other extensions might have changed their behavior.

Again, our own stats don't show any reduction. The way we differentiate cached and uncached requests might be wrong.



On Fri, May 2, 2014 at 10:04 PM, Gergo Tisza <gtisza@wikimedia.org> wrote:
On Fri, May 2, 2014 at 2:38 AM, Gilles Dubuc <gilles@wikimedia.org> wrote:
More spiky, yet quite stable, but my understanding is that Media Viewer is far from being the only consumer of that API call. Not sure how we could differentiate the effect of Media Viewer from the rest of the traffic for this one.

I stupidly named the JS class that gets user information UserInfo, but we are actually using the users API:
The big drop is because we don't request it anymore for languages where it won't actually make a difference to the translation. (That and caching.) Curently the only big user is plwiki; the other one will be ruwiki. The largest languages won't use it. (This depends on the translations so it might change at any time without any MediaViewer code/config change, but that is unlikely to happen.)

Confirmed this manually; our client-side stats don't show much difference in the number of users API requests though, I wonder whether there is something wrong with our logging:

This one is the odd bird compared to the other ones, as it's noticeably growing, but the scale shows us that it's called a lot less than the others. The effect of the caching launch on the 24th is counter-intuitive: there are more invocations and they're more spiky afterwards. Might be worth double-checking that caching was done right for that one.

I confirmed manually that filerepoinfo is cached both in Varnish and the user's browser. We might be seeing usage from some other source - since MediaViewer was deployed to frwiki with the normal deploy train, any number of other extensions might have changed their behavior.

Again, our own stats don't show any reduction. The way we differentiate cached and uncached requests might be wrong.