Re: [Multimedia] [Ops] Caching API responses

Gergo Tisza

26 Apr 2014

4:32 a.m.

On Fri, Apr 25, 2014 at 6:27 PM, Ori Livneh ori@wikimedia.org wrote:

...

On Thu, Apr 17, 2014 at 1:13 AM, Gilles Dubuc gilles@wikimedia.org wrote:

...
When the user opens media viewer, but there are 4 API calls per image and we preload the next/previous images fairly quickly after opening one. So generally within a few seconds, you're looking at 12 API calls when opening Media Viewer.

That's way too high. Since you're planning to deploy this soon, we should figure out how to meet the requirements using the infrastructure that we have rather than the one we'd like to have. Have you considered adding a MediaWiki API module to your extension that composes the data into a single response? You could do this without duplicating code by constructing DerivativeRequest objects to each endpoint, as described in <https://www.mediawiki.org/wiki/API:Calling_internally>.

This is the current behavior (in master):
- one filerepoinfo API call per page
- one imageinfo, one imageusage and one globalusage API call per image
- depending on the language, there might be a users API call per image, possibly to another wiki (Commons)
- there might be another imageinfo call to get sizes for a specific thumbnail. This depends on the file type/size and should be very rare.

All but the imageinfo call are cached on Varnish for one day. (Caching imageinfo for more than a few minutes would be more problematic as users would expect to see changes to image description etc. immediately.) Merging filerepoinfo/imageinfo/imageusage/globalusage into a single API call should be possible even on the client side, but it would mean that we cannot cache anything; not sure how that affects server load (I suppose the API has its own caching mechanism, but even that must have some overhead compared to Varnish). Similarly, merging multiple calls to the same API would be possible but would make caching mostly useless.

The users API call can go to a different wiki, so it would be very difficult to merge it directly with the other calls. We only use it to get the gender of the uploader, though; maybe that information could be added to the imageinfo API, which has its own mechanism for handling remote filerepos.

If you absolutely had to cut some functionality out in order to roll this out more broadly, what would you eliminate?

IMO we could get rid of the users, imageusage and globalusage calls without much trouble. The first one is only used for gender-correct translations. The other two are used to show some pages which use the image; since there is not enough space in the UI to show more than a few, this is not a very useful feature as it exists now.

We could also only enable preloading after the user has used prev/next navigation for the first time.

Federico Leva (Nemo)

26 Apr

8:49 a.m.

New subject: [Ops] Caching API responses

Gergo Tisza, 26/04/2014 04:32:

...

If you absolutely had to cut some functionality out in order to roll
this out more broadly, what would you eliminate?
IMO we could get rid of the users, imageusage and globalusage calls without much trouble. The first one is only used for gender-correct translations.

Degrading i18n is never an option.

Nemo

Gergo Tisza

27 Apr

2:17 a.m.

On Fri, Apr 25, 2014 at 11:49 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:

...

Gergo Tisza, 26/04/2014 04:32:
If you absolutely had to cut some functionality out in order to roll this out more broadly, what would you eliminate?
IMO we could get rid of the users, imageusage and globalusage calls without much trouble. The first one is only used for gender-correct translations.
Degrading i18n is never an option.

Meh, if we had to choose between disabling MediaViewer and sometimes displaying usernames in the wrong gender, that would be a no-brainer. i18n is important, but not *that* important. (Also, the effect would not be that large - based on a quick grep, only about 12% of the existing MediaViewer localizations even use genders.)

I would prefer putting the genders directly into the imageinfo API, though. Ideally, any API that returns usernames should return genders as well, as they are necessary to display those usernames. Even more so if that API returns usernames from remote wikis whose API might not be public, or usernames from a ForeignDBRepo which might not even have a wiki associated...

Federico Leva (Nemo)

10:49 a.m.

Gergo Tisza, 27/04/2014 02:17:

...

Meh, if we had to choose between disabling MediaViewer and sometimes displaying usernames in the wrong gender, that would be a no-brainer. i18n is important, but not *that* important.

I disagree: i18n comes first; new features must not be allowed to cause i18n regressions. The question was what things could be dropped in case of emergency, and my personal answer is that, if you can't reach a corresponding level of language support (or performance), the actual file descriptions can be shown instead of a custom solution. The custom file information area is just a pageful of stuff outside the screen anyway; loading the page description instead is not a consistent feature regression.

Nemo

Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia

Ori Livneh

9:06 p.m.

On Fri, Apr 25, 2014 at 7:32 PM, Gergo Tisza gtisza@wikimedia.org wrote:

...

All but the imageinfo call are cached on Varnish for one day. (Caching imageinfo for more than a few minutes would be more problematic as users would expect to see changes to image description etc. immediately.) Merging filerepoinfo/imageinfo/imageusage/globalusage into a single API call should be possible even on the client side, but it would mean that we cannot cache anything; not sure how that affects server load (I suppose the API has its own caching mechanism, but even that must have some overhead compared to Varnish). Similarly, merging multiple calls to the same API would be possible but would make caching mostly useless.

Making 12 API calls means you run into browser connection limits[1]. The overhead for each discrete request is substantial as well. I strongly suspect that combining the calls would substantially improve real-world user experience, but rather than relying on hunches I'd really like to see a controlled experiment that compares the two approaches. Would it be difficult to make MMV's code choose one or the other approach at runtime?

[1]: http://www.browserscope.org/?category=network

Gergo Tisza

28 Apr

8:56 a.m.

On Sun, Apr 27, 2014 at 12:06 PM, Ori Livneh ori@wikimedia.org wrote:

...

Making 12 API calls means you run into browser connection limits[1].

We make 3 to 5 requests per image depending on circumstances (3 should be much more common). Image data loading is queued, so we fire off the first batch of requests for the current image, wait until all of those have finished, fire the requests for the next image, wait again, then fire the requests for the image preceding the current one. Thus the connection limits are not exceeded (on modern browsers anyway).
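The queuing described above can be modelled roughly as follows. This is a simplified Python sketch, not MediaViewer's actual JavaScript: `Tracker.fetch` is a hypothetical stand-in that only sleeps, and the three-module batch is an assumption based on the "3 should be much more common" remark. It fires one batch per image and waits for the whole batch before starting the next, so the number of in-flight requests never exceeds the size of one batch.

```python
import asyncio

# Per-image API modules named in the thread; the real set varies (3 to 5).
REQUESTS_PER_IMAGE = ["imageinfo", "imageusage", "globalusage"]


class Tracker:
    """Counts in-flight requests so the peak concurrency can be inspected."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0
        self.log = []

    async def fetch(self, image, module):
        # Hypothetical stand-in for a real API call; it only sleeps briefly.
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)
        self.in_flight -= 1
        self.log.append((image, module))


async def load_queued(tracker, images):
    # One batch per image: fire the batch, wait for all of it, then continue.
    for image in images:
        await asyncio.gather(
            *(tracker.fetch(image, module) for module in REQUESTS_PER_IMAGE)
        )


def run_demo():
    tracker = Tracker()
    # Order from the thread: current image, then next, then previous.
    asyncio.run(load_queued(tracker, ["current", "next", "previous"]))
    return tracker
```

Calling `run_demo()` and reading `tracker.peak` gives the highest number of simultaneous requests, which in this model equals the batch size rather than the total request count.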

...

The overhead for each discrete request is substantial as well. I strongly suspect that combining the calls would substantially improve real-world user experience, but rather than relying on hunches I'd really like to see a controlled experiment that compares the two approaches. Would it be difficult to make MMV's code choose one or the other approach at runtime?

It should be fairly easy as long as we are talking about merging all the API requests for the same image. Merging all the requests for all images (i.e. loading all the data for 3 images in a single request) would be more complicated, but it is probably less useful for comparison anyway - as I said, the current code does not make all those requests in parallel.

While we are speaking about controlled experiments, would it be possible to perform some sort of load test with a script generating API requests, instead of waiting until we find server-side performance problems the hard way? Changing the pattern of requests in a text file is much easier than doing it in live JS code that actually relies on those requests; and while it would not answer questions about client-side performance effects, it would probably be more useful for estimating effects on server load than any live experiment we could do with MediaViewer in its current, relatively low-traffic state. We might want to do a similar test with image requests as well, to check the load on scalers, given that MediaViewer is requesting sizes that were typically not used before.

Gilles Dubuc

noon

...

Merging filerepoinfo/imageinfo/imageusage/globalusage into a single API call should be possible even on the client side, but it would mean that we cannot cache anything

How about merging the cacheable ones (filerepoinfo/imageusage/globalusage) into a single client call? Wouldn't the server come back with proper caching headers? If not, we can combine it into a meta server-side call with caching turned on, right? If they're aggregated on the client side, since Media Viewer would always construct the request string the same way, it would cache across users.

This would make the number of requests per image go from 4 to 2.

Failing that, I'm in favor of dropping imageusage/globalusage, which have only very limited usefulness in the way they're currently implemented, which is a kind of placeholder until proper search results can be used.

Gergo Tisza

11:38 p.m.

On Mon, Apr 28, 2014 at 3:00 AM, Gilles Dubuc gilles@wikimedia.org wrote:

...

How about merging the cacheable ones (filerepoinfo/imageusage/globalusage) into a single client call?

imageusage/globalusage are the best candidates for a merge, we use them in a fully identical way. Not so sure about filerepoinfo, which is fired once per page (not once per image) and is always the same for a given wiki, so every client would only request it once per cache lifetime, and it would never be a Varnish miss. Adding it to every single per-image API request would just bloat the amount of data that has to be transferred, without any significant gain. But as Ori says, we might want to test those assumptions instead of just making them - we should consider how much effort it takes to set up multiple versions (everything separate, imageusage+globalusage merged, imageusage+globalusage+filerepoinfo merged, all merged) and compare them. (Writing the code for it should be pretty simple since the current response parsing code would keep working; not so sure about the deployment/measurement parts.)
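As a rough illustration of the imageusage+globalusage merge, the sketch below builds a single `action=query` URL that asks for both. It assumes the usual MediaWiki parameter names (`list=imageusage` with `iutitle`, `prop=globalusage` with `titles`, plus `maxage`/`smaxage` for cache lifetimes); the endpoint, the limits and the one-day lifetime are placeholders chosen for the example, and nothing here is taken from MediaViewer's code.

```python
from urllib.parse import urlencode

# Assumed endpoint, used only for illustration.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
ONE_DAY = 24 * 60 * 60


def merged_usage_url(file_title, limit=10):
    """Build one action=query URL covering imageusage and globalusage."""
    params = {
        "action": "query",
        "format": "json",
        "list": "imageusage",   # local pages that use the file
        "iutitle": file_title,
        "iulimit": limit,
        "prop": "globalusage",  # usage on other wikis
        "titles": file_title,
        "gulimit": limit,
        "maxage": ONE_DAY,      # browser cache lifetime, in seconds
        "smaxage": ONE_DAY,     # shared (Varnish) cache lifetime, in seconds
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

Because both halves are cacheable for the same period, the merged response can carry a single cache lifetime; adding imageinfo to the same request would force the shorter lifetime onto everything.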

Wouldn't the server come back with proper caching headers?

...

Yes, if every component of an API call can be cached in Varnish, then the whole thing can be cached as well.

If they're aggregated on the client side, since Media Viewer would always construct the request string the same way, it would cache across users.

It would, although I wouldn't be absolutely sure of MediaViewer always constructing the same request. We pass the query parameters to mw.Api (and eventually $.ajax) in a JS object literal, which has no well-defined ordering, and Chrome for example is known to iterate object keys differently from other browsers. Although I think that only happens if you have numeric keys, so yeah, the URL will probably be stable.
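One way to remove the dependence on key iteration order is to serialize the parameters in sorted order before building the URL. The sketch below shows the idea in Python; `canonical_query` is a hypothetical helper, not part of mw.Api or $.ajax.

```python
from urllib.parse import urlencode


def canonical_query(params):
    """Serialize parameters in sorted key order so equal inputs give equal URLs."""
    return urlencode(sorted(params.items()))
```

Two parameter sets with the same keys and values then yield the same query string regardless of insertion order, which is what a URL-keyed cache needs.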

...

This would make the number of requests per image go from 4 to 2.

2-ish, yes. (There is still the users request, which is cross-wiki and would be very hard to merge because of that, but we generate gendered messages (or message; actually there is just one at the moment) with all possible genders and only do an API request for the gender if there is a difference. Only about 10% of languages make use of gender, and most of them are smaller ones (ru and pl are the largest), so the amount of users API requests should be negligible.)
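The "render with all genders and only ask the API if they differ" check can be sketched like this. The renderer functions and message texts are invented for illustration; only the three gender values (male, female, unknown) follow what the users API reports.

```python
GENDERS = ("male", "female", "unknown")


def needs_gender_lookup(render):
    """Return True only if the message text differs between genders."""
    variants = {render(gender) for gender in GENDERS}
    return len(variants) > 1


# Hypothetical renderers standing in for localized messages.
def render_gender_neutral(gender):
    return "Uploaded by $1"


def render_gendered(gender):
    suffix = {"male": "m", "female": "f", "unknown": "u"}[gender]
    return "Uploaded by $1 (" + suffix + ")"
```

With this check, a users API request is only issued for languages whose translation actually produces different text per gender.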

Faidon Liambotis

4:08 p.m.

On Fri, Apr 25, 2014 at 07:32:50PM -0700, Gergo Tisza wrote:

...

Merging filerepoinfo/imageinfo/imageusage/globalusage into a single API call should be possible even on the client side, but it would mean that we cannot cache anything; not sure how that affects server load (I suppose the API has its own caching mechanism, but even that must have some overhead compared to Varnish). Similarly, merging multiple calls to the same API would be possible but would make caching mostly useless.

Varnish & HTTP caching isn't employed just to reduce appserver load. In contrast to appservers, we have Varnishes deployed in all of our different locations (currently esams & ulsfo to serve Europe/Africa & North America west coast/Asia, respectively), essentially serving as our own CDN.

The difference in RTT has a multiplied effect on the total page load time. As Analytics can tell you from our recent experience with the deployment of ulsfo, this reduction in RTT can have a tremendous effect in the user experience. While backend caching (memcache), appserver speedups (HHVM) and other protocols (SPDY) can help, CDNs are ultimately the only way you can beat the speed of light.

If the content is more-or-less static (can be invalidated by either a TTL or explicit purges on content changes) and isolated, caching at the HTTP layer should be preferred.

Regards, Faidon

Gergo Tisza

29 Apr

12:01 a.m.

On Mon, Apr 28, 2014 at 7:08 AM, Faidon Liambotis faidon@wikimedia.org wrote:

...

If the content is more-or-less static (can be invalidated by either a TTL or explicit purges on content changes) and isolated, caching at the HTTP layer should be preferred.

Agreed. Of the requests we make, filerepoinfo and users essentially never change, imageusage and globalusage we can pretend to be static since we don't care about small inaccuracies; the problematic one is imageinfo.

Part of imageinfo is parsed from templates on the file description change, assuming those templates add the right markup to annotate the data they contain. We want to get communities to make their local templates behave similarly to the ones on Commons, so they can also be parsed; this is important for both MediaViewer and for eventually moving image metadata into Wikibase. This means editors will need to tweak a lot of templates and verify that the data is parsed correctly; if between the tweaking and the verification there is a one-day caching period, that would kill all such efforts.

I guess if either server load or roundtrip lag becomes a big issue, we could write some sort of separate gadget which editors could use to verify the API results, while MediaViewer could use caching, but that should be a last resort.

As for explicit purges, that seems to be a nasty business for API queries. Varnish supports ID-based invalidation, but their docs warn [1] that it does not scale well. The more scalable tag-based invalidation (hashtwo) is in the proprietary part of Varnish. URL-based purges would require reconstructing the exact same URL as the client made, including parameter ordering, pagename encoding flavors, maxage parameter etc.; not hard to do but a pain to maintain. What's even worse, for Commons images, an edit or reupload would mean purging API URLs across hundreds of wikis. So I don't think explicit invalidation would be doable.

[1] https://www.varnish-software.com/blog/advanced-cache-invalidation-strategies

Gergo Tisza

12:16 a.m.

On Mon, Apr 28, 2014 at 3:01 PM, Gergo Tisza gtisza@wikimedia.org wrote:

...

Part of imageinfo is parsed from templates on the file description change, assuming those templates add the right markup to annotate the data they contain. We want to get communities to make their local templates behave similarly to the ones on Commons, so they can also be parsed; this is important for both MediaViewer and for eventually moving image metadata into Wikibase. This means editors will need to tweak a lot of templates and verify that the data is parsed correctly; if between the tweaking and the verification there is a one-day caching period, that would kill all such efforts.

I guess if either server load or roundtrip lag becomes a big issue, we could write some sort of separate gadget which editors could use to verify the API results, while MediaViewer could use caching, but that should be a last resort.

On second thought, the important part of imageinfo (iiprop=extmetadata) is currently cached for 12 hours without invalidation for remote files. While that is something that should eventually be fixed, for now we wouldn't lose anything by caching imageinfo requests to non-local files for a few hours; that should cover the large majority of requests. Also, it's the local templates that need community effort, so caching the metadata of remote files wouldn't have much effect on that.

Max Semenik

12:51 a.m.

On Mon, Apr 28, 2014 at 3:01 PM, Gergo Tisza gtisza@wikimedia.org wrote:

...

Agreed. Of the requests we make, filerepoinfo and users essentially never change, imageusage and globalusage we can pretend to be static since we don't care about small inaccuracies; the problematic one is imageinfo.

You could just load the needed parts of filerepoinfo via ResourceLoader.

Gilles Dubuc

2 May

11:38 a.m.

I've dug up graphs for these APIs:

- globalusage: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13...

The effect of the caching deployed on the 24th (https://gerrit.wikimedia.org/r/#/c/127438/) is striking on this one. It seems like the spike caused by the launch to nl & fr wikipedias last night is reasonable and subsided very quickly.

- imageusage: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13...

same story as globalusage

- userinfo: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13...

More spiky, yet quite stable, but my understanding is that Media Viewer is far from being the only consumer of that API call. Not sure how we could differentiate the effect of Media Viewer from the rest of the traffic for this one.

- filerepoinfo: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13...

This one is the odd bird compared to the other ones, as it's noticeably growing, but the scale shows us that it's called a lot less than the others. The effect of the caching launch on the 24th is counter-intuitive: there are more invocations and they're more spiky afterwards. Might be worth double-checking that caching was done right for that one.

- imageinfo: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13...

This is the one that we can't cache at the moment. It looks quite stable through the nl/fr launch, though. We might have to wait a few days to be sure but it doesn't look like a noticeable increase.

Are these the right graphs to look at to see if these APIs aren't going nuts and won't take down the servers when we release to bigger wikis?

On a related note, is this the right dashboard for API servers? http://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&m=cpu_repor...

I'm trying to assess the danger of launching to bigger wikis: https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/523 and at this point it doesn't look like API requests are worrying. It would be great if someone from ops could confirm that I looked at the right things and whether or not there are signs that are worrying in there that I didn't see.

I'll also be looking at image scaler stats separately, but I wanted to bring this up in this discussion, since API request caching or lack thereof was a concern to a lot of people. I'm searching for any data that could confirm whether or not we're doing enough in preparation for the bigger deployments of Media Viewer.

Faidon Liambotis

4:26 p.m.

Hi Gilles,

Thanks for digging up all these graphs. This is thorough work and truly excellent preparation, kudos!

I agree that we seem to be doing okay so far, indeed.

On Fri, May 02, 2014 at 11:38:29AM +0200, Gilles Dubuc wrote:

...

Are these the right graphs to look at to see if these APIs aren't going nuts and won't take down the servers when we release to bigger wikis?

On a related note, is this the right dashboard for API servers? http://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&m=cpu_repor...

Yes, these are the right graphs and the Ganglia cluster "API Application servers eqiad" is the one to monitor indeed. From that group, the most interesting metrics would be the ap_rps (Apache Requests per Second) and ap_busy_workers: http://ganglia.wikimedia.org/latest/stacked.php?m=ap_rps&c=API%20applica...

API is being served from the main Varnish clusters ("Text caches eqiad/esams/ulsfo"), so you wouldn't have a separate group to monitor there and the data will incorporate a lot of noise. The frontend.client_req and varnish.client_req metrics would be the ones to monitor there.

Also, considering the nature of the feature and the need for newly generated thumbs (AIUI) we should watch carefully: a) Swift, in particular rps, b) Imagescalers, in particular rps, c) Front/back Upload Varnishes. All these are at Ganglia's Media Storage view: https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&v...

Finally, this falls a bit outside of ops, but it ties closely to the discussion about cached API responses, as it involves the (lack of) CDN for these requests: we should assess the effect that the feature has on frontend metrics, NavigationTiming and such. Gdash has a dashboard with some high-level graphs for that, but I don't think they are going to be very useful. My understanding is that you were also doing some work in this area already, though? I vaguely remember some NavTiming/EventLogging work from the Multimedia team, is this correct?

Thanks, Faidon

Gergo Tisza

10:04 p.m.

On Fri, May 2, 2014 at 2:38 AM, Gilles Dubuc gilles@wikimedia.org wrote:

...

userinfo

https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13... More spiky, yet quite stable, but my understanding is that Media Viewer is far from being the only consumer of that API call. Not sure how we could differentiate the effect of Media Viewer from the rest of the traffic for this one.

I stupidly named the JS class that gets user information UserInfo, but we are actually using the users API: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13... The big drop is because we don't request it anymore for languages where it won't actually make a difference to the translation. (That and caching.) Currently the only big user is plwiki; the other one will be ruwiki. The largest languages won't use it. (This depends on the translations so it might change at any time without any MediaViewer code/config change, but that is unlikely to happen.)

Confirmed this manually; our client-side stats don't show much difference in the number of users API requests though, I wonder whether there is something wrong with our logging: http://multimedia-metrics.wmflabs.org/dashboards/mmv#overall_network_perform...

- filerepoinfo: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13...

This one is the odd bird compared to the other ones, as it's noticeably growing, but the scale shows us that it's called a lot less than the others. The effect of the caching launch on the 24th is counter-intuitive: there are more invocations and they're more spiky afterwards. Might be worth double-checking that caching was done right for that one.

I confirmed manually that filerepoinfo is cached both in Varnish and the user's browser. We might be seeing usage from some other source - since MediaViewer was deployed to frwiki with the normal deploy train, any number of other extensions might have changed their behavior.

Again, our own stats don't show any reduction. The way we differentiate cached and uncached requests might be wrong.

Gilles Dubuc

3 May

11:18 a.m.

...

I vaguely remember some NavTiming/EventLogging work from the Multimedia team, is this correct?

Yes, we've been using the Resource Timing API as well as gathering HTTP headers to determine Varnish hits and misses. You can see the global graphs here: http://multimedia-metrics.wmflabs.org/dashboards/mmv#overall_network_perform... We also have the same graphs on per-wiki dashboards, listed here: https://www.mediawiki.org/wiki/Multimedia/Metrics

"imagemiss" is the graph that's the most interesting to you; it tracks Varnish misses on thumbnail requests. On the left-hand side of that graph, if you turn off everything except "imagemiss (size)", it shows you that the misses have been declining, while imagehits (Varnish hits) have been steady.

Gergo manually rendered a ratio graph a couple of days ago that shows how much the network effect of all the people using Media Viewer has affected the ratio of Varnish misses: http://ur1.ca/h8sa3 https://chart.googleapis.com/chart?cht=lc&chs=600x400&chds=0,1&chxt=x,y&chxr=1,0,100%7C0,1,31,1&chxl=1:Apr+1%7C11%7C21%7CMay+1&chd=t:0.7744107744,0.7146892655,0.6157407407,0.6258234519,0.6097087379,0.6268939394,0.6531007752,0.662027833,0.5877192982,0.6180371353,0.6314102564,0.6047297297,0.5735849057,0.618705036,0.5930232558,0.4857612267,0.3755687784,0.2952091255,0.2792185921,0.3090937403,0.2896433741,0.2661846309,0.2562147829,0.251244208,0.2177919249,0.232136633,0.2255857954,0.2392075695,0.2198016295,0.2327399767,0.2292069632

We might make an equivalent permanent graph on our dashboard.
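For readers unfamiliar with how a miss ratio can be derived from response headers, here is a minimal Python sketch. It assumes a comma-separated `X-Cache`-style header in which each cache layer reports "hit" or "miss", and it defines the ratio as misses divided by classified responses; the header name, its exact format, the host names in the usage note and that definition are all assumptions, not a description of the team's logging code.

```python
def classify_x_cache(header_value):
    """Return 'hit' if any cache layer reports a hit, 'miss' if none does,
    or 'unknown' when the header is absent or empty."""
    if not header_value:
        return "unknown"
    layers = [part.strip().lower() for part in header_value.split(",")]
    if any(" hit" in layer for layer in layers):
        return "hit"
    return "miss"


def miss_ratio(header_values):
    """Share of classified responses that were cache misses (None if no data)."""
    results = [classify_x_cache(value) for value in header_values]
    hits = results.count("hit")
    misses = results.count("miss")
    total = hits + misses
    return misses / total if total else None
```

For example, `miss_ratio(["cacheA miss (0), cacheB hit (2)", "cacheA miss (0), cacheB miss (0)"])` counts one hit and one miss. Responses without the header are left out of the ratio rather than counted as misses.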

I wonder whether there is something wrong with our logging

...

I don't think that these caching optimizations have been backported:

https://gerrit.wikimedia.org/r/#/c/127459/ https://gerrit.wikimedia.org/r/#/c/127438/

Which means that they've only been deployed to most wikipedias on Thursday.

Maybe it wasn't that visible on the graph yesterday, but userinfo looks like it's dropping: https://www.dropbox.com/s/bq7be6m8i0rlbzh/Screenshot%202014-05-03%2010.15.13...

Also, keep in mind that the Resource Timing data is sampled, server data isn't. The trends are likely to have the same general direction, but slope steepness might not match because of the sampling.
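To make the sampling caveat concrete: a sampled count can be scaled up by the sampling factor to estimate the true total, but the estimate gets noisier as the sampled count shrinks. The sketch below uses a hypothetical 1-in-N sampling factor (the thread does not state the real one) and the usual Poisson approximation for the relative error.

```python
import math


def estimate_total(sampled_count, sampling_factor):
    """Estimate the true event count from a 1-in-N sample."""
    return sampled_count * sampling_factor


def relative_error(sampled_count):
    """Rough relative standard error of the estimate (Poisson approximation)."""
    if sampled_count <= 0:
        return float("inf")
    return 1 / math.sqrt(sampled_count)
```

With only a few dozen sampled events per interval the error is in the tens of percent, which is enough to blur a slope that is clearly visible in the unsampled server-side graphs.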

On Fri, May 2, 2014 at 10:04 PM, Gergo Tisza gtisza@wikimedia.org wrote:

...

On Fri, May 2, 2014 at 2:38 AM, Gilles Dubuc gilles@wikimedia.org wrote:

...

userinfo

https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13... More spiky, yet quite stable, but my understanding is that Media Viewer is far from being the only consumer of that API call. Not sure how we could differentiate the effect of Media Viewer from the rest of the traffic for this one.

I stupidly named the JS class that gets user information UserInfo, but we are actually using the users API:

https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13... The big drop is because we don't request it anymore for languages where it won't actually make a difference to the translation. (That and caching.) Curently the only big user is plwiki; the other one will be ruwiki. The largest languages won't use it. (This depends on the translations so it might change at any time without any MediaViewer code/config change, but that is unlikely to happen.)

Confirmed this manually; our client-side stats don't show much difference in the number of users API requests though, I wonder whether there is something wrong with our logging:

http://multimedia-metrics.wmflabs.org/dashboards/mmv#overall_network_perform...

-filerepoinfo:

...
https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13...

This one is the odd bird compared to the other ones, as it's noticeably growing, but the scale shows us that it's called a lot less than the others. The effect of the caching launch on the 24th is counter-intuitive: there are more invocations and they're more spiky afterwards. Might be worth double-checking that caching was done right for that one.

I confirmed manually that filerepoinfo is cached both in Varnish and the user's browser. We might be seeing usage from some other source - since MediaViewer was deployed to frwiki with the normal deploy train, any number of other extensions might have changed their behavior.

Again, our own stats don't show any reduction. The way we differentiate cached and uncached requests might be wrong.
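
To make the cached/uncached distinction concrete, here is a rough sketch of header-based classification. It assumes the cache layers report their status in an X-Cache-style header with values such as "cp1052 hit (12), cp3010 miss (0)"; the header format, the function name, and the sample values are assumptions for illustration, not the actual Media Viewer logging code.

```python
def classify_cache_status(x_cache: str) -> str:
    """Return "hit" if any cache layer in the header reports a hit, else "miss"."""
    # Split on commas and whitespace so "cp1052 hit (12)" yields the token "hit".
    tokens = x_cache.lower().replace(",", " ").split()
    return "hit" if "hit" in tokens else "miss"


# Hypothetical header values, one per logged request.
samples = [
    "cp1052 hit (12), cp3010 miss (0)",
    "cp1052 miss (0), cp3010 miss (0)",
]
statuses = [classify_cache_status(value) for value in samples]
print(statuses)
```

One thing to keep in mind with this approach: a response served from the browser cache replays the stored headers, so a check like this cannot by itself tell a browser-cache hit from a request that went to the network.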

Gilles Dubuc

12:22 p.m.

New subject: [Ops] Caching API responses

I've just found out that Varnish caching of these API calls works, but browser caching doesn't. That explains the discrepancy you saw, where our graphs didn't drop as much as the server-side ones did: https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/566 People are just hitting Varnish instead of the API servers now.

On Sat, May 3, 2014 at 11:18 AM, Gilles Dubuc gilles@wikimedia.org wrote:

...

I vaguely remember some NavTiming/EventLogging work from the Multimedia team, is this correct?

Yes, we've been using the Resource Timing API as well as gathering HTTP headers to determine Varnish hits and misses. You can see the global graphs here: http://multimedia-metrics.wmflabs.org/dashboards/mmv#overall_network_perform... We also have the same graphs on per-wiki dashboards, listed here: https://www.mediawiki.org/wiki/Multimedia/Metrics

"imagemiss" is the graph that's the most interesting to you; it tracks Varnish misses on thumbnail requests. On the left-hand side of that graph, if you turn off everything except "imagemiss (size)", it shows you that the misses have been declining, while imagehits (Varnish hits) have been steady.

Gilles Dubuc

3:31 p.m.

New subject: [Ops] Caching API responses

I think I might have found a reason for the lack of browser caching of these API calls. Since the expiry isn't "forever", the browser will request the content a second time, looking for a 304 response. When it gets a 304, it reads the body of the response from its cache. After that, the browser stops hitting the web server altogether for that URL and reads the entire response from its cache.

The issue is that neither Varnish nor PHP on Vagrant ever returns a 304; the response is always a 200. As a result, the browser cache is never used for those URLs. I verified this by returning a 304 manually on my Vagrant VM.

Gilles Dubuc

5 May

7:58 a.m.

New subject: [Ops] Caching API responses

It seems the browser does not always pick up/respect the Cache-Control directive for the browser cache (I don't know why; it could be specific to my machine/OS X, and I've already wasted many hours trying to figure it out). I've found a workaround: using Last-Modified/If-Modified-Since (which triggers the 304 mechanism) in addition to Cache-Control: https://gerrit.wikimedia.org/r/131425 It's probably worth having in general anyway, for older browsers.
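
To illustrate the Last-Modified/If-Modified-Since mechanism, here is a minimal server-side sketch. It is not the code from the changeset above; the function name, the placeholder body, and the one-day max-age default are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(
    last_modified: datetime,
    if_modified_since: Optional[str],
    max_age: int = 86400,  # assumed: one day, in seconds
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a conditional GET.

    `last_modified` must be timezone-aware and in UTC.
    """
    headers = {
        "Cache-Control": f"public, max-age={max_age}",
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""
    return 200, headers, b"...response body..."


modified = datetime(2014, 5, 3, 12, 0, 0, tzinfo=timezone.utc)
first_status, first_headers, _ = respond(modified, None)
second_status, _, second_body = respond(modified, first_headers["Last-Modified"])
```

In this sketch the first request gets a full 200 response, and a repeat request that echoes the Last-Modified value back as If-Modified-Since gets an empty 304, so the browser can reuse its stored copy.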

Gergo Tisza

7:17 p.m.

New subject: [Ops] Caching API responses

On Sun, May 4, 2014 at 10:58 PM, Gilles Dubuc gilles@wikimedia.org wrote:

...

It seems like the browser will not always pick up/respect the Cache-Control directive for the browser cache (I don't know why, could be specific to my machine/OS X and I've wasted many hours already trying to figure it out). I've found a workaround, which is using Last-Modified/If-Modified-Since (which will trigger the 304 mechanism) in addition to Cache-Control: https://gerrit.wikimedia.org/r/131425 It's probably worth having that in general anyway, for older browsers.

After some testing and googling, it seems at least Firefox and Chrome ignore max-age when you refresh the page (but not when you navigate via links). Is it possible that you ran into that? This comment in the Chromium tracker has some explanation: https://code.google.com/p/chromium/issues/detail?id=1906#c6 I verified that this affects AJAX requests as well - the API requests are not cached when I press F5, but cached when I click on the "Page" tab (which links to itself) and reopen the same image.

Some of the answers to this SO question have a lot of details about caching behavior across browsers: http://stackoverflow.com/q/385367/323407

Gilles Dubuc

6 May

9:18 a.m.

New subject: [Ops] Caching API responses

...

After some testing and googling, it seems at least Firefox and Chrome ignore max-age when you refresh the page (but not when you navigate via links). Is it possible that you ran into that?

Indeed, I figured that part out pretty late: those browsers send cache-busting Cache-Control headers on a plain refresh, which is what you'd expect of a shift-refresh, not a plain one. The SO post has a comprehensive table listing the headers sent.

So, caching is working. My changeset has a limited benefit: it will make the browser cache work on plain refreshes as well.
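
The behavior discussed above can be summed up in a simplified model. This sketch assumes a plain refresh sends `Cache-Control: max-age=0` and a shift-refresh sends `Cache-Control: no-cache`; real browsers differ in the details, and the function and return values are hypothetical names for illustration.

```python
from typing import Optional


def browser_action(
    age: int,
    max_age: int,
    has_validator: bool,
    request_cache_control: Optional[str],
) -> str:
    """Decide how a simplified browser cache handles a request.

    age / max_age are in seconds; has_validator says whether the stored
    response carried a Last-Modified (or similar) header.
    """
    directive = (request_cache_control or "").replace(" ", "").lower()
    if directive == "no-cache":
        # Shift-refresh in this model: bypass the cache entirely.
        return "full-fetch"
    fresh = age < max_age
    if directive == "max-age=0":
        # Plain refresh in this model: treat the stored copy as stale.
        fresh = False
    if fresh:
        return "use-cache"
    # A stale copy can only be revalidated if there is something to validate against.
    return "conditional-get" if has_validator else "full-fetch"


link_navigation = browser_action(60, 86400, False, None)
plain_refresh = browser_action(60, 86400, False, "max-age=0")
plain_refresh_with_last_modified = browser_action(60, 86400, True, "max-age=0")
```

Under these assumptions, navigating via links uses the cached copy, a plain refresh without a validator refetches the whole response, and a plain refresh with Last-Modified becomes a conditional request that can be answered with a 304.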

Gilles Dubuc

7 May

7:01 a.m.

New subject: [Ops] Caching API responses

Faidon/Ops, I've just noticed that all the API graphite graphs I compiled the other day seem to stop at some point on the 3rd:

https://graphite.wikimedia.org/render/?width=586&height=308&_salt=13...

Is this a known issue?

multimedia@lists.wikimedia.org

21 comments

participants (6)

Faidon Liambotis
Federico Leva (Nemo)
Gergo Tisza
Gilles Dubuc
Max Semenik
Ori Livneh