I recently upgraded my wikis from 1.24.2 to 1.26.2, and I'm trying to diagnose why my Varnish hit ratios (4 web servers) dropped from ~85% to ~37%. Nothing in my Varnish configuration or VCL changed. For what it's worth, I allocate 4 GB of memory to Varnish. I'm fairly sure it has to do with image/thumbnail caching, since there are about 30 GB of thumbnails across all five wikis, 12 GB of which belong to our biggest and most heavily trafficked wiki. (FWIW, last year I changed vcl_recv() to strip cookies from thumbnail requests, which improved overall performance while maintaining a high hit ratio.) The Varnish n_lru_nuked stat is also growing rapidly, so Varnish is definitely using all of its memory and having to evict objects to make room for new ones.
Any thoughts on what might be causing this degradation, or at least how to diagnose it?
Just thought I'd follow up on my original question, though I haven't resolved it. I increased the Varnish memory from 4 to 8 GB, but that doesn't seem to have had any effect: the hit ratio still tops out around 37%. I also started tracking Varnish's LRU-nuked objects, and once it started nuking, it averaged about 100-200 objects per minute. Apart from the MW and extension upgrades, and a dist-upgrade of the servers (Ubuntu 12.04) done at the same time to bring everything up to date, nothing about the content or architecture changed. Any ideas on how to diagnose this cache performance issue? I've been running varnishlog in all sorts of ways, trying to find a pattern in which requests result in cache hits vs. misses, but no luck there either.
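A rough way to get the "nuked per minute" figure is to take two snapshots of the cumulative n_lru_nuked counter and divide the difference by the elapsed time. The sketch below assumes the plain-text layout of `varnishstat -1` (counter name, value, rate, description on each line) and strips the `MAIN.` prefix that newer Varnish versions put on core counters; the helper names are hypothetical.

```python
import subprocess
from typing import Dict


def parse_varnishstat(text: str) -> Dict[str, int]:
    """Parse `varnishstat -1` text output into {counter_name: value}."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Expect at least "name value ..."; skip blank or unexpected lines.
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        name = parts[0]
        if name.startswith("MAIN."):
            # Assumption: Varnish 4+ prefixes core counters with "MAIN."
            name = name[len("MAIN."):]
        counters[name] = int(parts[1])
    return counters


def snapshot() -> Dict[str, int]:
    """Run `varnishstat -1` on the local host and parse its output."""
    out = subprocess.run(["varnishstat", "-1"], capture_output=True,
                         text=True, check=True)
    return parse_varnishstat(out.stdout)


def nukes_per_minute(before: Dict[str, int], after: Dict[str, int],
                     seconds: float) -> float:
    """LRU evictions per minute between two snapshots taken `seconds` apart."""
    return (after["n_lru_nuked"] - before["n_lru_nuked"]) * 60.0 / seconds
```

Calling `snapshot()`, waiting a minute or so, and calling it again gives two dictionaries to pass to `nukes_per_minute`. Because the counters are cumulative since varnishd started, only the difference between snapshots is meaningful.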
On Wed, Mar 23, 2016 at 11:19 AM, Justin Lloyd jlloyd.wiki@gmail.com wrote:
I recently upgraded my wikis from 1.24.2 to 1.26.2 and I'm trying to diagnose why my Varnish hit ratios (4 web servers) dropped from ~85% to ~37%. [...]
Hi all,
Just wanted to follow up with the "solution". It turns out that PURGEs are now being counted as HITs and MISSes, which was skewing the data; our hit ratio is actually fine, at about 83-87%. It was recommended that I calculate the hit ratio from the number of requests coming into Varnish and the number it sends to the backend, i.e. from the varnishstat counters we collect with collectd: hit-ratio = (client_req - backend_req) / client_req. The result matches the hit ratio we saw before the 1.26 upgrade, when we simply used the cache hit and miss counters.
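To make the difference between the two calculations concrete, here is a minimal sketch that computes both ratios over the same interval from two counter snapshots (varnishstat counters are cumulative, so the ratio is taken over deltas). The counter names are the varnishstat ones mentioned above; the function names and the sample numbers are hypothetical and only illustrate how extra lookups in cache_hit/cache_miss can drag the lookup-based ratio down while the request-based ratio stays high.

```python
from typing import Mapping


def _delta(before: Mapping[str, int], after: Mapping[str, int], key: str) -> int:
    # Counters are cumulative since varnishd started; use the change between snapshots.
    return after[key] - before[key]


def lookup_hit_ratio(before: Mapping[str, int], after: Mapping[str, int]) -> float:
    """Old method: hits / (hits + misses). Skewed if PURGE lookups land in these counters."""
    hits = _delta(before, after, "cache_hit")
    misses = _delta(before, after, "cache_miss")
    total = hits + misses
    return hits / total if total else 0.0


def request_hit_ratio(before: Mapping[str, int], after: Mapping[str, int]) -> float:
    """Recommended method: (client_req - backend_req) / client_req."""
    client = _delta(before, after, "client_req")
    backend = _delta(before, after, "backend_req")
    return (client - backend) / client if client else 0.0


if __name__ == "__main__":
    # Hypothetical snapshots, not measurements from the thread.
    before = {"cache_hit": 0, "cache_miss": 0, "client_req": 0, "backend_req": 0}
    after = {"cache_hit": 370, "cache_miss": 630, "client_req": 1000, "backend_req": 150}
    print(f"lookup-based:  {lookup_hit_ratio(before, after):.2f}")
    print(f"request-based: {request_hit_ratio(before, after):.2f}")
```

The request-based version only asks how many client requests Varnish answered without contacting a backend, so it does not depend on how the hit and miss counters classify PURGE lookups.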
Justin
On Mon, Mar 28, 2016 at 9:21 AM, Justin Lloyd jlloyd.wiki@gmail.com wrote:
Just thought I'd follow up on my original question, though I've not resolved it. [...]
mediawiki-l@lists.wikimedia.org