Faidon - thanks for the more accurate trackdown, and fix!
On Sunday, May 5, 2013, Faidon Liambotis wrote:
On Fri, May 03, 2013 at 03:19:13PM -0700, Asher Feldman wrote:
- Our multicast purge stream is very busy and isn't split up by cache
type, so it includes lots of purge requests for images on upload.wikimedia.org. Processing the purges is somewhat cpu intensive,
and
I saw doing so once per varnish server as preferable to twice.
I believe the plan is to split up the multicast groups *and* to filter based on predefined regexps on the HTCP->PURGE layer, via the varnishhtcpd rewrite. But I may be mistaken, Mark and Brandon will know more.
There are multiple ways to approach making the purges sent to the
frontends
actually work such as rewriting the purges in varnish, rewriting them before they're sent to varnish depending on where they're being sent, or perhaps changing how cached objects are stored in the frontend. I personally think it's all an unnecessary waste of resources and prefer my original approach.
Although the current VCL calls vcl_recv_purge after the rewrite step (and hence actually rewriting purges too), unless I'm mistaken this is actually unnecessary. The incoming purges match the way the objects are stored in the cache: both are without the .m. (et al) prefix, as normal "desktop" purges are matched with objects that had their URLs rewritten in vcl_recv. Handling purges after the rewrite step might be unnecessary but it doesn't mean it's a bad idea though; it doesn't hurt much and it's better as it allows us to also purge via the original .m. URL, which is what a person might do instictively.
While mobile purges were actually broken recently in the past in a similar way as you guessed with I77b88f[1] ("Restrict PURGE lookups to mobile domains") they were fixed shortly after with I76e5c4[2], a full day before the frontend cache TTL was removed.
1: https://gerrit.wikimedia.org/r/#q,I77b88f3b4bb5ec84f70b2241cdd5dc496025e6fd,... 2: https://gerrit.wikimedia.org/r/#q,I76e5c4218c1dec06673aa5121010875031c1a1e2,...
What actually broke them again this time is I3d0280[3], which stripped absolute URIs before vcl_recv_purge, despite the latter having code that matches only against absolute URIs. This is my commit, so I'm responsible for this breakage, although in my defence I have an even score now for discovering the flaw last time around :)
I've pushed and merged I08f761[4] which moves rewrite_proxy_urls after vcl_recv_purge and should hopefully unbreak purging while also not reintroducing BZ #47807.
3: https://gerrit.wikimedia.org/r/#q,I3d02804170f7e502300329740cba9f45437a24fa,... 4: https://gerrit.wikimedia.org/r/#q,I08f7615230037a6ffe7d1130a2a6de7ba370faf2,...
As a side note, notice how rewrite_proxy_urls & vcl_recv_purge are both flawed in the same way: the former exists solely to workaround a Varnish bug with absolute URIs, while the latter is *depending* on that bug to manifest to actually work. req.url should always be a (relative) URL and hence the if (req.url ~ '^http:') comparison in vcl_recv_purge should normally always evaluate to false, making the whole function a no-op.
However, due to the bug in question, Varnish doesn't special-handle absolute URIs in violation of RFC 2616. This, in combination with the fact that varnishhtcpd always sends absolute URIs (due to an RFC-compliant behavior of LWP's proxy() method), is why we have this seemingly wrong VCL code but which actually works as intended.
This Varnish bug was reported by Tim upstream[5] and the fix is currently sitting in Varnish's git master[6]. It's simple enough and it might be worth it to backport it, although it might be more troulbe that it's worth, considering how it will break purges with our current VCL :)
5: https://www.varnish-cache.org/trac/ticket/1255 6: https://www.varnish-cache.org/trac/changeset/2bbb032bf67871d7d5a43a38104d58f...
Cheers, Faidon
Mobile-l mailing list Mobile-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/mobile-l