Faidon - thanks for the more accurate trackdown, and fix!

On Sunday, May 5, 2013, Faidon Liambotis wrote:
On Fri, May 03, 2013 at 03:19:13PM -0700, Asher Feldman wrote:
> 1) Our multicast purge stream is very busy and isn't split up by cache
> type, so it includes lots of purge requests for images on
> upload.wikimedia.org.  Processing the purges is somewhat cpu intensive, and
> I saw doing so once per varnish server as preferable to twice.

I believe the plan is to split up the multicast groups *and* to filter
based on predefined regexps on the HTCP->PURGE layer, via the
varnishhtcpd rewrite. But I may be mistaken, Mark and Brandon will know

> There are multiple ways to approach making the purges sent to the frontends
> actually work such as rewriting the purges in varnish, rewriting them
> before they're sent to varnish depending on where they're being sent, or
> perhaps changing how cached objects are stored in the frontend.  I
> personally think it's all an unnecessary waste of resources and prefer my
> original approach.

Although the current VCL calls vcl_recv_purge after the rewrite step
(and hence actually rewriting purges too), unless I'm mistaken this is
actually unnecessary. The incoming purges match the way the objects are
stored in the cache: both are without the .m. (et al) prefix, as normal
"desktop" purges are matched with objects that had their URLs rewritten
in vcl_recv. Handling purges after the rewrite step might be unnecessary
but it doesn't mean it's a bad idea though; it doesn't hurt much and
it's better as it allows us to also purge via the original .m. URL,
which is what a person might do instictively.

While mobile purges were actually broken recently in the past in a
similar way as you guessed with I77b88f[1] ("Restrict PURGE lookups to
mobile domains") they were fixed shortly after with I76e5c4[2], a full
day before the frontend cache TTL was removed.

1: https://gerrit.wikimedia.org/r/#q,I77b88f3b4bb5ec84f70b2241cdd5dc496025e6fd,n,z
2: https://gerrit.wikimedia.org/r/#q,I76e5c4218c1dec06673aa5121010875031c1a1e2,n,z

What actually broke them again this time is I3d0280[3], which stripped
absolute URIs before vcl_recv_purge, despite the latter having code that
matches only against absolute URIs. This is my commit, so I'm
responsible for this breakage, although in my defence I have an even
score now for discovering the flaw last time around :)

I've pushed and merged I08f761[4] which moves rewrite_proxy_urls after
vcl_recv_purge and should hopefully unbreak purging while also not
reintroducing BZ #47807.

3: https://gerrit.wikimedia.org/r/#q,I3d02804170f7e502300329740cba9f45437a24fa,n,z
4: https://gerrit.wikimedia.org/r/#q,I08f7615230037a6ffe7d1130a2a6de7ba370faf2,n,z

As a side note, notice how rewrite_proxy_urls & vcl_recv_purge are both
flawed in the same way: the former exists solely to workaround a Varnish
bug with absolute URIs, while the latter is *depending* on that bug to
manifest to actually work. req.url should always be a (relative) URL and
hence the if (req.url ~ '^http:') comparison in vcl_recv_purge should
normally always evaluate to false, making the whole function a no-op.

However, due to the bug in question, Varnish doesn't special-handle
absolute URIs in violation of RFC 2616. This, in combination with the
fact that varnishhtcpd always sends absolute URIs (due to an
RFC-compliant behavior of LWP's proxy() method), is why we have this
seemingly wrong VCL code but which actually works as intended.

This Varnish bug was reported by Tim upstream[5] and the fix is
currently sitting in Varnish's git master[6]. It's simple enough and it
might be worth it to backport it, although it might be more troulbe that
it's worth, considering how it will break purges with our current VCL :)

5: https://www.varnish-cache.org/trac/ticket/1255
6: https://www.varnish-cache.org/trac/changeset/2bbb032bf67871d7d5a43a38104d58f747f2e860


Mobile-l mailing list