Hello,
I recently set up Varnish for a MediaWiki wiki I do IT for. I noticed that the hit rate was almost zero, though, because we use Google Analytics, which sets cookies that cause the HTTP accelerator to bypass the cache and simply pass the request back to Apache. As a result, Varnish effectively can't function. I'm not sure, but I think the same cookies would have a similar effect on a Squid cache as well.
The Vector skin causes similar problems because it sets a cookie whenever the user opens or closes the CollapsibleNav elements in the sidebar (the little triangles that let you open and close sections of the sidebar). These collapsible nav elements get a lot of use on our wiki because we rely heavily on the sidebar for navigation.
What makes this a little frustrating is that, after having spent several days figuring out how to use Varnish (I'm an economist, after all, not a programmer, and it's a sophisticated program), I get the feeling that my effort is being thwarted by cookies that play a role only on the client side, in JavaScript. Therefore, the cache should be able to completely ignore them.
Varnish uses scripts in a remarkably flexible programming language (called VCL) to determine which files to cache and which to pass to Apache. In particular, it can use PCRE-based regexes on HTTP headers to determine how to handle a request. For example, it should be able to look for session, UserID or Token cookies from MediaWiki. If it finds one of these, then it could be configured to pass the request to Apache rather than looking in the cache. However, if those cookies aren't set and the requested resource is in the MediaWiki folder (i.e. the folder in $wgScriptPath), then it could conceivably look for the object in the cache rather than contacting Apache, even if other cookies, such as the Google Analytics or Vector cookies, are set. If the requested resource is in another folder, different behavior could apply.
Does this seem like a reasonable workaround to you? Does anyone have any tips on how to make it work more smoothly? My wiki is quite small and most users are fairly technically unsophisticated, so we might not stumble upon many of MediaWiki's more esoteric cookies (if it has any) very frequently. We're also quite a tight group, so if Varnish occasionally cached something it shouldn't and that caused quirks, we might be able to live with that by simply educating users about how to mitigate those quirks by the time they become adventurous enough to stumble on them.
Any help would be greatly appreciated.
Forest
On 30/12/11 14:50, Forest ForTrees wrote:
Varnish uses scripts in a remarkably flexible programming language (called VCL) to determine which files to cache and which to pass to Apache. In particular, it can use PCRE based regexes on http headers to determine how to handle a request. For example, it should be able to look for session, UserID or Token cookies from MediaWiki. If it finds one of these, then it could be configured to pass the request to Apache rather than looking in the cache. However, if those cookies aren't set and the requested resource is in the MediaWiki folder (i.e. the folder in $wgScriptPath), then it could conceivably look for the object in the cache rather than contacting Apache, even if other cookies, such as the Google Analytics or Vector cookies, are set. If the requested resource is in another folder, different behavior could apply.
MediaWiki can send a header called X-Vary-Options which gives you a list of header substrings to vary the cache on. It is specific to MediaWiki.
http://www.mail-archive.com/squid-dev@squid-cache.org/msg07066.html
A VCL routine which parses this header and uses it to make caching decisions would solve your problem and, if published, would also be useful to other sites using Varnish with MediaWiki.
-- Tim Starling
On Thu, Dec 29, 2011 at 8:49 PM, Tim Starling tstarling@wikimedia.org wrote:
A VCL routine which parses this header and uses it to make caching decisions would solve your problem and, if published, would also be useful to other sites using Varnish with MediaWiki.
In case it's useful for hacking purposes: It also looks like Wikia's Varnish config is public and there's a presentation from Artur about it.
https://svn.wikia-code.com/utils/varnishhtcpd/
http://www.scribd.com/doc/48960012/Varnish-A-State-of-the-Art-High-Performan...
The VCL file doesn't evaluate X-Vary-Options but it simply unsets cookies for logged out users.
Many thanks for those helpful answers, Tim and Erik. It was pretty exciting to get emails from people so important to the Wikimedia organization. I ended up following Wikia's approach (it's been tested, after all) and am now seeing a 44%-50% decrease in page load times, as measured by Firebug.
The recommended code for handling requests with authorization credentials or cookies is to simply pass them to the back-end webserver:

    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
Since Wikia and I both deal with cookies that are used only by the client, we simply unset the cookies before execution reaches the above code (though I do this only if the requested URL is in the MediaWiki folder, "/~ptpn/w/"). This is done through the following VCL code:

    if (req.url ~ "ptpn/w/") {
        # more typically "^/wiki/"
        if (req.http.Cookie ~ "(session|UserID|UserName|Token|LoggedOut)") {
            # don't do anything, the user is logged in
        } else {
            # don't care about any other cookies
            unset req.http.Cookie;
        }
    }

This seems to work quite well and I'm extremely impressed by Varnish's resource usage.
Once I've had more time to test out and learn my way around the code, I'll try to update the MediaWiki documentation on Varnish to reflect the above changes and the changes to VCL since version 3 of Varnish. I may post follow-up questions then, in particular about which cookies I need to preserve (I'd like to cut the set down so that I can hit the cache after I log out, something I can't do now because cookies remain).
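On the question of which cookies to preserve, one alternative to unsetting the whole Cookie header is to strip only the cookies known to be client-side and leave everything else untouched. The following is a minimal, untested sketch in Varnish 3 VCL, meant to sit in vcl_recv ahead of the default cookie check. It assumes the Google Analytics cookies use the "__utm" prefix; the "vector-nav-" prefix for the Vector sidebar cookies is an assumption and should be checked against the cookies your browser actually sends.

```vcl
sub vcl_recv {
    if (req.url ~ "ptpn/w/" && req.http.Cookie) {
        # Drop Google Analytics cookies (__utma, __utmb, __utmz, ...).
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)__utm[a-z]+=[^;]*", "");
        # Drop Vector sidebar cookies; the "vector-nav-" prefix is an assumption.
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)vector-nav-[^=;]*=[^;]*", "");
        # Remove a leading separator left behind by the substitutions.
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
        # If nothing is left, remove the header so the request can be served from the cache.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```

Because any cookie that does not match one of the patterns is kept, MediaWiki's own cookies still reach Apache unchanged. The trade-off compared with Wikia's approach is that every new client-side cookie has to be added to the list by hand.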
Forest
From: Erik Moeller erik@wikimedia.org
To: MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org
Sent: Friday, December 30, 2011 12:15 AM
Subject: Re: [Mediawiki-l] Squid/Varnish and cookies set by Vector and Google Analytics
-- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l