Brandon, I had a "Monday morning quarterback" moment (don't worry, it's not too bad).
The key we chose is "WMF-Last-Access", and it seems to me that its verbosity uses a lot of unnecessary network bandwidth. We could come up with something shorter (I CC'd Analytics in case anyone has an opinion) and save bandwidth. My proposal: simply "last".
For those unfamiliar, we're talking about this change: https://gerrit.wikimedia.org/r/#/c/196009/14/templates/varnish/last-access.i... to this header: https://wikitech.wikimedia.org/wiki/X-Analytics
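For readers who haven't opened the wikitech page, here is a minimal Python sketch of pulling one key out of an X-Analytics value. It assumes the semicolon-separated `key=value` layout that page describes. The `parse_x_analytics` helper and the sample header value are hypothetical illustrations, not code from the Gerrit change.

```python
from __future__ import annotations


def parse_x_analytics(header_value: str) -> dict[str, str]:
    """Split an X-Analytics value like "a=1;b=2" into a dict.

    Assumes semicolon-separated key=value pairs (an assumption, see above).
    Empty segments are skipped; a segment without "=" gets an empty value.
    """
    fields: dict[str, str] = {}
    for segment in header_value.split(";"):
        segment = segment.strip()
        if not segment:
            continue
        key, _, value = segment.partition("=")
        fields[key] = value
    return fields


# Hypothetical header value, for illustration only.
sample = "ns=0;WMF-Last-Access=27-Apr-2015;https=1"
print(parse_x_analytics(sample)["WMF-Last-Access"])  # 27-Apr-2015
```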
I also noticed the cookie stores a string with a 3-letter month (27-Apr-2015), any reason not to use a shorter ISO date instead (2015-04-27)?
On Apr 27, 2015, at 3:00 PM, Marcel Ruiz Forns mforns@wikimedia.org wrote:
+1 'last'
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
+1 for ISO dates. They're also more parsable by researchers.
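One concrete sense in which ISO dates are "more parsable": as plain strings they sort chronologically, while the `DD-Mon-YYYY` form does not. A small Python illustration with made-up dates (note that `%b` matches English month abbreviations under Python's default C locale):

```python
from datetime import datetime

# Made-up Last-Access style values (DD-Mon-YYYY), for illustration only.
cookie_dates = ["27-Apr-2015", "03-May-2015", "15-Jan-2016"]

# A plain string sort compares the day digits first, so it is not chronological.
print(sorted(cookie_dates))
# ['03-May-2015', '15-Jan-2016', '27-Apr-2015']

# The same dates as ISO 8601 strings (YYYY-MM-DD) sort chronologically as text.
iso_dates = [datetime.strptime(d, "%d-%b-%Y").date().isoformat() for d in cookie_dates]
print(sorted(iso_dates))
# ['2015-04-27', '2015-05-03', '2016-01-15']
```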
On 27 April 2015 at 18:57, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
[...]
Gonna stop this ISO date fancy bandwagon right here :)
We could do it with a bunch of VCL code, but that would affect site performance, and we'd rather take the hit in analytics. We could look into writing a UDF that handles this and other common date code we'd want to DRY up.
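A UDF along those lines would mostly wrap a parse-and-reformat step. The sketch below is plain Python rather than a UDF for any particular query engine, and the name `last_access_to_iso` is hypothetical. It returns `None` for malformed values, on the assumption that bad cookies should be filtered out rather than fail a whole job.

```python
from __future__ import annotations

from datetime import datetime

# Format of the Last-Access value discussed in this thread, e.g. "27-Apr-2015".
# %b matches English month abbreviations under the default C locale.
LAST_ACCESS_FORMAT = "%d-%b-%Y"


def last_access_to_iso(value: str | None) -> str | None:
    """Convert "27-Apr-2015" to "2015-04-27"; return None if it can't be parsed."""
    if not value:
        return None
    try:
        return datetime.strptime(value.strip(), LAST_ACCESS_FORMAT).date().isoformat()
    except ValueError:
        return None


print(last_access_to_iso("27-Apr-2015"))  # 2015-04-27
print(last_access_to_iso("2015-04-27"))   # None (already ISO, not the cookie format)
```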
On Mon, Apr 27, 2015 at 4:02 PM, Oliver Keyes okeyes@wikimedia.org wrote:
[...]
-- Oliver Keyes Research Analyst Wikimedia Foundation
Varnish doesn't use ISO dates by default? :(
On 27 April 2015 at 19:50, Dan Andreescu dandreescu@wikimedia.org wrote:
[...]
Re: the header name:
Keep in mind this header does get sent back to clients as a response header as well. It's probably not a great practice to be using a generic-sounding header name like "Last" there and waiting to find out what it conflicts with now or in the future (although I'll note that the only thing even close in current common use is "Last-Modified"). The norm would be at least X-Something for a custom weird header, but I feel the WMF-Something prefix is pretty valid in this context as well.
We could blend up the best of all of these concerns if we don't care about human readability much and just make it something like "X-WMFLA", too. Regardless, to be honest our pages are so inefficient and huge on average that I'm not overly concerned about the size of this one header's name to begin with. There's a ton of lower-hanging fruit than that to go after for reducing response sizes...
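To put a number on the name-length question, a quick Python calculation for the three candidate names. It counts only the name itself and a `name=value` pair with a value like `27-Apr-2015` (ASCII, so one byte per character), and it ignores any header compression and the rest of the response. `pair_sizes` is a hypothetical helper.

```python
from __future__ import annotations

# Candidate names from this thread; the value is the cookie date format used above.
CANDIDATES = ["WMF-Last-Access", "last", "X-WMFLA"]
EXAMPLE_VALUE = "27-Apr-2015"


def pair_sizes(names: list[str], value: str) -> dict[str, tuple[int, int]]:
    """Map each name to (name bytes, "name=value" bytes), assuming ASCII text."""
    return {
        name: (len(name.encode("ascii")), len(f"{name}={value}".encode("ascii")))
        for name in names
    }


for name, (name_len, pair_len) in pair_sizes(CANDIDATES, EXAMPLE_VALUE).items():
    print(f"{name}: name {name_len} bytes, name=value {pair_len} bytes")
# WMF-Last-Access: name 15 bytes, name=value 27 bytes
# last: name 4 bytes, name=value 16 bytes
# X-WMFLA: name 7 bytes, name=value 19 bytes
```

That is 11 bytes saved per occurrence for "last" and 8 for "X-WMFLA"; whether that matters depends on request volume, which the thread doesn't quantify.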
Re: date formats:
The stringification VCL offers for the current timestamp "now" was chosen to match the format used in the Expires attribute of Set-Cookie headers, which is RFC 822/RFC 1123-based (with 4-digit years and a fixed GMT timezone for best compatibility with the various cookie "standards"), so we get something like "Wed, 01 Jan 2000 01:01:01 GMT". The form we're using for Last-Access is just the easiest transformation we could apply to that: it throws out the redundancy and excess precision and makes the value compatible with cookie data-value formatting.
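The transformation described above can be sketched in Python (this is not VCL and not the code from the Gerrit change): keep only the day, month, and year fields of the RFC 1123 string and join them with hyphens. The `is_cookie_octets` check follows the `cookie-octet` character set from RFC 6265, which excludes spaces, commas, semicolons, double quotes, and backslashes; that is one reading of "compatible with cookie data-value formatting". Both function names are hypothetical.

```python
import re

# RFC 1123-style HTTP date, e.g. "Wed, 01 Jan 2000 01:01:01 GMT".
HTTP_DATE_RE = re.compile(
    r"^[A-Z][a-z]{2}, (\d{2}) ([A-Z][a-z]{2}) (\d{4}) \d{2}:\d{2}:\d{2} GMT$"
)


def to_last_access(http_date: str) -> str:
    """Drop the weekday and time, keeping "DD-Mon-YYYY"."""
    match = HTTP_DATE_RE.match(http_date)
    if match is None:
        raise ValueError(f"not an RFC 1123 date: {http_date!r}")
    day, month, year = match.groups()
    return f"{day}-{month}-{year}"


def is_cookie_octets(value: str) -> bool:
    """True if every character is in the RFC 6265 cookie-octet set."""
    return all(
        ch == "\x21"
        or "\x23" <= ch <= "\x2b"
        or "\x2d" <= ch <= "\x3a"
        or "\x3c" <= ch <= "\x5b"
        or "\x5d" <= ch <= "\x7e"
        for ch in value
    )


raw = "Wed, 01 Jan 2000 01:01:01 GMT"
print(is_cookie_octets(raw))                  # False (contains spaces and a comma)
print(to_last_access(raw))                    # 01-Jan-2000
print(is_cookie_octets(to_last_access(raw)))  # True
```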
In defense of minimizing our processing complexity here: in general, it's important that we minimize not only the runtime perf hit of our frontline VCL (as *every* single HTTP request to us runs through this code, including DDoS attacks and celebrity-death traffic inrushes and such), but also the complexity of the VCL codebase (as it's both operations-critical and horribly difficult to work on), so we tend to prefer to offload any post-processing that isn't strictly necessary further down the stats pipeline.
-- Brandon
Thanks Brandon, that works for me. The cookie has been great btw, we're learning great stuff.
On Wednesday, April 29, 2015, Brandon Black bblack@wikimedia.org wrote:
[...]
Yes, thanks Brandon for the detailed explanation. Not urgent, but I'd love to see a list of the low-hanging fruit for where our pages are inefficient.
On Apr 29, 2015, at 07:55, Dan Andreescu dandreescu@wikimedia.org wrote:
[...]