Hi guys!
So, we've had a to-do item on our list for a while now: make a couple of tweaks to the web
access log format coming from squid, varnish and nginx.
1. Append Accept-Language and X-Carrier headers.
This brings the field count from 14 up to 16. udp-filter has already been modified to
handle this. I've already got a change in for this:
https://gerrit.wikimedia.org/r/#/c/12188/
2. Change field separator from space to tab.
User-Agent and Content-Type headers (and possibly others) sometimes contain spaces. Some
sources (e.g. varnish) properly URL-encode the fields before they are sent out, but others
don't, so a space inside a field gets mistaken for a field boundary. Using tab as the field
separator in web access logs will avoid most of these mis-splits.
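
To illustrate the separator problem, here is a minimal Python sketch. The field values and
the six-field layout are made up for illustration (the real format would have 16 fields),
and parse_line is a hypothetical helper, not part of udp-filter or wikistats. Only the
separator behavior matters here.

```python
# Hypothetical, shortened log record: the values below are invented
# for illustration and do not reflect the real 16-field layout.
fields = [
    "cp1001.example",                    # hostname (made up)
    "200",                               # status code
    "text/html; charset=UTF-8",          # Content-Type, contains a space
    "Mozilla/5.0 (X11; Linux x86_64)",   # User-Agent, contains spaces
    "en-US,en;q=0.8",                    # Accept-Language (new field)
    "-",                                 # X-Carrier (new field), empty here
]


def parse_line(line, sep):
    """Split one log line on the given separator."""
    return line.rstrip("\n").split(sep)


# The same record written with each separator.
space_line = " ".join(fields) + "\n"
tab_line = "\t".join(fields) + "\n"

space_parsed = parse_line(space_line, " ")
tab_parsed = parse_line(tab_line, "\t")

# Space-separated: unencoded spaces inside fields create extra columns.
print(len(space_parsed), "columns when splitting on space")
# Tab-separated: the record round-trips with the expected column count.
print(len(tab_parsed), "columns when splitting on tab")
```

With space as the separator, every unencoded space in User-Agent or Content-Type shifts
the later columns, so the new Accept-Language and X-Carrier fields would land in the wrong
positions. With tab, the fields come back exactly as written, provided no field itself
contains a tab.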
We have wanted to do this for a while, but haven't because we were worried about
breaking Erik Zachte's wikistats scripts. Stefan Petrea is now working with Diederik
on wikistats (and other things), and has dealt with this issue. So! We are ready!
We'd like to make this change before we start real consumption of the web access logs
into the Kraken cluster, which hopefully will be relatively soon.
Would these changes cause Fundraising any foreseeable problems? Can we go ahead and work
with ops to push this through?
Thanks!
-Andrew Otto