Hi all!
Just an FYI here that this has been done, yay! Varnish, Nginx, and Squid frontends are now all logging with tab as the field delimiter.
For those who would notice, for the time being, we have started outputting logs to new filenames with .tab. in the name, so as to differentiate the format. We will most likely change the file names back to their original names in a month or so.
Thanks all! -Andrew Otto
On Jan 28, 2013, at 11:33 AM, Matthew Flaschen mflaschen@wikimedia.org wrote:
On 01/27/2013 08:07 AM, Erik Zachte wrote:
The code to change existing tabs into some less obnoxious character is dead trivial, hardly any overhead. At worst one field will then be affected, not the whole record, which makes it easier to spot and debug the anomaly when it happens.
Scanning an input record for tabs and incrementing a counter is also very efficient. Sending one alert hourly based on this counter should make us aware soon enough when this issue needs follow-up, yet without causing bottlenecks.
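To make the replace-and-count idea concrete, here is a minimal Python sketch. It is not the code that runs on the frontends: the names sanitize_record, TAB_REPLACEMENT, and records_with_tabs are hypothetical, and a space is only an assumed replacement character, since the thread does not name one.

```python
from collections import Counter

# Assumed replacement character; the thread does not specify one.
TAB_REPLACEMENT = " "


def sanitize_record(fields, stats):
    """Replace tabs inside each field, count affected records,
    and return the record as a single tab-delimited line."""
    cleaned = [field.replace("\t", TAB_REPLACEMENT) for field in fields]
    if cleaned != list(fields):
        # At least one field contained a tab; an hourly job could
        # read this counter and send a single alert if it is non-zero.
        stats["records_with_tabs"] += 1
    return "\t".join(cleaned)


stats = Counter()
dirty_line = sanitize_record(["GET", "/wiki/Main\tPage", "200"], stats)
clean_line = sanitize_record(["GET", "/wiki/Other", "200"], stats)
```

With this approach only the field that held the stray tab is altered, and the record keeps its expected number of fields.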
Doing both of those would be pretty robust. However, if that isn't workable, a simple option is just to strip tab characters before Varnish/Squid/etc. writes the line.
That means downstream code doesn't have to do anything special, and it shouldn't affect many actual requests.
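As a rough illustration of why stripping keeps downstream code simple, here is a short Python sketch. The function names format_record and parse_record and the sample field values are made up for the example, and this is not how Varnish or Squid actually format their output.

```python
def format_record(fields):
    """Writer side: drop any tab inside a field, then join the
    fields with the tab delimiter."""
    return "\t".join(field.replace("\t", "") for field in fields)


def parse_record(line):
    """Reader side: a plain split on tab is enough, because no
    field can contain the delimiter any more."""
    return line.rstrip("\n").split("\t")


fields = ["10.0.0.1", "Mozilla/5.0\t(X11)", "200"]
roundtrip = parse_record(format_record(fields) + "\n")
```

The trade-off is that the stripped tab is lost silently, so this variant gives no signal that an anomaly occurred unless a counter is kept as well.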
Matt Flaschen
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics