[dropping fr-2012 in bcc in exchange for fr-tech]

Fundraising should be able to make the switch fairly quickly. It should also eliminate some of the annoying little bugs. In fact, our code shouldn't even notice, since a tab is whitespace. :-)
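To illustrate why whitespace-agnostic code shouldn't notice, here is a minimal sketch. The log line layout is a hypothetical example, not the real webrequest format. In Python, str.split() with no argument splits on runs of any whitespace, including tabs, while an explicit " " separator does not split on tabs.

```python
# Hypothetical log line; the field layout is illustrative only.
space_line = "host1 12345 2013-01-25T09:41:00 GET"
tab_line = space_line.replace(" ", "\t")


def parse_any_whitespace(line):
    # No separator: splits on runs of spaces, tabs, and newlines alike.
    return line.split()


def parse_space_only(line):
    # Explicit " " separator: tabs are NOT treated as delimiters.
    return line.split(" ")


print(parse_any_whitespace(space_line) == parse_any_whitespace(tab_line))  # True
print(len(parse_space_only(tab_line)))  # 1: the whole line is one "field"
```

This only holds while no field contains spaces; a parser that splits on any whitespace would still break a field such as a user agent apart, which is the problem the tab delimiter is meant to solve.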

Thanks Diederik.


On Fri, Jan 25, 2013 at 9:41 AM, Diederik van Liere <dvanliere@wikimedia.org> wrote:
Apologies for cross-posting.

Heya,

The Analytics Team is planning to deploy "tab as field delimiter" to replace the current space field delimiter on the varnish/squid/nginx servers. We would like to do this on February 1st. The reason for this change is that we need a consistent number of fields in each webrequest log line. Right now, some fields contain spaces, which requires a lot of post-processing cleanup and slows down report generation.
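A minimal sketch of the field-count problem, using a hypothetical five-field record (hostname, sequence number, timestamp, URL, user agent) rather than the actual log format. Because the user agent contains spaces, splitting the space-delimited line yields more fields than the record has, while the tab-delimited line splits back into exactly the original fields, assuming no field itself contains a tab.

```python
# Hypothetical record; the real webrequest field list is not shown here.
FIELDS = [
    "host1",
    "12345",
    "2013-01-25T09:41:00",
    "http://en.wikipedia.org/wiki/Main_Page",
    "Mozilla/5.0 (X11; Linux)",  # user agent with embedded spaces
]

space_line = " ".join(FIELDS)
tab_line = "\t".join(FIELDS)


def count_fields(line, delimiter):
    # Number of fields a naive split on the delimiter produces.
    return len(line.split(delimiter))


print(count_fields(space_line, " "))  # 7: the user agent is split into 3 pieces
print(count_fields(tab_line, "\t"))   # 5: matches len(FIELDS)
```

With a space delimiter, every consumer has to guess which pieces belong to which field; with a tab delimiter, a plain split recovers the record as long as no field contains a literal tab.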

What is affected and maintained by Analytics:

* udp-filter already has support for the tab character
* webstatscollector: we compiled a new version of filter to add support for the tab character
* wikistats: we will fix the scripts on an ongoing basis.
* udp2log: we have a patch ready for inserting sequence numbers separated by a tab.

In particular, I would like feedback on three questions:

1) Are there important reasons not to use tab as field delimiter?

2) Are there important logging components, not mentioned in this email, that expect a space instead of a tab and would need to be fixed?

3) Is February 1st a good date to deploy this change (assuming all preparations are finished)?


Best,

Diederik



--

Peter Gehres
Wikimedia Foundation
https://donate.wikimedia.org