Hi all,
In the last 24 hours I have found two new cases of spaces in log lines where the space is not used as a delimiter.
Case 1: There are mobile page requests that contain a space in the URL, for example:
ssl1002 2198871 2012-04-06T23:50:24.566 0.002 0.0.0.0 FAKE_CACHE_STATUS/301 1051 GET https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus NONE/mobilewikipedia - https://www.biodigitalhuman.com/ - Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)%20AppleWebKit/535.19%20(KHTML,%20like%20Gecko)%20Chrome/18.0.1025.151%20Safari/535.19
Case 2: The mimetype on varnish often contains additional charset=utf8 information, which results in a mimetype like "application/json; charset=utf8" or "text/xml; charset=utf8".
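To make the two cases concrete, here is a minimal Python sketch of what a naive space split does to them. The line is the Case 1 sample with the user agent shortened for readability. The variable names are mine and not taken from any existing parser.

```python
# Case 1 sample line; the user agent is shortened here for readability.
LINE = (
    "ssl1002 2198871 2012-04-06T23:50:24.566 0.002 0.0.0.0 "
    "FAKE_CACHE_STATUS/301 1051 GET "
    "https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus "
    "NONE/mobilewikipedia - https://www.biodigitalhuman.com/ - "
    "Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)"
)

# Case 2 mimetype as reported for varnish.
MIMETYPE = "application/json; charset=utf8"

# Naive parsing: treat every space as a field delimiter.
tokens = LINE.split(" ")
mime_tokens = MIMETYPE.split(" ")

# The URL is one logical field but ends up spread over three tokens,
# so every field after it is shifted by two positions.
print(tokens[8:11])
# The mimetype is one logical field but becomes two tokens.
print(mime_tokens)
```

Any consumer that addresses fields by position reads the wrong values for everything after the URL, and the same happens after the mimetype in Case 2.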
Instead of continuing to patch our servers to fix these space issues, I strongly suggest that we move away from the space as a delimiter and start using the tab (\t) character. Spaces that are not delimiters have been cropping up in our server logs for many years, and they make the analytics side that much more complex because we need to check more and more edge cases and/or create patches. I would rather solve the problem at the root, by moving to a new delimiter.
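As a rough illustration of the proposal, the sketch below writes the same fields once with a space and once with a tab, then parses them back. The helpers format_line and parse_line are hypothetical names of mine, not part of nginx, varnish, or squid. I am also assuming that the "-" after NONE/mobilewikipedia in the sample is the mimetype field, and I put a Case 2 value there so that both cases appear in one line.

```python
def format_line(fields, delimiter):
    """Join field values into one log line (hypothetical helper)."""
    return delimiter.join(fields)


def parse_line(line, delimiter):
    """Split a log line back into field values (hypothetical helper)."""
    return line.rstrip("\n").split(delimiter)


# Field values based on the Case 1 sample, user agent shortened.
fields = [
    "ssl1002", "2198871", "2012-04-06T23:50:24.566", "0.002", "0.0.0.0",
    "FAKE_CACHE_STATUS/301", "1051", "GET",
    "https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus",
    "NONE/mobilewikipedia",
    "text/xml; charset=utf8",  # assumption: mimetype position, Case 2 value
    "https://www.biodigitalhuman.com/", "-",
    "Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)",
]

# Round trip with each delimiter: do we get the original fields back?
space_ok = parse_line(format_line(fields, " "), " ") == fields
tab_ok = parse_line(format_line(fields, "\t"), "\t") == fields

print(space_ok, tab_ok)
```

With the space delimiter the round trip does not return the original fields, while with the tab it does. This only holds as long as no field value contains a literal tab, which is something we would want to check in the test environment.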
The delimiter is added by nginx/varnish/squid when writing the log file, so
Please let me know if this is a sane or insane idea. Please also let me know if you are a consumer of these server log files and would need to make a change on your end to accommodate this.
Andrew has been working hard on building a test environment in Labs where we have nginx / varnish / squid servers running with production configuration and where we can test these changes extensively.
Best,
Diederik