Hi all,
In the last 24 hours I have found two new cases of spaces in log lines where the space is
not used as a delimiter.
Case 1:
There are mobile page requests that contain a space in the URL, for example:
ssl1002 2198871 2012-04-06T23:50:24.566 0.002 0.0.0.0 FAKE_CACHE_STATUS/301 1051 GET
https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus NONE/mobilewikipedia -
https://www.biodigitalhuman.com/ -
Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)%20AppleWebKit/535.19%20(KHTML,%20like%20Gecko)%20Chrome/18.0.1025.151%20Safari/535.19
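To make the problem concrete, here is a minimal Python sketch that splits the sample line above on single spaces, the way a naive consumer would. The line is reassembled onto one line because the email wraps it. EXPECTED_FIELDS = 14 is an assumption from counting the fields in this one sample, not from a published log format specification.

```python
# Case 1 sample line, reassembled onto a single line (the email wraps it).
# Assumption: a well-formed line has 14 space-separated fields, counted
# from this sample rather than taken from a format specification.
EXPECTED_FIELDS = 14

line = (
    "ssl1002 2198871 2012-04-06T23:50:24.566 0.002 0.0.0.0 "
    "FAKE_CACHE_STATUS/301 1051 GET "
    "https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus "
    "NONE/mobilewikipedia - https://www.biodigitalhuman.com/ - "
    "Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)%20AppleWebKit/535.19"
    "%20(KHTML,%20like%20Gecko)%20Chrome/18.0.1025.151%20Safari/535.19"
)

fields = line.split(" ")
print(len(fields))  # 16: the two spaces inside the URL add two bogus fields
print(fields[8])    # the URL field is cut off at its first space
```

Every field after the URL is shifted by two positions, so the referer and user agent end up in the wrong columns.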
Case 2:
On varnish, the mimetype often contains additional charset information, which results in
mimetypes like "application/json; charset=utf8" or "text/xml; charset=utf8".
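The following is a sketch of the kind of per-case patch a consumer currently has to write to cope with Case 2. The function name rejoin_mimetype is hypothetical, and the rule it applies (re-attach a "charset=" token to a preceding token ending in ";") is an illustration, not code from any existing pipeline.

```python
def rejoin_mimetype(tokens):
    """Hypothetical workaround: glue 'charset=...' back onto the mimetype.

    It only handles Case 2; Case 1 (spaces in URLs) needs a different,
    much harder rule, which is exactly why patches keep piling up.
    """
    out = []
    for tok in tokens:
        if out and out[-1].endswith(";") and tok.startswith("charset="):
            out[-1] = out[-1] + " " + tok
        else:
            out.append(tok)
    return out


mimetype = "application/json; charset=utf8"
print(mimetype.split(" "))                   # one value split into two fields
print(rejoin_mimetype(mimetype.split(" ")))  # patched back into one field
```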
Instead of continuing to patch our servers to fix these space issues, I strongly suggest
that we move away from the space as a delimiter and start using the tab (\t) character.
Spaces that are not used as delimiters have been cropping up in our server logs for many
years, and they make the analytics work considerably more complex, because we need to check
for more and more edge cases and/or create patches. I would rather solve the problem at the
root, and that means moving to a new delimiter.
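A minimal sketch of why a tab delimiter removes both cases at once: values containing spaces round-trip intact. The field list is a shortened, hypothetical subset built from the two examples above, and write_line/parse_line are illustrative names. Whether nginx, varnish and squid escape literal tabs inside field values is not covered here; the sketch simply assumes the writer replaces them with a space.

```python
# Hypothetical, shortened set of fields using the values from Case 1 and Case 2.
fields = [
    "ssl1002",
    "GET",
    "https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus",
    "application/json; charset=utf8",
    "-",
]


def write_line(values):
    # Assumption: a literal tab inside a value would still break parsing,
    # so this sketch replaces it with a space before joining.
    return "\t".join(v.replace("\t", " ") for v in values)


def parse_line(line):
    # Consumers split on tab only; spaces inside values are left alone.
    return line.split("\t")


line = write_line(fields)
print(parse_line(line) == fields)  # True: embedded spaces survive the round trip
```

The remaining edge case moves from "any value containing a space" to "any value containing a tab", which is far rarer in URLs, mimetypes and URL-encoded user agents.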
The delimiter is added by nginx/varnish/squid when writing the log file, so the switch
would be made in the logging configuration of those servers.
Please let me know whether you think this is a sane or insane idea. Also, if you are a
consumer of these server log files and would need to make a change on your end to
accommodate this switch, please let me know.
Andrew has been working hard on building a test environment in Labs, where nginx/varnish/squid
servers run with the production configuration and where we can test these changes
extensively.
Best,
Diederik