Speaking just for the Kafka/Hadoop use case: you could grep through the misc logs without ever touching production-level requests. The HDFS files are very deliberately partitioned by the class of source Varnish (mobile, text, misc, upload, etc.), so you can just grep through the misc files.
(Unless you meant a literal grep rather than a figurative one, in which case ignore this ;p)
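To make the partitioning point concrete, here is a rough sketch of the "figurative grep". Everything specific in it is an assumption for illustration: a local directory stands in for HDFS, the layout is one subdirectory per source class (`<root>/misc/*.tsv`), and the HTTP status is taken to be the first tab-separated field. The function name `grep_5xx` is hypothetical too. None of this reflects the real paths or schema.

```python
import re
from pathlib import Path
from typing import Iterator

# Assumed column index of the HTTP status in each TSV row (hypothetical schema).
STATUS_FIELD = 0

# Matches exactly three digits starting with 5, e.g. 500, 503.
FIVE_XX = re.compile(r"5\d\d")


def grep_5xx(root: Path, source: str, status_field: int = STATUS_FIELD) -> Iterator[str]:
    """Yield 5xx rows from one source-class partition only.

    Assumes a hypothetical layout of <root>/<source>/*.tsv, so reading
    "misc" never opens files belonging to "text", "mobile" or "upload".
    """
    for path in sorted((root / source).glob("*.tsv")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                row = line.rstrip("\n")
                fields = row.split("\t")
                # Skip short rows rather than failing on malformed input.
                if len(fields) > status_field and FIVE_XX.fullmatch(fields[status_field]):
                    yield row
```

Calling `grep_5xx(root, "misc")` only ever reads files under the misc partition, which is the whole point: the separation comes from the directory layout, not from filtering a mixed log after the fact.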
On 30 January 2015 at 04:51, Faidon Liambotis <faidon@wikimedia.org> wrote:
> On Tue, Jan 27, 2015 at 01:23:10PM +0100, Christian Aistleitner wrote:
> > But if you want to make the point that misc need not be logged and misc wasn't intentionally in udp2log and the 5xx tsvs, then by all means: Yes, agreed, let's remove it. From both kafka and udp2log. I am all for it.
>
> I don't think it was intentional, no. Even if it was at the time, I think it'd be wrong to put everything into the same pool of logs/statistics. Production should be separate and we shouldn't have to grep production 5xxs in the same log that also has e.g. git.wm.org's 5xx.
>
> All that said, a (separate) 5xx log of misc services can be useful, so I wouldn't object.
>
> Faidon