Chase,
Ori mentioned that you might have plans to replace txstatsd as our StatsD collector for graphite/carbon. Do you have such plans and can you elaborate?
The reason I'm asking is that we currently operate txstatsd with a non-StatsD-compliant message processor, which has some side effects with respect to counters that I did not expect. On the other hand, we cannot use the compliant processor because it adds a whole bunch of unwanted prefixes.
The upshot is that I can't just use a vanilla StatsD client in my code. Having looked at the internals of the non-compliant processor, I'm also worried that we're abusing it and thus potentially misinterpreting its results in other places (it uses mark() to store statistics, which does no aggregation and keeps the value across time points). On the other hand, its implementation of timers is very friendly in that it gives us percentile breakdowns and such...
So, my current thought is that we should either write another processor with better counter semantics or replace txstatsd entirely with a vanilla StatsD implementation (like etsy's, which we might end up having to patch to get the nice timer behavior). I suspect that writing another processor and hosting a custom deb locally is probably the "easiest" option.
It's probably worth pointing out that txstatsd is currently unmaintained -- Sidnei left Canonical for Google sometime near the end of last year and hasn't touched the code since.
~Matt Walker Wikimedia Foundation Fundraising Technology Team
Matt,
No one seems to love txstatsd -- for many reasons :) I have been looking to try out a version I used at my last job. It should be fully StatsD compliant, with some extras. The effort can be seen here: https://gerrit.wikimedia.org/r/#/c/131449/
Can you help me understand what the weird behavior you are seeing with counters is? I'm pretty familiar with StatsD types overall, so I can tell you if this would solve the issue. This version also comes with some timer niceties, which would be easy to amend / append to. It would be good to lock down the use case here to make sure things will work as you hope.
Thanks!
Chase
On 6/3/14, 12:55 PM, Matthew Walker wrote:
Chase,
Ori mentioned that you might have plans to replace txstatsd as our StatsD collector for graphite/carbon. Do you have such plans and can you elaborate?
The reason I'm asking is that we currently operate txstatsd with a non-StatsD-compliant message processor, which has some side effects with respect to counters that I did not expect. On the other hand, we cannot use the compliant processor because it adds a whole bunch of unwanted prefixes.
The upshot is that I can't just use a vanilla StatsD client in my code. Having looked at the internals of the non-compliant processor, I'm also worried that we're abusing it and thus potentially misinterpreting its results in other places (it uses mark() to store statistics, which does no aggregation and keeps the value across time points). On the other hand, its implementation of timers is very friendly in that it gives us percentile breakdowns and such...
So, my current thought is that we should either write another processor with better counter semantics or replace txstatsd entirely with a vanilla StatsD implementation (like etsy's, which we might end up having to patch to get the nice timer behavior). I suspect that writing another processor and hosting a custom deb locally is probably the "easiest" option.
It's probably worth pointing out that txstatsd is currently unmaintained -- Sidnei left Canonical for Google sometime near the end of last year and hasn't touched the code since.
~Matt Walker Wikimedia Foundation Fundraising Technology Team
Can you help me understand what the weird behavior you are seeing with counters is? I'm pretty familiar with statsd types overall so I can tell you if this would solve the issue.
Simply, metrics like "ocg.pdftest_counter:1|c" do not count; instead, the processor keeps the last sent value and persists it. It's behaving like a gauge. The workaround is to use the meter metric type, which provides the 'counts / period' stats I'm actually looking for as well as the absolute count.
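To make the difference concrete, here is a minimal Python sketch of the two behaviors within a single flush interval. It is an illustration only, not txstatsd's actual code: the function names are hypothetical, and sample rates (the "|@0.1" suffix) are deliberately not handled.

```python
from collections import defaultdict


def parse_line(line):
    """Split a StatsD line such as 'ocg.pdftest_counter:1|c' into (name, value, type)."""
    name, rest = line.split(":", 1)
    value, metric_type = rest.split("|", 1)
    return name, float(value), metric_type


def aggregate_as_counter(lines):
    """Expected counter semantics: sum every value seen in the flush interval."""
    totals = defaultdict(float)
    for line in lines:
        name, value, _ = parse_line(line)
        totals[name] += value
    return dict(totals)


def aggregate_as_gauge(lines):
    """Gauge-like semantics (the behavior described above): keep only the last value."""
    last = {}
    for line in lines:
        name, value, _ = parse_line(line)
        last[name] = value
    return last


# Three increments of the same counter arrive in one flush interval.
packets = ["ocg.pdftest_counter:1|c"] * 3
counter_result = aggregate_as_counter(packets)
gauge_result = aggregate_as_gauge(packets)
print(counter_result)  # {'ocg.pdftest_counter': 3.0}
print(gauge_result)    # {'ocg.pdftest_counter': 1.0}
```

With counter semantics the three increments add up to 3; with gauge-like semantics only the last value, 1, survives.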
This version also comes with some timer niceties, which would be easy to amend / append to. It would be good to lock down the use case here to make sure things will work as you hope.
I'm happy with what the current timer implementation gives us (though arguably it's actually acting as the histogram type). Essentially, I'm going to be looking for the mean time, the stddev, and some sort of top-range 95%/99% information.
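As a rough sketch of the timer summary described here, the Python below computes count, lower, upper, mean, stddev, and 95th/99th percentile values for one flush interval. The function name and the nearest-rank percentile rule are assumptions for illustration; they are not taken from txstatsd or any other StatsD daemon, which may define percentiles differently.

```python
import statistics


def summarize_timer(samples_ms, percentiles=(95, 99)):
    """Summarize a non-empty list of timer samples (milliseconds)."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    summary = {
        "count": n,
        "lower": ordered[0],
        "upper": ordered[-1],
        "mean": statistics.mean(ordered),
        "stddev": statistics.pstdev(ordered),  # population standard deviation
    }
    for pct in percentiles:
        # Nearest-rank percentile: the smallest sample with at least pct% of
        # all samples at or below it. Integer ceiling division avoids float
        # rounding surprises.
        rank = max(1, -(-pct * n // 100))
        summary[f"upper_{pct}"] = ordered[rank - 1]
    return summary


# Hypothetical timings: 1 ms through 100 ms, one sample each.
timings = list(range(1, 101))
summary = summarize_timer(timings)
print(summary["mean"], summary["upper_95"], summary["upper_99"])  # 50.5 95 99
```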
Plugging for additional functionality -- it would also be ridiculously cool if we had the ability to count uniques (sets in the etsy statsd implementation).
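For reference, counting uniques amounts to keeping a set of the distinct values seen per metric in each flush interval and reporting its size. The sketch below assumes the "name:value|s" line format that etsy's statsd uses for sets; the function name and the "ocg.users" metric are hypothetical.

```python
def count_uniques(lines):
    """Count distinct values per set metric ('name:value|s') in one flush interval."""
    members = {}
    for line in lines:
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type != "s":
            continue  # ignore counters, timers, gauges, and so on
        members.setdefault(name, set()).add(value)
    return {name: len(values) for name, values in members.items()}


# 'alice' is reported twice but only counted once.
packets = ["ocg.users:alice|s", "ocg.users:bob|s", "ocg.users:alice|s"]
unique_counts = count_uniques(packets)
print(unique_counts)  # {'ocg.users': 2}
```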
~Matt Walker Wikimedia Foundation Fundraising Technology Team
On Tue, Jun 3, 2014 at 11:22 AM, Matthew Walker mwalker@wikimedia.org wrote:
Can you help me understand what the weird behavior you are seeing with counters is? I'm pretty familiar with StatsD types overall, so I can tell you if this would solve the issue.
Simply, metrics like "ocg.pdftest_counter:1|c" do not count; instead, the processor keeps the last sent value and persists it. It's behaving like a gauge. The workaround is to use the meter metric type, which provides the 'counts / period' stats I'm actually looking for as well as the absolute count.
This version also comes with some timer niceties, which would be easy to amend / append to. It would be good to lock down the use case here to make sure things will work as you hope.
I'm happy with what the current timer implementation gives us (though arguably it's actually acting as the histogram type). Essentially, I'm going to be looking for the mean time, the stddev, and some sort of top-range 95%/99% information.
So counters and sets exist (and work) in the linked (proposed) deb. Out of the box, I think timers report lower / count (of all the timers matching that key) / mean / upper / upper_99.
Chase
On 6/3/14, 1:34 PM, Matthew Walker wrote:
Plugging for additional functionality -- it would also be ridiculously cool if we had the ability to count uniques (sets in the etsy statsd implementation).
~Matt Walker Wikimedia Foundation Fundraising Technology Team
Amusingly, I just forked Steve Ivy's NodeJS client so that I could use it with txstatsd. I'd be happy to help test the install in labs if you want -- otherwise I'm content to watch this unfold from the sidelines.
Do you have an approximate ETA for when we'll see this in production?
~Matt Walker Wikimedia Foundation Fundraising Technology Team
On Tue, Jun 3, 2014 at 12:19 PM, Chase Pettet cpettet@wikimedia.org wrote:
So counters and sets exist (and work) in the linked (proposed) deb. Out of the box, I think timers report lower / count (of all the timers matching that key) / mean / upper / upper_99.
Chase