Hey all,
I used the threshold metric for the first time yesterday. First off, thanks for adding it! Dario tells me it was brand new as of yesterday? He also said it needs vetting?
One piece of feedback: combining threshold and 'time to threshold' seems to make things more confusing. For example, when you select sum as an output, you also get the sum of the time to threshold. That result -- like "time_to_threshold": 92.7864 -- seems to be simply the sum of hours for the members of the cohort. Knowing that it took the cohort a combined 92 hours to reach the threshold isn't very actionable.
On Fri, Oct 25, 2013 at 2:07 PM, Steven Walling swalling@wikimedia.org wrote:
Hey all,
I used the threshold metric for the first time yesterday. First off, thanks for adding it! Dario tells me it was brand new as of yesterday? He also said it needs vetting?
Yes it needs extra vetting!
One piece of feedback: combining threshold and 'time to threshold' seems to make things more confusing. For example, when you select sum as an output, you also get the sum of the time to threshold. That result -- like "time_to_threshold": 92.7864 -- seems to be simply the sum of hours for the members of the cohort. Knowing that it took the cohort a combined 92 hours to reach the threshold isn't very actionable.
So... what are you proposing? Separating it into two separate metrics?
-- Steven Walling, Product Manager https://wikimediafoundation.org/
Wikimetrics mailing list Wikimetrics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimetrics
On Fri, Oct 25, 2013 at 2:16 PM, Diederik van Liere <dvanliere@wikimedia.org> wrote:
One piece of feedback: combining threshold and 'time to threshold' seems to make things more confusing. For example, when you select sum as an output, you also get the sum of the time to threshold. That result -- like "time_to_threshold": 92.7864 -- seems to be simply the sum of hours for the members of the cohort. Knowing that it took the cohort a combined 92 hours to reach the threshold isn't very actionable.
So... what are you proposing? Separating it into two separate metrics?
Separating is probably the simplest thing to do, but you could also just remove Sum as an output for time to threshold. The two metrics can make sense together, if you check out a result like:
"Average": { "threshold": 0.1735, "time_to_threshold": 0.7863 }
On Fri, Oct 25, 2013 at 5:43 PM, Steven Walling swalling@wikimedia.org wrote:
Separating is probably the simplest thing to do, but you could also just remove Sum as an output for time to threshold. The two metrics can make sense together, if you check out a result like:
"Average": { "threshold": 0.1735, "time_to_threshold": 0.7863 }
Hi, so there's a brief story behind this. Stefan caught this problem before we deployed, and I made the call to push it out without fixing it. In wikimetrics parlance, "threshold" and "time_to_threshold" are submetrics of the Threshold metric. I think the right solution here is to make a map from Aggregations to Submetrics. This would describe which aggregate is allowed for which submetric, and we could display this mapping along with an explanation on a page under /reports.
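A minimal sketch of what such a map might look like, in Python. The names `ALLOWED_AGGREGATIONS` and `filter_aggregates` are made up for illustration and are not part of wikimetrics. Which aggregations to allow is also an assumption, based only on the feedback earlier in this thread that a summed time to threshold is not useful. The value 17 is invented sample data.

```python
# Hypothetical map: aggregation name -> submetrics it may be applied to.
# These choices are assumptions for illustration, not wikimetrics behaviour.
ALLOWED_AGGREGATIONS = {
    "Sum": {"threshold"},
    "Average": {"threshold", "time_to_threshold"},
}


def filter_aggregates(aggregation, values):
    """Keep only the submetric values that `aggregation` is allowed to report."""
    allowed = ALLOWED_AGGREGATIONS.get(aggregation, set())
    return {name: value for name, value in values.items() if name in allowed}


# Invented aggregate output containing both submetrics.
raw_sum = {"threshold": 17, "time_to_threshold": 92.7864}
print(filter_aggregates("Sum", raw_sum))
```

With a map like this, the summed time to threshold would simply not be reported, while the Average output would still show both submetrics.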
The reason we chose to compute the metrics together is that, if you think about it:
threshold = time_to_threshold is not null
So resource-wise, you're basically getting it for free.
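The relation above can be sketched in Python as follows. This assumes a user who never reached the threshold is represented by `None`. The function name `threshold_submetrics` and the cohort values are hypothetical.

```python
from typing import Optional


def threshold_submetrics(time_to_threshold: Optional[float]) -> dict:
    """Derive both submetrics from a single per-user value.

    `time_to_threshold` is the number of hours the user took to reach the
    threshold, or None if they never reached it (assumed representation).
    """
    return {
        "threshold": time_to_threshold is not None,
        "time_to_threshold": time_to_threshold,
    }


# Invented cohort: hours to threshold per user, None if never reached.
cohort = {"user_a": 0.5, "user_b": None, "user_c": 2.0}
results = {user: threshold_submetrics(hours) for user, hours in cohort.items()}
print(results["user_b"])
```

Once the time to threshold has been computed, the boolean needs no further query, which is the "for free" part.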
Steven's point goes back to a suggestion I made a while ago: we need to avoid a many-to-many relation between metrics and aggregators.
Each metric should return just values of one type (e.g. no mixing of booleans and integers, like threshold and time to threshold) and we should specify for each metric: (1) what the expected type of the output is and (2) what aggregators are appropriate for that type.
Practically, we can group metrics into categories depending on the attribute they compute:
• binary attributes (e.g. "got reverted", "got blocked", "is productive", "hit threshold")
• counts ("bytes added", "pages created", "time to threshold")
• rates ("revert rate")
Each of these attributes will have a canonical type:
• boolean for binary attributes
• integer for counts
• float for rates
We can then specify what aggregator is valid as a function of the metric category/type.
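As a rough illustration of the scheme described above, here is a Python sketch. All names (`CATEGORY_TYPES`, `CATEGORY_AGGREGATORS`, `METRIC_CATEGORIES`, `aggregate`) are hypothetical. Which aggregators suit each category is an assumption made for the example, since the thread does not settle that.

```python
# Canonical type per category, as listed in the proposal above.
CATEGORY_TYPES = {"binary": bool, "count": int, "rate": float}

# Assumed for this sketch only: which aggregators are valid per category.
CATEGORY_AGGREGATORS = {
    "binary": {"sum", "average"},
    "count": {"sum", "average"},
    "rate": {"average"},
}

# Hypothetical metric names, one category each.
METRIC_CATEGORIES = {
    "hit_threshold": "binary",
    "bytes_added": "count",
    "revert_rate": "rate",
}


def aggregate(metric, aggregator, values):
    """Aggregate per-user values after checking aggregator and value type."""
    category = METRIC_CATEGORIES[metric]
    expected = CATEGORY_TYPES[category]
    if aggregator not in CATEGORY_AGGREGATORS[category]:
        raise ValueError(f"{aggregator!r} is not valid for {category} metrics")
    if not values:
        raise ValueError("no values to aggregate")
    # Strict check: bool is a subclass of int, so isinstance would be too lax.
    if not all(type(value) is expected for value in values):
        raise TypeError(f"{metric} must return only {expected.__name__} values")
    total = sum(values)
    return total if aggregator == "sum" else total / len(values)


print(aggregate("hit_threshold", "sum", [True, False, True]))
```

Because each metric maps to exactly one category, the set of valid aggregators follows from the metric alone.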
How does that sound?
Dario
That works for me, I'm just curious what changed your mind (from the definition of #699 we hammered out together). It really is no big deal either way though.
On Friday, October 25, 2013, Dario Taraborelli wrote:
Steven's point goes back to a suggestion I made a while ago: we need to avoid a many-to-many relation between metrics and aggregators.
I did see the benefits of your suggestion of TTT as a submetric but I hadn't thought through the usability implications when it comes to aggregators. As far as I know this is the only metric with a submetric attached to it among those implemented so far, right?
On Oct 25, 2013, at 8:20 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
That works for me, I'm just curious what changed your mind (from the definition of #699 we hammered out together). It really is no big deal either way though.
I was also thinking that, while both approaches could work for the end user as long as there's a UI, handling aggregators for submetrics will be a pain when we turn Wikimetrics into an API that can be queried via HTTP. The "one metric, one response type, one aggregator" approach should make things much more straightforward.
On Oct 26, 2013, at 8:10 AM, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
I did see the benefits of your suggestion of TTT as a submetric but I hadn't thought through the usability implications when it comes to aggregators. As far as I know this is the only metric with a submetric attached to it among those implemented so far, right?
Got it, that's perfectly reasonable. And like I said, not a big deal to change any of this. Ok, so right now the following metrics have sub-metrics:
bytes-added: net_sum, absolute_sum, positive_only_sum, negative_only_sum
survival: survived, censored
threshold: threshold, time_to_threshold, censored
I thought about this and figured out an alternative that may make sense. We can keep censored, as it's not so much a sub-metric as an informational thing. And we can keep the bytes_added submetrics together because they'll always aggregate the same way. But in the case of threshold, where disparate data types are returned, we can just return two results (so you'd have two rows on the reports page). This is probably the trickiest way to do it for me, but it seems the cleanest for the user. Thoughts?
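One way the "two rows" idea could be sketched in Python is shown below. The function `result_rows` is hypothetical, and grouping submetrics by the runtime type of their values is an assumption made for this example, not a description of how wikimetrics does it. The sample values are invented.

```python
def result_rows(result, informational=("censored",)):
    """Split one metric result into rows, one per data type.

    Submetrics whose values share a type stay together in one row.
    Informational fields such as `censored` are copied onto every row.
    """
    info = {key: result[key] for key in informational if key in result}
    groups = {}
    for name, value in result.items():
        if name in informational:
            continue
        groups.setdefault(type(value), {})[name] = value
    return [dict(values, **info) for values in groups.values()]


# Invented per-user results.
threshold_result = {"threshold": True, "time_to_threshold": 0.75, "censored": False}
bytes_added_result = {
    "net_sum": 120,
    "absolute_sum": 200,
    "positive_only_sum": 160,
    "negative_only_sum": -40,
}

print(result_rows(threshold_result))    # two rows: boolean and float
print(result_rows(bytes_added_result))  # one row: all integers
```

Under these assumptions, threshold yields two rows that can each take their own aggregator, while the bytes-added submetrics stay in a single row.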
Dan
On Sat, Oct 26, 2013 at 11:22 AM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
I was also thinking that, while both approaches could work for the end user as long as there's a UI, handling aggregators for submetrics will be a pain when we turn Wikimetrics into an API that can be queried via HTTP. The "one metric, one response type, one aggregator" approach should make things much more straightforward.