Hey all,
So, the data for the Search dashboards (http://searchdata.wmflabs.org/metrics/) comes from a variety of sources, one of which is the daily log of all Cirrus search requests: about 46GB of data a day. We set up a pipeline over this to report the "zero results" rate, i.e. the proportion of queries that return zero results. It was a pretty shaky pipeline because it was an ultra-urgent, need-it-for-a-presentation thing.
Good news: my prediction that it needed work was accurate. Bad news: my prediction that it needed work was accurate ;).
When Erik and I went through all of the scripts and rewrote them on the 15th, we discovered that a lot of maintenance tasks were being identified as searches. These are now excluded, but it means we have to backfill 1.5 months of data. I've chosen to delete the old data and then backfill, because that way we avoid having data from multiple, inconsistent software versions, and because it makes the backfilling task a bit easier.
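A minimal Python sketch of a zero results rate with maintenance tasks excluded, assuming each log line has already been parsed into a record with a hit count and a flag marking it as a maintenance task. The `SearchRequest` type, its `is_maintenance` field and the toy data are hypothetical; the actual Cirrus log format and the rule used to spot maintenance tasks aren't described in this thread.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SearchRequest:
    """One parsed entry from a day's Cirrus request log (hypothetical shape)."""
    query: str
    hits: int             # number of results the request returned
    is_maintenance: bool  # hypothetical flag: True for maintenance tasks, not real searches


def zero_results_rate(requests: Iterable[SearchRequest]) -> float:
    """Fraction of real searches that returned zero results.

    Maintenance tasks are skipped entirely, so they count towards neither
    the numerator nor the denominator. Returns NaN if there are no searches.
    """
    searches = 0
    zero = 0
    for req in requests:
        if req.is_maintenance:
            continue
        searches += 1
        if req.hits == 0:
            zero += 1
    return zero / searches if searches else math.nan


# Toy data, invented for illustration only.
day = [
    SearchRequest("kittens", 120, False),
    SearchRequest("asdfghjkl", 0, False),
    SearchRequest("", 0, True),
    SearchRequest("", 0, True),
]

print(zero_results_rate(day))                    # 0.5
print(sum(r.hits == 0 for r in day) / len(day))  # 0.75: maintenance tasks counted as searches
```

On this toy data, letting the two maintenance entries count as searches moves the rate from 0.5 to 0.75, which is why filtering them out before dividing matters.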
As a result, the dashboards may look a bit odd over the next couple of days. They have data from 15 July onwards that we're comfortable with, and we're gradually backfilling 1 June to 14 July in date order, starting with 1 June. So at the moment we have 1 June and 15-21 July. Weird. Next it'll be 1-2 June and 15-21 July, and so on.
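A small Python sketch of the backfill order described above, assuming (purely for illustration) that days are recomputed one at a time, oldest first, and that 15-21 July are the only days already computed with the rewritten scripts. It only models which dates are present on the dashboard; `dates_on_dashboard` and `describe` are hypothetical helpers, not part of the real pipeline.

```python
from datetime import date, timedelta


def date_range(start: date, end: date) -> list[date]:
    """Every date from start to end, inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


# Days already computed with the rewritten scripts (15-21 July 2015).
already_good = date_range(date(2015, 7, 15), date(2015, 7, 21))

# Days still to backfill, oldest first (1 June to 14 July 2015).
backfill_queue = date_range(date(2015, 6, 1), date(2015, 7, 14))


def dates_on_dashboard(days_backfilled: int) -> list[date]:
    """Dates with data once the first `days_backfilled` queued days are recomputed."""
    return sorted(backfill_queue[:days_backfilled] + already_good)


def describe(dates: list[date]) -> str:
    """Collapse a sorted date list into runs of consecutive days."""
    if not dates:
        return "(none)"
    runs = []
    start = prev = dates[0]
    for d in dates[1:]:
        if d - prev != timedelta(days=1):  # gap: close the current run
            runs.append((start, prev))
            start = d
        prev = d
    runs.append((start, prev))
    return ", ".join(
        a.isoformat() if a == b else f"{a.isoformat()} to {b.isoformat()}"
        for a, b in runs
    )


for n in (1, 2, len(backfill_queue)):
    print(n, describe(dates_on_dashboard(n)))
```

The first two printed lines correspond to the "1 June and 15-21 July" and "1-2 June and 15-21 July" states; the last is the fully backfilled range, 2015-06-01 to 2015-07-21, with no gaps.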
So expect the graphs to look steadily less weird until they're back to normal (but more consistent and sane-looking). Until then: yeah, they're gonna look a bit weird.
Thanks,
Does this mean that all of the data from April and May that was previously in the dashboards will now be permanently gone?
Dan
On 22 July 2015 at 14:13, Oliver Keyes okeyes@wikimedia.org wrote:
-- Oliver Keyes Research Analyst Wikimedia Foundation
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
No, it means that the Cirrus data used to build the failure rate will be /temporarily/ (read: for 2 days) gone, and by the end of that period it will have been replaced with the correct numbers :). The EventLogging (EL)-based numbers are perfectly fine.
On 22 July 2015 at 18:13, Dan Garry dgarry@wikimedia.org wrote:
-- Dan Garry Lead Product Manager, Discovery Wikimedia Foundation