The other way around, I think; only bots work Sundays. We know a lot of the search queries that don't work /shouldn't/ work: they're producing no results because they're nonsense, or spam, or someone being silly through the API. Normal human traffic rises on a Monday, peaks on a Tuesday, and begins to drop again towards the end of the week and into the weekend. What this means is that the proportion of traffic coming from non-humans is greater on the weekends (because fewer people are browsing), and that increases the impact of automata on the zero results rate for those days.
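To make the weekend effect concrete, here is a rough sketch of the arithmetic. Every number in it is invented for illustration and is not taken from the dashboards; the function name and the assumption that automated volume stays flat all week are likewise hypothetical.

```python
def zero_results_rate(human_queries, bot_queries, human_zrr, bot_zrr):
    """Overall share of queries returning no results for a mixed-traffic day."""
    total = human_queries + bot_queries
    zero = human_queries * human_zrr + bot_queries * bot_zrr
    return zero / total


# Illustrative values only (assumptions, not dashboard data).
BOT_QUERIES = 200_000  # assume automated volume is the same every day
HUMAN_ZRR = 0.10       # assumed zero results rate for human queries
BOT_ZRR = 0.50         # assumed zero results rate for automated queries

# Human traffic is assumed to fall from 1,000,000 to 600,000 at the weekend.
weekday = zero_results_rate(1_000_000, BOT_QUERIES, HUMAN_ZRR, BOT_ZRR)
weekend = zero_results_rate(600_000, BOT_QUERIES, HUMAN_ZRR, BOT_ZRR)

print(f"weekday ZRR: {weekday:.1%}")  # 16.7%
print(f"weekend ZRR: {weekend:.1%}")  # 20.0%
```

Neither group behaves differently at the weekend in this sketch; the overall rate moves only because the mix of traffic changes.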
On 4 January 2016 at 23:28, billinghurst billinghurstwiki@gmail.com wrote:
What is with the issue that we have a weekly cycle (exactly?) where there is a 4% difference in success within half a week, EVERY WEEK!
With the number of searches done on the site, it seems like an aberration that each Sunday is a more accurate search day!?! Analytical gremlins of data capture, or not even bots work Sundays?
On Tue, 5 Jan 2016 06:54 Oliver Keyes okeyes@wikimedia.org wrote:
(Links: the dashboards live at http://discovery.wmflabs.org/ and an example of automata filtering can be seen at http://discovery.wmflabs.org/metrics/#failure_rate !)
That is, 2% and 5% lower? You're looking at percentages so where the lines vary between checkbox options it'll be different proportions. Unless there's a graph I'm missing :D
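A quick sketch of why "N% lower" is ambiguous when the thing being measured is itself a percentage. The values are illustrative only, loosely echoing the by-eye figures in the quoted message rather than anything read from the dashboards, and the function names are made up for this example.

```python
def point_drop(with_automata, without_automata):
    """Difference between two rates, in percentage points."""
    return (with_automata - without_automata) * 100


def relative_drop(with_automata, without_automata):
    """Difference between two rates, as a fraction of the starting rate."""
    return (with_automata - without_automata) / with_automata


# Illustrative assumption: a 15% rate that reads 10% once automata are filtered.
points = point_drop(0.15, 0.10)
relative = relative_drop(0.15, 0.10)

print(f"{points:.1f} percentage points lower")  # 5.0 percentage points lower
print(f"{relative:.1%} lower in relative terms")  # 33.3% lower in relative terms
```

The same 5-point gap would be a smaller relative change on a line that starts higher, which is why the proportions differ between checkbox options.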
On 4 January 2016 at 13:45, Trey Jones tjones@wikimedia.org wrote:
This is awesome. Roughly, by eye, it looks like automata are about 2% of ZRR overall and 5% of ZRR for fulltext search, which was around 15% before the holidays (and lower over the holidays—during The Time of Unreliable User Behavior).
Is there a write up for this project? I know it had to be a ton of work, and I'm curious about the details (possibly more so than most).
Do you think you got most of them? Or was the result high-precision but not exhaustive?
Thanks for working on this!
—Trey
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Mon, Jan 4, 2016 at 1:29 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Hey all,
After several weeks of work to switch all the scripts over and backfill, all the Discovery dashboards now have the ability to filter crawlers and automated software out from graphs where that is relevant. You should notice a simple checkbox on, for example, the Zero Results Rate data or Wikidata Query Service traffic.
While a bit of backfilling is still waiting on the servers syncing up, this work is essentially complete, and provides another way to look at data on how people are using search (and who those people are). It was a heck of a lot of work, by both myself and Mikhail, but it's hopefully valuable :).
For Discovery Analytics,
-- Oliver Keyes Count Logula Wikimedia Foundation
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery