This is awesome. Roughly, by eye, it looks like automata are about 2% of ZRR overall and 5% of ZRR for fulltext search, which was around 15% before the holidays (and lower over the holidays—during The Time of Unreliable User Behavior).
Is there a write-up for this project? I know it had to be a ton of work, and I'm curious about the details (possibly more so than most).
Do you think you got most of them? Or was the result high-precision but not exhaustive?
Thanks for working on this!
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Mon, Jan 4, 2016 at 1:29 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Hey all,
After several weeks of work to switch all the scripts over and backfill, all the Discovery dashboards can now filter crawlers and other automated software out of the graphs where that is relevant. You should notice a simple checkbox on, for example, the Zero Results Rate data or the Wikidata Query Service traffic.
While a bit of backfilling is still waiting on the servers syncing up, this work is essentially complete, and it provides another way to look at data on how people are using search (and who those people are). It was a heck of a lot of work, by both Mikhail and me, but it's hopefully valuable :).
For Discovery Analytics,
--
Oliver Keyes
Count Logula
Wikimedia Foundation