This is awesome. Roughly, by eye, it looks like automata are about 2% of ZRR overall and 5% of ZRR for fulltext search, which was around 15% before the holidays (and lower over the holidays—during The Time of Unreliable User Behavior).

Is there a write-up for this project? I know it had to be a ton of work, and I'm curious about the details (possibly more so than most).

Do you think you got most of them? Or was the result high-precision but not exhaustive?

Thanks for working on this!

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation


On Mon, Jan 4, 2016 at 1:29 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Hey all,

After several weeks of work to switch all the scripts over and
backfill, all the Discovery dashboards now have the ability to filter
crawlers and automated software out from graphs where that is
relevant. You should notice a simple checkbox on, for example, the
Zero Results Rate data or Wikidata Query Service traffic.

While a bit of backfilling is still waiting on the servers syncing up,
this work is essentially complete, and provides another way to look at
data on how people are using search (and who those people are). It was
a heck of a lot of work, by both myself and Mikhail, but it's
hopefully valuable :).

For Discovery Analytics,

--
Oliver Keyes
Count Logula
Wikimedia Foundation

_______________________________________________
discovery mailing list
discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery