The signature is very consistent. I only have to search for ^"10. to find them, and they all look more or less like this:
"10.####/<ID>" OR "http://<publisher_website>/.../10.####/<ID>"
If they are consistently cranking out 45K of these searches every 2 hours or so, they should be easy to find once we have a place to look.
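For anyone who wants to pull these out of a query log, here is a rough sketch of the signature as a regular expression. It assumes each query is available as a plain string with the leading double quote intact, that "####" means four or more digits, and that the URL form is also quoted. The names DOI_LIKE, looks_like_doi_query and doi_share are made up for illustration and are not part of any existing tooling.

```python
import re

# Assumed signature: an opening double quote, an optional http(s) URL prefix,
# then "10.", four or more digits, a slash, and a non-empty ID.
DOI_LIKE = re.compile(r'^"(?:https?://\S+?/)?10\.\d{4,}/\S+')


def looks_like_doi_query(query: str) -> bool:
    """Return True if the query matches the assumed DOI-like signature."""
    return DOI_LIKE.match(query) is not None


def doi_share(queries) -> float:
    """Return the fraction of queries that match the signature."""
    queries = list(queries)
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if looks_like_doi_query(q))
    return hits / len(queries)
```

Running doi_share over a batch of zero-results queries gives the fraction that match, which could be compared against the 9% figure. The pattern is deliberately loose about what follows the slash, since the IDs vary.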
I'm trying to make sense of it. Does it make sense as referral spam or something?
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Mon, Jul 27, 2015 at 6:49 PM, Tomasz Finc tfinc@wikimedia.org wrote:
If the signature is as specific as we're seeing here, then I'm sure we'll see them again and can easily identify them.
--tomasz
On Mon, Jul 27, 2015 at 3:48 PM, Erik Bernhardson ebernhardson@wikimedia.org wrote:
On Mon, Jul 27, 2015 at 3:39 PM, Tomasz Finc tfinc@wikimedia.org wrote:
On Mon, Jul 27, 2015 at 2:04 PM, Trey Jones tjones@wikimedia.org wrote:
and it's 9% of the wiki zero-results queries
That's a huge discovery to better understand our traffic.
What do we know about who this is? proxy, bot, app, other, etc?
I'm eager to have a talk with them :)
The current firehose of logs doesn't contain any PII, so we basically have no idea where these come from. I've been thinking with Oliver about whether, and what, PII should be stored (the data is under NDA anyway, but we've always erred on the side of caution).
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search