The signature is very consistent. I only have to search for ^\"10\. to find
them, and they all look more or less like this:
"10.####/<ID>" OR
"http://<publisher_website>/.../10.####/<ID>"
If they are consistently cranking out 45K of these searches every 2 hours
or so, they should be easy to find once we have a place to look.
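
As a rough sketch of how the signature above could be checked in code (an illustration only, not the query actually run against the logs): the regex below assumes "####" means four or more digits and that the ID contains no double quotes. The names DOI_QUERY, looks_like_doi_query, and doi_share are hypothetical, and example.org stands in for a publisher website.

```python
import re

# Assumed shape of the signature: a fully quoted query that is either
#   "10.####/<ID>"  or  "http://<publisher_website>/.../10.####/<ID>"
# Assumption: "####" is four or more digits and <ID> has no double quotes.
DOI_QUERY = re.compile(r'^"(?:https?://[^"\s]+/)?10\.\d{4,}/[^"]+"$')


def looks_like_doi_query(query: str) -> bool:
    """Return True if the query matches the assumed signature."""
    return DOI_QUERY.match(query.strip()) is not None


def doi_share(queries) -> float:
    """Fraction of the given queries that match the assumed signature."""
    queries = list(queries)
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if looks_like_doi_query(q))
    return hits / len(queries)


if __name__ == "__main__":
    sample = [
        '"10.1234/abc.def"',
        '"http://example.org/doi/abs/10.1234/abc"',
        "some ordinary search",
    ]
    for q in sample:
        print(q, looks_like_doi_query(q))
```

Running doi_share over a batch of zero-results queries would give a share comparable to the percentage discussed in this thread, assuming the log format exposes the raw query string.
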
I'm trying to make sense of it. Does it make sense as referral spam or
something?
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Mon, Jul 27, 2015 at 6:49 PM, Tomasz Finc <tfinc(a)wikimedia.org> wrote:
If the signature is as specific as we're seeing here, then I'm sure
we'll see them again and can easily identify them.
--tomasz
On Mon, Jul 27, 2015 at 3:48 PM, Erik Bernhardson <ebernhardson(a)wikimedia.org> wrote:
On Mon, Jul 27, 2015 at 3:39 PM, Tomasz Finc <tfinc(a)wikimedia.org> wrote:
On Mon, Jul 27, 2015 at 2:04 PM, Trey Jones <tjones(a)wikimedia.org> wrote:
and it's 9% of the wiki zero-results queries
That's a huge discovery to better understand our traffic.
What do we know about who this is? proxy, bot, app, other, etc?
I'm eager to have a talk with them :)
The current firehose of logs doesn't contain any PII, so we basically
have no idea where these come from. I've been thinking with Oliver
about if/what PII should be stored (the data is under NDA anyways,
but we've always erred on the side of caution).
_______________________________________________
Wikimedia-search mailing list
Wikimedia-search(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search