I don't know if this is the only source, but one likely source is http://sample.lagotto.io/sources/wikipedia. That page says directly that their default query against 25 Wikipedia instances is "DOI" or "URL". Since this is an open source project, the code is likely running in many places.
Found this after Max mentioned the API logs might have something. Basically, I checked for logged API requests with `srsearch="10.` and sorted by the number of times a particular IP address showed up in short (~5 min) timespans across a few days. That turned up several AWS IPs, a few IPs that don't have a name in reverse lookup, and sample.lagotto.io.
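For anyone who wants to repeat the check, here is a rough sketch of the counting step. It is not the script I actually ran; the `(timestamp, ip, query)` record shape and the function names `count_bursts` and `top_ips` are assumptions for illustration, and parsing the real API logs into that shape is left out.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 300  # ~5 minute buckets, as described above


def count_bursts(records, window_seconds=WINDOW_SECONDS):
    """Count DOI-style searches per (ip, time bucket).

    `records` is an iterable of (timestamp, ip, query) tuples, where
    timestamp is a timezone-aware datetime. This shape is hypothetical;
    adapt it to whatever the log parser produces.
    """
    counts = Counter()
    for ts, ip, query in records:
        # Same cheap filter as the log check: srsearch starts with "10.
        if not query.startswith('"10.'):
            continue
        bucket = int(ts.timestamp()) // window_seconds
        counts[(ip, bucket)] += 1
    return counts


def top_ips(records, n=10):
    """Rank IPs by their busiest single window."""
    per_ip = Counter()
    for (ip, _bucket), hits in count_bursts(records).items():
        per_ip[ip] = max(per_ip[ip], hits)
    return per_ip.most_common(n)
```

Ranking by the busiest window rather than the total keeps a slow, steady client from outranking one that fires a burst of requests.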
On Mon, Jul 27, 2015 at 5:08 PM, Trey Jones tjones@wikimedia.org wrote:
The signature is very consistent. I only have to search for ^"10. to find them, and they all look more or less like this:
"10.####/<ID>" OR "http://<publisher_website>/.../10.####/<ID>"
If they are consistently cranking out 45K of these searches every 2 hours or so, they should be easy to find once we have a place to look.
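As a sketch of how that signature could be matched once there is a place to look: the regular expression below is an assumption built from the pattern above (a DOI prefix of `10.` plus four or more digits, and an `http` or `https` URL ending in the same kind of DOI), and the name `looks_like_doi_query` is made up for illustration. Real queries may vary more than this allows.

```python
import re

# Assumed DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
DOI = r'10\.\d{4,9}/[^"\s]+'

# "<DOI>" OR "<URL ending in a DOI>"
SIGNATURE = re.compile(
    r'^"' + DOI + r'" OR "https?://[^"\s]*' + DOI + r'"$'
)


def looks_like_doi_query(query):
    """Return True if the query matches the assumed DOI/URL signature."""
    return SIGNATURE.match(query) is not None
```

The cheaper `^"10.` prefix check is enough to find candidates; a stricter pattern like this one would mainly help separate this traffic from ordinary quoted searches that happen to start with "10.".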
I'm trying to make sense of it. Does it make sense as referral spam or something?
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Mon, Jul 27, 2015 at 6:49 PM, Tomasz Finc tfinc@wikimedia.org wrote:
If the signature is as specific as we're seeing here, then I'm sure we'll see them again and can easily identify them.
--tomasz
On Mon, Jul 27, 2015 at 3:48 PM, Erik Bernhardson ebernhardson@wikimedia.org wrote:
On Mon, Jul 27, 2015 at 3:39 PM, Tomasz Finc tfinc@wikimedia.org wrote:
On Mon, Jul 27, 2015 at 2:04 PM, Trey Jones tjones@wikimedia.org wrote:
and it's 9% of the wiki zero-results queries
That's a huge discovery to better understand our traffic.
What do we know about who this is? proxy, bot, app, other, etc?
I'm eager to have a talk with them :)
The current firehose of logs doesn't contain any PII, so we basically have no idea where these come from. I've been thinking with Oliver on if/what PII should be stored (the data is under NDA anyway, but we've always erred on the side of caution).
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search