I don't know if this is the only source, but one likely source is http://sample.lagotto.io/sources/wikipedia
That page says directly that their default query, run against 25 Wikipedia instances, is "DOI" or "URL". Since this is an open source project, the code is likely running in many places.

Found this after Max mentioned the API logs might have something. I checked for logged API requests with `srsearch="10.` and sorted by the number of times a particular IP address showed up in short (~5 min) timespans across a few days. That turned up several AWS IPs, a few IPs that don't have a name in reverse lookup, and sample.lagotto.io.
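
For anyone who wants to repeat that check, here is a minimal Python sketch of the counting step. The record format (epoch seconds, IP, raw query string) and the function name are assumptions made for illustration; the real API logs are laid out differently and would need their own parsing.

```python
from collections import Counter
from urllib.parse import parse_qs

WINDOW_SECONDS = 300  # roughly the 5 minute timespan mentioned above


def count_doi_searches(records, window=WINDOW_SECONDS):
    """Count DOI-style searches per (ip, time bucket).

    `records` is a hypothetical iterable of (epoch_seconds, ip, query_string)
    tuples; it is not the actual log schema.
    """
    counts = Counter()
    for ts, ip, query_string in records:
        # parse_qs percent-decodes values, so %22 becomes a double quote.
        term = parse_qs(query_string).get("srsearch", [""])[0]
        if term.startswith('"10.'):
            bucket = int(ts) // window
            counts[(ip, bucket)] += 1
    return counts


def busiest(counts, top=10):
    """Return the (ip, bucket) pairs with the most matching requests."""
    return counts.most_common(top)
```

Sorting by `busiest(counts)` puts the IPs that burst the most requests into a single window at the top.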



On Mon, Jul 27, 2015 at 5:08 PM, Trey Jones <tjones@wikimedia.org> wrote:
The signature is very consistent. I only have to search for ^\"10\. to find them, and they all look more or less like this:

"10.####/<ID>" OR "http://<publisher_website>/.../10.####/<ID>"
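
As a rough sketch, a stricter matcher for that shape could look like the Python below. The exact pattern is an assumption based on the example above (a quoted DOI, then OR, then a quoted URL that contains a DOI); the queries only look "more or less" like this, so it may well miss variants.

```python
import re

# Assumed DOI shape: "10.", some digits, a slash, then a suffix with no
# quotes or whitespace. Real DOIs can be messier than this.
DOI = r'10\.\d+/[^"\s]+'

# Assumed full signature: "<DOI>" OR "<URL containing a DOI>"
SIGNATURE = re.compile(r'^"' + DOI + r'" OR "https?://[^"\s]*' + DOI + r'"$')


def looks_like_doi_query(query):
    """Return True if the query matches the assumed DOI/URL signature."""
    return SIGNATURE.match(query) is not None
```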

If they are consistently cranking out 45K of these searches every 2 hours or so, they should be easy to find once we have a place to look.

I'm trying to make sense of it. Does it make sense as referral spam or something?

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation


On Mon, Jul 27, 2015 at 6:49 PM, Tomasz Finc <tfinc@wikimedia.org> wrote:
If the signature is as specific as we're seeing here then I'm sure
we'll see them again and can easily identify them.

--tomasz

On Mon, Jul 27, 2015 at 3:48 PM, Erik Bernhardson
<ebernhardson@wikimedia.org> wrote:
> On Mon, Jul 27, 2015 at 3:39 PM, Tomasz Finc <tfinc@wikimedia.org> wrote:
>>
>> On Mon, Jul 27, 2015 at 2:04 PM, Trey Jones <tjones@wikimedia.org> wrote:
>> > and it's 9% of the wiki zero-results queries
>>
>> That's a huge discovery to better understand our traffic.
>>
>> What do we know about who this is? proxy, bot, app, other, etc?
>>
>> I'm eager to have a talk with them :)
>>
>
> The current firehose of logs doesn't contain any PII, so we basically have
> no idea where these come from. I've been thinking with Oliver about if/what PII
> should be stored (the data is under NDA anyway, but we've always erred on
> the side of caution).
>
>
> _______________________________________________
> Wikimedia-search mailing list
> Wikimedia-search@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
>
