Okay, I have a slightly better sample this morning. (I accidentally left out Wikipedias with abbreviations longer than 2 letters.)

My new sample:
500K zero-result full_text queries (web and API) across the Wikipedias with 100K+ articles
383,433 unique search strings (that's a long, long tail)
The sample covers a little over an hour: 2015-07-23 07:51:29 to 2015-07-23 08:55:42
The top 10 (en, de, pt, ja, ru, es, it, fr, zh, nl) account for >83% of queries

Top 10 counts, for reference:
 221618  enwiki
  51936  dewiki
  25500  ptwiki
  24206  jawiki
  21891  ruwiki
  19913  eswiki
  18303  itwiki
  14443  frwiki
  11730  zhwiki
   7685  nlwiki
-----
417225
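
For anyone who wants to re-check the arithmetic, here is a minimal Python sketch. It assumes the "500K" sample is exactly 500,000 queries (my rounding may hide a slightly different figure); the variable names are just for illustration.

```python
# Top 10 zero-result query counts, copied from the list above.
top10_counts = {
    "enwiki": 221618, "dewiki": 51936, "ptwiki": 25500, "jawiki": 24206,
    "ruwiki": 21891, "eswiki": 19913, "itwiki": 18303, "frwiki": 14443,
    "zhwiki": 11730, "nlwiki": 7685,
}

# Assumption: "500K" means exactly 500,000 sampled queries.
SAMPLE_SIZE = 500_000

top10_total = sum(top10_counts.values())
top10_share = top10_total / SAMPLE_SIZE

print(top10_total)            # 417225
print(f"{top10_share:.1%}")   # 83.4%
```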

The DOI searches that appear to come from Lagotto installations hit 25 wikis (as the Lagotto docs said they would). In this sample en gets a lot more and ru gets fewer, while the rest are very evenly distributed at roughly 1,900 each. (I missed ceb and war before; apologies.) The total is just over 50K queries, or >10% of the zero-result full-text queries against the larger wikis.

===DOI
   6050 enwiki
   1904 nlwiki
   1902 cebwiki
   1901 warwiki
   1900 viwiki
   1900 svwiki
   1900 jawiki
   1899 frwiki
   1899 eswiki
   1899 dewiki
   1898 zhwiki
   1898 ukwiki
   1898 plwiki
   1898 itwiki
   1897 ptwiki
   1897 nowiki
   1897 fiwiki
   1896 huwiki
   1896 fawiki
   1896 cswiki
   1896 cawiki
   1895 kowiki
   1895 idwiki
   1895 arwiki
    475 ruwiki
-----
50181
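
A similar sketch for the DOI numbers, again assuming the sample is exactly 500,000 queries. It checks the total, the share of the sample, and how tight the spread is once en and ru are set aside; the names are hypothetical.

```python
# DOI-style zero-result query counts per wiki, copied from the list above.
doi_counts = {
    "enwiki": 6050, "nlwiki": 1904, "cebwiki": 1902, "warwiki": 1901,
    "viwiki": 1900, "svwiki": 1900, "jawiki": 1900, "frwiki": 1899,
    "eswiki": 1899, "dewiki": 1899, "zhwiki": 1898, "ukwiki": 1898,
    "plwiki": 1898, "itwiki": 1898, "ptwiki": 1897, "nowiki": 1897,
    "fiwiki": 1897, "huwiki": 1896, "fawiki": 1896, "cswiki": 1896,
    "cawiki": 1896, "kowiki": 1895, "idwiki": 1895, "arwiki": 1895,
    "ruwiki": 475,
}

# Assumption: "500K" means exactly 500,000 sampled queries.
DOI_SAMPLE_SIZE = 500_000

doi_total = sum(doi_counts.values())
doi_share = doi_total / DOI_SAMPLE_SIZE

# Set aside the two outliers (en high, ru low) and measure the spread of the rest.
typical = [n for wiki, n in doi_counts.items() if wiki not in ("enwiki", "ruwiki")]
doi_spread = max(typical) - min(typical)

print(len(doi_counts))       # 25
print(doi_total)             # 50181
print(f"{doi_share:.2%}")    # 10.04%
print(doi_spread)            # 9
```

A spread of only 9 queries across 23 wikis is what you would expect from an automated client sending the same searches to each wiki in turn, rather than from people searching by hand.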

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation


On Mon, Jul 27, 2015 at 5:04 PM, Trey Jones <tjones@wikimedia.org> wrote:
My original sample was a 100K sample from zero-results queries to enwiki on 7/24. Today I looked at similar samples from 7/10 and 7/17 (since there is a weekly pattern to traffic) and from 7/22 to compare.

All of the patterns I detected are still present, in approximately the same volume (give or take a factor of 2), except for the ('"<TITLE>"', '<AUTHOR(S)>') pattern.

I've started looking at a 500K sample from 7/24 across all wikis. I'll have more results tomorrow, but right now it's already clear that someone is spamming useless DOI searches across wikis—and it's 9% of the wiki zero-results queries.

—Trey