Okay, I have a slightly better sample this morning. (The earlier one accidentally left out Wikipedias with abbreviations longer than 2 letters.)
My new sample:
- 500K zero-result full_text queries (web and API) across the Wikipedias with 100K+ articles
- 383,433 unique search strings (that's a long, long tail)
- The sample covers a little over an hour: 2015-07-23 07:51:29 to 2015-07-23 08:55:42
- The top 10 (en, de, pt, ja, ru, es, it, fr, zh, nl) account for >83% of queries
Top 10 counts, for reference:
221618 enwiki
 51936 dewiki
 25500 ptwiki
 24206 jawiki
 21891 ruwiki
 19913 eswiki
 18303 itwiki
 14443 frwiki
 11730 zhwiki
  7685 nlwiki
------
417225
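For anyone who wants to double-check the ">83%" figure, here is a minimal Python sketch that redoes the arithmetic from the counts above. The variable names are mine, and the 500,000 denominator assumes the "500K" sample size is exact.

```python
# Top-10 zero-result query counts, copied from the list above.
top10_counts = {
    "enwiki": 221618,
    "dewiki": 51936,
    "ptwiki": 25500,
    "jawiki": 24206,
    "ruwiki": 21891,
    "eswiki": 19913,
    "itwiki": 18303,
    "frwiki": 14443,
    "zhwiki": 11730,
    "nlwiki": 7685,
}

# Assumption: the "500K" sample is exactly 500,000 queries.
SAMPLE_SIZE = 500_000

top10_total = sum(top10_counts.values())
top10_share = top10_total / SAMPLE_SIZE

print(top10_total)            # 417225
print(f"{top10_share:.1%}")   # 83.4%
```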
The DOI searches that appear to come from Lagotto installations hit 25 wikis (as the Lagotto docs said they would). In this sample en gets a lot more, ru gets fewer, and the rest are *very* evenly distributed. (I missed ceb and war before. Apologies.) The total is just over 50K queries, or >10% of the full_text queries against the larger wikis that return zero results.
===DOI
 6050 enwiki
 1904 nlwiki
 1902 cebwiki
 1901 warwiki
 1900 viwiki
 1900 svwiki
 1900 jawiki
 1899 frwiki
 1899 eswiki
 1899 dewiki
 1898 zhwiki
 1898 ukwiki
 1898 plwiki
 1898 itwiki
 1897 ptwiki
 1897 nowiki
 1897 fiwiki
 1896 huwiki
 1896 fawiki
 1896 cswiki
 1896 cawiki
 1895 kowiki
 1895 idwiki
 1895 arwiki
  475 ruwiki
-----
50181
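The same kind of sanity check works for the DOI numbers. This is a rough sketch rather than the analysis I actually ran: it totals the per-wiki counts above, computes their share of the sample (again assuming exactly 500,000 queries), and measures how tight the spread is once the two outliers, en and ru, are set aside. All names in the code are illustrative.

```python
# DOI-style zero-result query counts per wiki, copied from the list above.
doi_counts = {
    "enwiki": 6050, "nlwiki": 1904, "cebwiki": 1902, "warwiki": 1901,
    "viwiki": 1900, "svwiki": 1900, "jawiki": 1900, "frwiki": 1899,
    "eswiki": 1899, "dewiki": 1899, "zhwiki": 1898, "ukwiki": 1898,
    "plwiki": 1898, "itwiki": 1898, "ptwiki": 1897, "nowiki": 1897,
    "fiwiki": 1897, "huwiki": 1896, "fawiki": 1896, "cswiki": 1896,
    "cawiki": 1896, "kowiki": 1895, "idwiki": 1895, "arwiki": 1895,
    "ruwiki": 475,
}

# Assumption: the "500K" sample is exactly 500,000 queries.
SAMPLE_SIZE = 500_000

doi_total = sum(doi_counts.values())
doi_share = doi_total / SAMPLE_SIZE

# Set aside the two outliers (en is high, ru is low) and look at the rest.
outliers = {"enwiki", "ruwiki"}
even_counts = [n for wiki, n in doi_counts.items() if wiki not in outliers]
spread = max(even_counts) - min(even_counts)

print(len(doi_counts))        # 25 wikis
print(doi_total)              # 50181
print(f"{doi_share:.1%}")     # 10.0%
print(spread)                 # 9
```

A spread of 9 queries across 23 wikis is what makes this traffic look automated rather than organic.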
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Mon, Jul 27, 2015 at 5:04 PM, Trey Jones tjones@wikimedia.org wrote:
My original sample was a 100K sample from zero-results queries to enwiki on 7/24. Today I looked at similar samples from 7/10 and 7/17 (since there is a weekly pattern to traffic) and from 7/22 to compare.
All of the patterns I detected are still present, in approximately the same volume (give or take a factor of 2), except for the ('"<TITLE>"', '<AUTHOR(S)>') pattern.
I've started looking at a 500K sample from 7/24 across all wikis. I'll have more results tomorrow, but right now it's already clear that someone is spamming useless DOI searches across wikis—and it's 9% of the wiki zero-results queries.
—Trey