On Thu, Jan 28, 2016 at 11:47 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> A user-agent should be client specific, not server specific.
>
> This makes a lot of sense. If I build a bot that crawls wikipedia and
> facebook public pages it really doesn't make sense that my bot has a
> "wikimediaBot" user agent, just the word "Bot" should probably be
> enough.

Anything with "bot" (case-insensitive) in the UA is already caught by
the "spiderPattern" regex [0]. The rest of the new logic in Webrequest
related to this feature seems to take that into account by making
isSpider() check that wikimediaBotPattern is not matched. I guess a
bigger question is why try to differentiate between "spiders" and
"bots" at all?
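
To make the interaction between the two checks concrete, here is a rough sketch in Python (the real code is Java in refinery-source). The pattern contents are simplified stand-ins assumed for illustration, not the actual regexes, and the function names only loosely mirror isSpider() and wikimediaBotPattern:

```python
import re

# Hypothetical, simplified stand-in for the real "spiderPattern" regex;
# the only property taken from this thread is that "bot" matches
# case-insensitively.
SPIDER_PATTERN = re.compile(r"bot|spider|crawler", re.IGNORECASE)

# Hypothetical stand-in for "wikimediaBotPattern".
WIKIMEDIA_BOT_PATTERN = re.compile(r"wikimediabot", re.IGNORECASE)


def is_wikimedia_bot(user_agent: str) -> bool:
    """True if the UA carries the Wikimedia-specific bot marker."""
    return WIKIMEDIA_BOT_PATTERN.search(user_agent) is not None


def is_spider(user_agent: str) -> bool:
    """True if the UA looks like a generic spider.

    A UA containing "WikimediaBot" also contains "bot", so it would match
    the spider pattern on its own; the second condition excludes it.
    """
    return (
        SPIDER_PATTERN.search(user_agent) is not None
        and not is_wikimedia_bot(user_agent)
    )
```

With these stand-ins, a UA such as "ExampleTool/0.1 WikimediaBot" is classified as a Wikimedia bot and not as a spider, while "ExampleCrawler/1.0 Bot" is classified as a spider. Both UA strings are made up for the example.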
[0]: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery…
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855