On Thu, Jan 28, 2016 at 11:47 AM, Nuria Ruiz <nuria@wikimedia.org> wrote:
A user-agent should be client specific, not server specific.
This makes a lot of sense. If I build a bot that crawls Wikipedia and Facebook public pages, it really doesn't make sense for my bot to have a "wikimediaBot" user agent; just the word "Bot" should probably be enough.
Anything with "bot" (case-insensitive) in the UA is already caught by the "spiderPattern" regex [0]. The rest of the new logic in Webrequest related to this feature seems to take that into account by making isSpider() check that wikimediaBotPattern is not matched. I guess a bigger question is why try to differentiate between "spiders" and "bots" at all?
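To make the overlap concrete, here is a minimal sketch of the classification described above. It assumes two case-insensitive regexes and an isSpider() check that excludes wikimediaBot matches. The pattern strings and the names SPIDER_PATTERN, WIKIMEDIA_BOT_PATTERN, is_wikimedia_bot, is_spider and classify are hypothetical stand-ins. The real refinery code lives in Webrequest (Java) and its exact regexes are not reproduced here.

```python
import re

# Hypothetical stand-ins for refinery's spiderPattern and wikimediaBotPattern;
# the real regexes in Webrequest differ. Both are case-insensitive.
SPIDER_PATTERN = re.compile(r"bot|spider|crawler", re.IGNORECASE)
WIKIMEDIA_BOT_PATTERN = re.compile(r"wikimediabot", re.IGNORECASE)


def is_wikimedia_bot(user_agent: str) -> bool:
    """True if the UA contains the (hypothetical) wikimediaBot marker."""
    return WIKIMEDIA_BOT_PATTERN.search(user_agent) is not None


def is_spider(user_agent: str) -> bool:
    """Matches the generic spider pattern but NOT the wikimediaBot pattern.

    The exclusion is needed because any "WikimediaBot" UA also contains
    "bot" and would otherwise be caught by SPIDER_PATTERN.
    """
    return (SPIDER_PATTERN.search(user_agent) is not None
            and not is_wikimedia_bot(user_agent))


def classify(user_agent: str) -> str:
    """Return "bot", "spider" or "user" for a user-agent string."""
    if is_wikimedia_bot(user_agent):
        return "bot"
    if is_spider(user_agent):
        return "spider"
    return "user"


if __name__ == "__main__":
    for ua in ("MyCrawler/1.0 (Bot)",
               "MyTool/2.0 WikimediaBot",
               "Mozilla/5.0 (X11; Linux x86_64)"):
        print(ua, "->", classify(ua))
```

Under these assumed patterns, "MyCrawler/1.0 (Bot)" classifies as "spider", "MyTool/2.0 WikimediaBot" as "bot", and an ordinary browser UA as "user". The second case is the point of the question: the generic pattern alone already matches it, so the two categories differ only by an explicit exclusion.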
[0]: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-...
Bryan