So, trying to combine everyone's points of view, what about the following?
1. Using the existing User-Agent policy and modifying it to encourage adding
the word "bot" (case-insensitive) to the User-Agent string, so that it can
easily be used to identify bots in the analytics cluster (no regexps), and
linking that page from whatever other pages we think necessary.
2. Doing some advertising and outreach to get bot maintainers, and maybe
some frameworks, to implement the User-Agent policy. This would make the
existing policy less useless.
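To make item 1 concrete: the check the analytics side would need is a plain case-insensitive substring test. Below is a minimal Python sketch; the function name and the sample User-Agent strings are hypothetical, and this is not how the analytics cluster actually tags traffic, only an illustration of the "no regexps" idea.

```python
from typing import Optional


def is_self_identified_bot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains "bot", ignoring case.

    Plain substring test on the casefolded string; no regular expressions.
    Missing or empty User-Agent headers are treated as not self-identified.
    """
    if not user_agent:
        return False
    return "bot" in user_agent.casefold()


if __name__ == "__main__":
    # Hypothetical sample User-Agent strings.
    samples = [
        "ExampleCrawlerBot/0.1 (https://example.org/bot; ops@example.org)",  # True
        "Mozilla/5.0 (X11; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0",  # False
        "my-crawler-BOT/2.3",  # True
    ]
    for ua in samples:
        print(is_self_identified_bot(ua), ua)
```

Keeping the rule to a substring test means anyone can reproduce it in any tool. The flip side is that any User-Agent that merely contains those three letters, even inside an unrelated word, would be counted as a bot.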
Thanks all for the feedback!
On Mon, Feb 1, 2016 at 3:16 PM, Marcel Ruiz Forns <mforns(a)wikimedia.org>
wrote:
Clearly Wikipedia et al. use "bot" to refer to automated software that
edits the site, but it seems like you are using the term "bot" to refer to
all automated software, and it might be good to clarify.
OK, we can make that clear in the documentation. Looking into it, I've
seen that some bots, in the process of doing their "editing" work, can also
generate pageviews. So we should also include them as a potential source of
pageview traffic. Maybe we can reuse the existing User-Agent policy.
This makes a lot of sense. If I build a bot that crawls Wikipedia and
Facebook public pages, it really doesn't make sense for my bot to have a
"wikimediaBot" user agent; just the word "Bot" should probably be enough.
Totally agree.
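On the client side, the same single word is all a crawler would need to add. A minimal sketch using Python's standard `urllib.request`; the client name, version, URL and email are hypothetical placeholders, and including contact details is an assumption about good practice rather than something agreed in this thread.

```python
import urllib.request

# Hypothetical client name, version and contact details; substitute your own.
# The only part that matters for the proposal is the word "Bot".
USER_AGENT = "ExampleCrawlerBot/0.1 (https://example.org/crawler; ops@example.org)"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying a self-identifying bot User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    req = build_request("https://en.wikipedia.org/wiki/Special:Random")
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
    # urllib.request.urlopen(req) would send it; omitted to avoid network access.
```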
I guess a bigger question is why try to differentiate between "spiders"
and "bots" at all?
I don't think we need to differentiate between "spiders" and "bots". The
most important question we want to answer is: how much of the traffic we
consider "human" today is actually "bot"? So, +1 to "bot"
(case-insensitive).
On Fri, Jan 29, 2016 at 9:16 PM, John Mark Vandenberg <jayvdb(a)gmail.com>
wrote:
On 28 Jan 2016 11:28 pm, "Marcel Ruiz Forns" <mforns(a)wikimedia.org>
wrote:
> Why not just "Bot", or "MediaWikiBot", which at least encompasses all
> sites that the client can communicate with?
I personally agree with you; "MediaWikiBot" seems to have better semantics.
For clients accessing the MediaWiki API, it is redundant.
All it does is identify bots that comply with this edict from analytics.
--
John Vandenberg
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation