On Tue, Feb 2, 2016 at 12:40 PM, Marcel Ruiz Forns mforns@wikimedia.org wrote:
Hi all,
It seems comments are decreasing at this point. I'd like to slowly drive this thread to a conclusion.
- Create a plan to block clients that don't implement the (amended)
User-Agent policy.
I think we can decide on this later. Steps 1) and 2) can be done first - they should be done anyway before 3) - and then we can see how much benefit we gain from them. If we don't get a satisfactory reaction from bot/framework maintainers, we can then go for 3). John, would you be OK with that?
If no-one else raises concerns about this, the Analytics team will:
1) Add a mention to https://meta.wikimedia.org/wiki/User-Agent_policy to encourage including the word "bot" (case-insensitive) in the User-Agent string, so that bots can be easily identified.
2) Advertise the convention and reach out to bot/framework maintainers to increase the share of bots that implement the User-Agent policy.
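For illustration only, the kind of check this convention would enable could look roughly like the Python sketch below. The function name `looks_like_bot` and the sample User-Agent strings are made up for this example, and treating "bot" as a plain case-insensitive substring (rather than a whole word) is an assumption; the thread does not specify how the match would be done.

```python
import re

# Assumption: "bot" is matched as a case-insensitive substring anywhere
# in the User-Agent string. The actual rule is not defined in this thread.
BOT_PATTERN = re.compile(r"bot", re.IGNORECASE)


def looks_like_bot(user_agent):
    """Return True if the User-Agent string self-identifies as a bot."""
    # Treat a missing or empty User-Agent as "not tagged".
    if not user_agent:
        return False
    return BOT_PATTERN.search(user_agent) is not None


# Hypothetical User-Agent strings, for illustration only.
tagged = "ExampleWikiBot/1.0 (https://example.org/bot; bot-owner@example.org)"
untagged = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/44.0"
```

A check like this only recognizes clients that choose to tag themselves; it says nothing about automated clients that do not follow the convention.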
The proposed plan sounds good to me.
However, I'm very strongly against the suggestion of blocking anyone's access to api.php, or to the wikis in general, over not having "bot" in the user-agent string. Getting cleaner analytics is a nice goal, but the point of the projects is to collect and disseminate information. You might get blocked for doing something deliberately harmful to the services or the community, but getting blocked for not following an arbitrary convention, when that causes no real harm, is extreme. You will quickly find yourself in a strange conundrum as well: to block, you will need to establish the intent of the User-Agent, and if you can do that, then you probably don't need the "bot" tagging convention in the first place.
Bryan