Hi all,
When I wrote the new API client library standard, one of the intended effects was that libraries would make it easy for bot-runners to comply with the user-agent policy found at https://meta.wikimedia.org/wiki/User-agent_policy . However, different people understand the policy to mean different things.
As I read it, the relevant parts of the policy are:
"If you run a bot, please send a User-Agent header identifying the bot and supplying some way of contacting you, e.g.: `User-Agent: MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) BasedOnSuperLib/1.4`"
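A client library might offer a small helper that produces a header in the same shape as the policy's example. The sketch below is in Python and assumes the operator supplies the tool name, the version and at least one contact, and that the library appends its own product token. `build_user_agent` is a hypothetical name, not an existing library's API. The only real API it uses is `urllib.request.Request` from the standard library, and nothing is sent over the network.

```python
from typing import Optional
import urllib.request


def build_user_agent(tool: str, version: str, contact_url: Optional[str],
                     contact_email: Optional[str], library: Optional[str] = None) -> str:
    """Compose 'Tool/version (url; email) Library/version' like the policy example.

    Hypothetical helper: the field order simply mirrors the example string.
    """
    product = f"{tool}/{version}"
    contacts = [c for c in (contact_url, contact_email) if c]
    if not contacts:
        # The policy asks for "some way of contacting you".
        raise ValueError("at least one contact (URL or email) is required")
    ua = f"{product} ({'; '.join(contacts)})"
    if library:
        # Optional framework token, e.g. the library's own name/version.
        ua += f" {library}"
    return ua


ua = build_user_agent("MyCoolTool", "1.1", "http://example.com/MyCoolTool/",
                      "MyCoolTool@example.com", "BasedOnSuperLib/1.4")
# Attach the header to a request object; this does not open a connection.
req = urllib.request.Request("https://en.wikipedia.org/w/api.php",
                             headers={"User-Agent": ua})
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
# prints: MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) BasedOnSuperLib/1.4
```

The helper refuses to build a header without any contact. That choice mirrors the policy's "some way of contacting you"; a library could issue a warning instead.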
There's been some discussion in the context of my client library evaluation project here: https://www.mediawiki.org/wiki/API_talk:Client_code/Evaluations/Java_Wiki_Bo... and here: https://github.com/jpatokal/mediawiki-gateway/issues/65 . As I understood it, the example provided demonstrated the requirements, but it's now clear to me that there's room for ambiguity in the interpretation of the user-agent policy.
My question is: what information is essential to "identify" a bot? The example given appears to contain the bot name and version, a link to a page with more information and/or the repository for the bot's code, and the framework that was used to write the bot (SuperLib/1.4, I assume). Does WMF operations want all of these components? What is the minimum necessary to comply with the policy, and what is bonus information?
-Frances
Have you seen the recent thread at http://lists.wikimedia.org/pipermail/wikitech-l/2014-July/077517.html?
On Wed, Jul 30, 2014 at 6:26 PM, Frances Hocutt frances.hocutt@gmail.com wrote:
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
I have. What I took from that thread was "the more information you provide, the more likely it is that ops will be able to contact you if your bot is causing problems" and "there should definitely be some way to contact the person running the bot, whether via talk page for a logged-in bot or via email or similar." Is that accurate?
If so, there's still ambiguity in "more information is probably better", and it would still be useful to know how much is enough.
Frances
On Wed, Jul 30, 2014 at 4:46 PM, Brad Jorsch (Anomie) bjorsch@wikimedia.org wrote:
-- Brad Jorsch (Anomie) Software Engineer Wikimedia Foundation
On Wed, Jul 30, 2014 at 8:46 PM, Frances Hocutt frances.hocutt@gmail.com wrote:
I have. What I took from that thread was "the more information you provide, the more likely it is that ops will be able to contact you if your bot is causing problems" and "there should definitely be some way to contact the person running the bot, whether via talk page for a logged-in bot or via email or similar." Is that accurate?
Yes, that is accurate.
If so, there's still ambiguity in "more information is probably better", and it would still be useful to know how much is enough.
I'd say at the least you'd want:
* An identifier that isn't going to be confused with many other bots.
** No spoofing browser agents!
** No generic agents such as "curl", "lwp", "Python-urllib", and so on.
** For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc. would be a good idea, even if that detail is opaque to anyone besides the operator.
* Some way to identify how to contact the operator, without relying on other headers in the request (e.g. the login cookies). This could be a reference to a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, an email address, etc.
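A bot author could turn the guidelines above into a rough self-check. The sketch below is a heuristic that only covers the points listed there. It is not the filter Wikimedia actually applies. The function name, the framework list and the "looks like a contact" pattern are all assumptions made for illustration.

```python
import re
from typing import List

# Generic client names called out in the guidelines (matched as prefixes,
# case-insensitively, like the policy's "begins with lwp").
GENERIC_PREFIXES = ("curl", "lwp", "python-urllib")
# Frameworks with so many users that their name alone is vague (hypothetical list).
LARGE_FRAMEWORKS = ("pywikibot",)
# Rough signs of a contact: an email, a URI, or a (possibly interwiki) userpage.
CONTACT_PATTERN = re.compile(r"@|https?://|\bUser:", re.IGNORECASE)


def check_user_agent(ua: str) -> List[str]:
    """Return likely problems with a bot User-Agent; an empty list means none found."""
    stripped = ua.strip()
    if not stripped:
        return ["empty User-Agent"]
    problems = []
    lowered = stripped.lower()
    # Name part of the first product token, e.g. "pywikibot" from "Pywikibot/2.0".
    first_name = lowered.split()[0].split("/")[0]
    if lowered.startswith(GENERIC_PREFIXES):
        problems.append("generic client name")
    if lowered.startswith("mozilla/"):
        # Heuristic only: most browser User-Agents start this way.
        problems.append("looks like a browser User-Agent")
    if first_name in LARGE_FRAMEWORKS:
        problems.append("framework name first; add the specific task or script")
    if not CONTACT_PATTERN.search(stripped):
        problems.append("no contact information")
    return problems


print(check_user_agent("lwp"))
# prints: ['generic client name', 'no contact information']
print(check_user_agent("ArchiveBot/0.3 (en:User:Example) pywikibot/2.0"))
# prints: []
```

The `ArchiveBot` string and its userpage are made-up examples. In it, the task-specific name comes first and the framework comes last. That ordering follows the advice to add task detail rather than relying on "pywikibot" alone.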
Just notifying you that we are working on a customized user-agent: https://www.mediawiki.org/wiki/Manual:Pywikibot/User-agent
Is it good? Especially the default.
Best
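One way a framework could offer a customizable user agent with a default is a format string with named fields that operators may override. The sketch below shows that idea. The field names, the default template and the example values are hypothetical illustrations. They are not Pywikibot's actual configuration. The only real API used is `string.Formatter().parse` from the standard library, which lists the placeholders in a template.

```python
from string import Formatter
from typing import Dict, Optional

# Hypothetical default: specific script first, then contact, then framework.
DEFAULT_FORMAT = "{script}/{script_version} ({contact}) {framework}/{framework_version}"


def format_user_agent(fields: Dict[str, str], template: Optional[str] = None) -> str:
    """Fill a user-agent template, failing loudly if a placeholder has no value."""
    template = template or DEFAULT_FORMAT
    # Formatter().parse yields (literal, field_name, spec, conversion);
    # field_name is None for trailing literal text.
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = needed - set(fields)
    if missing:
        raise ValueError(f"missing user-agent fields: {sorted(missing)}")
    return template.format(**fields)


ua = format_user_agent({
    "script": "replace.py", "script_version": "1.0",
    "contact": "en:User:Example",
    "framework": "pywikibot", "framework_version": "2.0",
})
print(ua)  # prints: replace.py/1.0 (en:User:Example) pywikibot/2.0

# An operator-supplied template overrides the default.
print(format_user_agent({"script": "MyTask", "contact": "me@example.org"},
                        "{script} ({contact})"))  # prints: MyTask (me@example.org)
```

If a field is missing, the function raises an error instead of silently emitting a bare framework name. That choice keeps the contact requirement from being dropped by accident.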
On 7/31/14, Brad Jorsch (Anomie) bjorsch@wikimedia.org wrote:
On Thu, Jul 31, 2014 at 11:04 AM, Amir Ladsgroup ladsgroup@gmail.com wrote:
Just notifying you that we are working on a customized user-agent: https://www.mediawiki.org/wiki/Manual:Pywikibot/User-agent
Is it good? Especially the default.
Looks good to me.
Thanks, Brad. That is helpful.
-Frances
On Thu, Jul 31, 2014 at 7:45 AM, Brad Jorsch (Anomie) bjorsch@wikimedia.org wrote:
btw: It seems that blocking does not work correctly, because the following curl command returns no error message:
$ curl -sA "lwp" "https://en.wikipedia.org/w/api.php?action=query&prop=revisions&meta=..."
(Do you have a curl command line for which an error message is returned?)
See: https://meta.wikimedia.org/wiki/User-Agent_policy
User agents that send a User-Agent header that is blacklisted (for example, any User-Agent string that begins with "lwp", whether it is informative or not) may encounter a less helpful error message (lie) like this:
Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes.
-- loki
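The check loki asks about can also be scripted in Python instead of curl. The sketch below sends one request with a chosen User-Agent, then looks for the generic error text quoted from the policy. The helper names are hypothetical. Matching on that exact wording is an assumption. The thread gives no status code, and the servers' actual response may differ or change over time.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Opening of the generic error page quoted from the policy above.
BLOCK_MESSAGE = "Our servers are currently experiencing a technical problem"


def fetch_with_agent(url: str, agent: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Send one GET with the given User-Agent and return (status, body)."""
    req = urllib.request.Request(url, headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error responses still carry a body worth inspecting.
        return err.code, err.read().decode("utf-8", "replace")


def looks_blocked(body: str) -> bool:
    """Heuristic: does the response body contain the policy's generic error text?"""
    return BLOCK_MESSAGE in body
```

To use it, call `looks_blocked(fetch_with_agent(url, "lwp")[1])` with `url` set to a complete API query. The query in the message above is truncated, so it has to be filled in first.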
On Sat, Aug 2, 2014 at 3:37 AM, Frances Hocutt frances.hocutt@gmail.com wrote: