Hello all,
1- Don't forget about the bug triage. The blog post has just been published at https://blog.wikimedia.org/2014/07/10/pywikibot-will-have-its-next-bug-triage-on-july-24%e2%88%9227/ so you can read it and use it to advertise.
2- As discussed at https://www.mediawiki.org/wiki/API_talk:Client_code/Evaluations/Pywikibot I want to make a patch to add the username to the User-Agent header of API calls. But only ISO 8859 is supported there, while usernames can be anything (UTF-8). For usernames that are not compatible, people would need to put an e-mail address in user-config.py. Is that okay with you? Any comments?
Best
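A minimal Python sketch of the constraint in point 2, assuming the header is sent by a client that encodes header values as ISO 8859-1 (Latin-1), as Python's `http.client` does for `str` values. The function name and the sample usernames are hypothetical.

```python
def fits_latin1_header(value: str) -> bool:
    """Return True if value can be sent unchanged in a Latin-1 HTTP header."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical usernames: plain ASCII, accented Latin, and Cyrillic.
for name in ["BinBot", "Bináris", "Бот"]:
    print(name, fits_latin1_header(name))
```

Accented Latin names such as "Bináris" fit; names written in Cyrillic, Persian and other scripts do not, which is the case the proposed e-mail fallback was meant to cover.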
I think an e-mail address is something personal and sensitive, it is out of the reach of CUs, and it is nowhere compulsory on Wikipedia. Additionally, Pywikibot may be used on various MW installations about which we know nothing in advance. Thus it is not a very good idea.
What was the goal of including the username? Is it worth it? The price should not be bigger than the advantage.
-- Bináris
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
It's because of the User-Agent policy: https://meta.wikimedia.org/wiki/User-Agent_policy
I think we can discuss this in a more public place, like wikitech-l.
Do you agree?
-- Amir
I don't see anything like a compulsory username in this policy. I think Pywikibot has a UA that complies; it does not have to be unique and personal. I remembered something about a statistical purpose, but I may have mixed something up.
But I have no problem with going to wikitech-l if you understand the policy in a different way.
On Jul 10, 2014 3:30 PM, "Bináris" wikiposta@gmail.com wrote:
I don't see anything like a compulsory username in this policy. I think Pywikibot has a UA that complies; it does not have to be unique and personal. I remembered something about a statistical purpose, but I may have mixed something up.
But I have no problem with going to wikitech-l if you understand the policy in a different way.
No, the whole point is to be unique, not statistics. I haven't read the policy recently but if the policy is unclear then we can change the policy.
-Jeremy
On Thu, Jul 10, 2014 at 12:36 PM, Jeremy Baron jeremy@tuxmachine.com wrote:
No, the whole point is to be unique, not statistics. I haven't read the policy recently but if the policy is unclear then we can change the policy.
It is about being able to contact the bot-runner if the bot is misbehaving or runs into a problem. From https://meta.wikimedia.org/wiki/User-Agent_policy :
"If you run a bot, please send a User-Agent header identifying the bot and supplying some way of contacting you, e.g.: User-Agent: MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) BasedOnSuperLib/1.4"
-Frances
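The policy example above can be reproduced with a small Python sketch. The helper name is hypothetical and the values are the policy's own placeholders; the request object is only built, nothing is sent over the network.

```python
from urllib.request import Request


def policy_user_agent(tool, version, url, contact, lib, lib_version):
    """Build a UA shaped like the policy example: Tool/ver (url; contact) Lib/ver."""
    return f"{tool}/{version} ({url}; {contact}) {lib}/{lib_version}"


ua = policy_user_agent("MyCoolTool", "1.1", "http://example.com/MyCoolTool/",
                       "MyCoolTool@example.com", "BasedOnSuperLib", "1.4")
print(ua)

# Attach the header to a request; urllib stores header names capitalize()d.
req = Request("https://meta.wikimedia.org/w/api.php", headers={"User-Agent": ua})
print(req.get_header("User-agent"))
```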
Just sent an e-mail to wikitech-l
Best
-- Amir
If it turns out that user identification is really an expectation, then URL-encoding or base64 may be a good solution for non-ASCII names. However, in this case there is still the point that it is only expected on the Foundation's wikis and not necessary on other MW installations. And once any personal data is not necessary, it is neither desirable nor reasonable. So we should then introduce a WMF switch in user-config.py for each account the bot uses, and personalize the UA only for those accounts where this switch is on.
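For comparison, a small Python 3 sketch of the two encodings mentioned here, using the standard library's `urllib.parse.quote` and `base64` with "Bináris" as the sample name. Both produce ASCII-only text, but percent-encoding leaves the ASCII letters readable, while base64 hides the whole name.

```python
import base64
from urllib.parse import quote, unquote

name = "Bináris"

pct = quote(name, safe="")  # UTF-8 bytes of non-ASCII characters become %XX
b64 = base64.b64encode(name.encode("utf-8")).decode("ascii")

print(pct)  # Bin%C3%A1ris
print(b64)  # opaque to a human reader

# Both encodings are reversible.
assert unquote(pct) == name
assert base64.b64decode(b64).decode("utf-8") == name
```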
On Jul 11, 2014 1:08 AM, "Bináris" wikiposta@gmail.com wrote:
If it turns out that a user identification is really an expectation,
Well, not identification exactly. But it should be unique per bot and offer a way to contact the operator (not just identify the operator; a single operator may run both misbehaving and compliant bots).
It's beyond doubt that this is part of the guidelines.
If you don't comply and you're causing problems, then you may find your IP blocked from *reading* (or maybe just writing to) the wikis. Imagine if that IP were the Toolserver's! (I know it's dead; pretend it's 2012 for a second.)
If you have a good UA string, then roots can narrowly block just the problem, not everyone else on your server, and can also initiate contact with the operator to get the bot fixed, instead of waiting for the operator to find the roots and ask why the bot is blocked.
Also, if the bot is somehow logged out, the bot/operator should still be identified in the UA string.
then URL-encoding or base64 may be a good solution for non-ASCII names.
However, in this case there is still the point that it is only expected on the Foundation's wikis and not necessary on other MW installations. And once any personal data is not necessary, it is neither desirable nor reasonable.
What exactly is the objection to bots using unique UA strings?
So we should then introduce a WMF switch in user-config.py for each account the bot uses and personalize the UA only for those accounts where this switch is on.
IMO, it should have nothing to do with which wiki you're using. If users want to turn it off for a given wiki, fine, but it should default to enabled.
-Jeremy
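A sketch of the default suggested here: the username is included everywhere unless the operator opts out for a particular wiki. The function, its parameters and the `username_opt_out` set are hypothetical, not actual Pywikibot settings.

```python
from urllib.parse import quote


def build_user_agent(base, username, host, username_opt_out=frozenset()):
    """Append the percent-encoded username unless this host is opted out."""
    if username and host not in username_opt_out:
        return f"{base} (User:{quote(username, safe='')})"
    return base


opt_out = {"wiki.example.org"}  # hypothetical non-WMF wiki
print(build_user_agent("Pywikibot", "BinBot", "hu.wikipedia.org", opt_out))
print(build_user_agent("Pywikibot", "BinBot", "wiki.example.org", opt_out))
```

Compared with a per-account opt-in switch, this keeps the policy-friendly behaviour as the default and makes the exception explicit.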
----- Original Message ----- From: Amir Ladsgroup ladsgroup@gmail.com To: Pywikibot discussion list pywikipedia-l@lists.wikimedia.org Date: 11.07.2014 01:09 Subject: Re: [Pywikipedia-l] Two issues
Just sent an e-mail to wikitech-l
Best
IMHO the first point of contact for the bot is the developers, and pywikibot is well known as a contact address. The second contact is the bot account itself. It is common to have a bot account, and the operator should be reachable via wiki e-mail or the contact information on the bot's user page. The bot account is unique. I guess it would be enough to have "pywikibot", the script name and the bot account in the UA.
Xqt
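A sketch of the UA proposed here (script name, bot account, "pywikibot"). The exact layout, the `lang:family` field and the default version string "2.0" are placeholders chosen for illustration, not Pywikibot's real format.

```python
import os
from urllib.parse import quote


def xqt_style_user_agent(script_path, account, lang, family, version="2.0"):
    """Script name, home wiki and bot account, followed by Pywikibot/version."""
    script = os.path.splitext(os.path.basename(script_path))[0]
    # MediaWiki shows spaces in titles as underscores in URLs.
    account = quote(account.replace(" ", "_"), safe="")
    return f"{script} ({lang}:{family}; User:{account}) Pywikibot/{version}"


print(xqt_style_user_agent("scripts/replace.py", "Bin Bot", "hu", "wikipedia"))
```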
On Jul 10, 2014 2:48 PM, "Amir Ladsgroup" ladsgroup@gmail.com wrote:
2- As discussed here, I want to make a patch to add the username to the User-Agent header of API calls. But only ISO 8859 is supported there, while usernames can be anything (UTF-8). For usernames that are not compatible, people would need to put an e-mail address in user-config.py. Is that okay with you? Any comments?
Why not just percent-encode the username?
As ISO 8859 supports the % character, this sounds like a reasonable solution to me. We just need to use the urllib2 library to encode it.
Best
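A minimal sketch of encoding only when needed, written for Python 3, where percent-encoding lives in `urllib.parse.quote` (the Python 2 equivalent was `urllib.quote`). The helper name is hypothetical, and the Latin-1 test follows the ISO 8859 framing above; a stricter variant would encode anything outside ASCII.

```python
from urllib.parse import quote, unquote


def header_safe_username(name: str) -> str:
    """Return name unchanged if it fits Latin-1, else its percent-encoded form."""
    try:
        name.encode("latin-1")
        return name
    except UnicodeEncodeError:
        return quote(name, safe="")


print(header_safe_username("BinBot"))  # BinBot
print(header_safe_username("Бот"))     # %D0%91%D0%BE%D1%82
print(unquote(header_safe_username("Бот")))  # Бот
```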
-- Amir
If I can vote: +1 for that solution.
It looks more universal than the solution with e-mails as a fallback.
On 10 Jul 2014 21:29, "Amir Ladsgroup" ladsgroup@gmail.com wrote:
As ISO 8859 supports the % character, this sounds like a reasonable solution to me. We just need to use the urllib2 library to encode it.
On 10/07/2014 21:29, Amir Ladsgroup wrote:
As ISO 8859 supports the % character, this sounds like a reasonable solution to me. We just need to use the urllib2 library to encode it.
That is a bit hard to read, though :-D The whole purpose is for site operators to quickly find out who is behind the bot and work with them to fix it / stop it hammering the site. A human-readable user agent with a detailed point of contact for the bot operator will dramatically speed up finding the contact.
On Fri, Jul 11, 2014 at 1:23 PM, Antoine Musso hashar+wmf@free.fr wrote:
That is a bit hard to read, though :-D The whole purpose is for site operators to quickly find out who is behind the bot and work with them to fix it / stop it hammering the site. A human-readable user agent with a detailed point of contact for the bot operator will dramatically speed up finding the contact.
I disagree: URL-decoding websites can decode a username in just a second, and note that only a very small proportion of bot usernames need to be encoded (i.e. where the encoded version differs from the real one).
Best
-- Amir
Re user agents: 2010 was a long time ago from an ops perspective.
The need for a user agent is no doubt as strong as at the time of Domas's 2010 edict, but I would like clarification from ops regarding the 'gold standard' before we build something they may no longer need.
On 11/07/2014 15:32, Amir Ladsgroup wrote:
I disagree: URL-decoding websites can decode a username in just a second, and note that only a very small proportion of bot usernames need to be encoded (i.e. where the encoded version differs from the real one).
Just add an option in pywikibot, like config.useragent, with a comment explaining that it should give some contact information in case of trouble. That will be fine; there is no need to give the username, which would often not be that helpful anyway.
On Jul 11, 2014 9:33 AM, "Amir Ladsgroup" ladsgroup@gmail.com wrote:
I disagree: URL-decoding websites can decode a username in just a second, and note that only a very small proportion of bot usernames need to be encoded (i.e. where the encoded version differs from the real one).
Well, my idea was that you could paste it into the browser's location bar and let it decode it magically for you,
e.g. by appending it after https://meta.wikimedia.org/wiki/Special:CentralAuth/
or it might even get decoded automatically if you use it in the path for a non-working host (depending on the browser), e.g. 127.0.0.1:85/%32
Another option is to have users create an on-wiki redirect from a Latin-script name to the canonical one.
But the operator knows the best way to contact them, so we should let them specify what they want. (But see also the caveat about a single operator running multiple bots: some may cause problems while others do not.)
-Jeremy
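A small sketch of the CentralAuth idea: take the percent-encoded name as it would appear in a UA and turn it into a link an operator can open, or decode it directly. The helper name and the sample username are hypothetical.

```python
from urllib.parse import quote, unquote

CENTRALAUTH = "https://meta.wikimedia.org/wiki/Special:CentralAuth/"


def centralauth_link(encoded_username: str) -> str:
    """Turn the encoded name seen in a UA into a link an operator can open."""
    return CENTRALAUTH + encoded_username


encoded = quote("Бот", safe="")  # what the UA would carry
print(centralauth_link(encoded))
print(unquote(encoded))          # Бот
```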
The bot account itself is not enough until we get a centralized notice for each user page wiki-wide. The main wiki should be included: hu:user:binbot is enough to make contact, and a URL-encoded username can easily be inserted by a human operator into the browser's URL bar in place of any other user page.
On Mon, Jul 14, 2014 at 6:59 PM, Bináris wikiposta@gmail.com wrote:
The bot account itself is not enough until we get a centralized notice for each user page wiki-wide. The main wiki should be included: hu:user:binbot is enough to make contact, and a URL-encoded username can easily be inserted by a human operator into the browser's URL bar in place of any other user page.
URL-encoding will be 'nice' for many usernames that are not quite ISO 8859.
We could use the 'wiki name and oldid' of the user page as a readable fallback for usernames that would contain 'too many' %s (4?).
-- John Vandenberg
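A sketch of the proposed fallback, assuming "too many" means more than 4 `%` escapes (the threshold was only tentative). The function name, the `wiki oldid=N` output format and the oldid values are hypothetical.

```python
from urllib.parse import quote

MAX_ESCAPES = 4  # tentative "too many" threshold from the discussion


def ua_contact(username, wiki, userpage_oldid):
    """Percent-encoded username, or 'wiki oldid=N' if the encoding is unreadable."""
    encoded = quote(username.replace(" ", "_"), safe="")
    if encoded.count("%") > MAX_ESCAPES:
        return f"{wiki} oldid={userpage_oldid}"
    return f"User:{encoded}"


print(ua_contact("BinBot", "huwiki", 123))   # User:BinBot
print(ua_contact("Bináris", "huwiki", 123))  # User:Bin%C3%A1ris
print(ua_contact("Бот", "ruwiki", 456))      # ruwiki oldid=456
```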
Good idea!
-- Bináris
I made a patch to allow any customized user agent (the username would be the default); if the person doesn't want to add any user agent, they can just set it to " ".
https://gerrit.wikimedia.org/r/#/c/147381/
I would be happy to receive any comments on this patch.
Best
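One possible reading of the behaviour described for the patch, as a Python sketch: no setting gives the default with the username, a blank " " adds nothing personal, and anything else is appended as the operator's own text. This is an interpretation for illustration only, not the code in the Gerrit change; the function and its parameters are hypothetical.

```python
from urllib.parse import quote


def resolve_user_agent(custom, username, base="Pywikibot"):
    """Pick the UA: default with username, blank to opt out, or custom text."""
    if custom is None:        # nothing configured: include the username
        return f"{base} (User:{quote(username, safe='')})"
    if not custom.strip():    # " " means: add nothing personal
        return base
    return f"{base} {custom.strip()}"  # operator-supplied text is appended


print(resolve_user_agent(None, "BinBot"))
print(resolve_user_agent(" ", "BinBot"))
print(resolve_user_agent("(mailto:op@example.org)", "BinBot"))
```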
On Fri, Jul 18, 2014 at 1:40 PM, Amir Ladsgroup ladsgroup@gmail.com wrote:
I made a patch to allow any customized user agent (the username would be the default); if the person doesn't want to add any user agent, they can just set it to " ".
This was merged a week ago.
There is now a config variable to specify the user-agent.
https://gerrit.wikimedia.org/r/#/c/147381/15/pywikibot/config2.py,cm
Amir helpfully put together some documentation here:
https://www.mediawiki.org/wiki/Manual:Pywikibot/User-agent
By default it includes the bot username in the user agent *only* when it is connecting to a 'Site' listed in your user-config.py. This is still not what the WMF User-Agent policy requires, as pywikibot frequently reads data from Sites while logged out, or while logged in without an entry in user-config.py. This was specifically mentioned in the wikitech-l discussion as one of the reasons to put a username in the user agent: when reading while logged out, the username is not in the payload.
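A sketch of one way to still name an account when the target Site has no user-config.py entry. The `usernames[family][code]` layout mirrors the usual user-config.py style; falling back to any other configured account is a hypothetical idea, not current Pywikibot behaviour.

```python
def ua_username(family, code, usernames):
    """Username for the UA, even when the target site has no user-config entry."""
    exact = usernames.get(family, {}).get(code)
    if exact:
        return exact
    # Hypothetical fallback: any account the operator configured elsewhere.
    return next((name for codes in usernames.values()
                 for name in codes.values()), None)


usernames = {"wikipedia": {"hu": "BinBot"}}  # as in user-config.py
print(ua_username("wikipedia", "hu", usernames))
print(ua_username("wikidata", "wikidata", usernames))
print(ua_username("wikipedia", "en", {}))
```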
There is also the chance that pywikibot might write while logged out. The Pywikibot page & site modules typically have checks to prevent that from occurring, but they are not mandatory: bugs may slip into the core framework, and script writers might bypass the page & site modules. We have a changeset pending approval to enforce being logged in at the API layer.
https://gerrit.wikimedia.org/r/#/c/147837/
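A sketch of an API-layer guard in the spirit of that changeset, not its actual code. It refuses write actions without a logged-in user and also adds the MediaWiki API's `assert=user` parameter, so the server itself rejects the request if the session turns out to be logged out. The action set is partial and the names are hypothetical.

```python
WRITE_ACTIONS = {"edit", "move", "delete", "upload", "protect"}  # not exhaustive


class NotLoggedInError(Exception):
    """Raised before a write request is sent without a logged-in user."""


def prepare_api_params(params, logged_in_user):
    """Refuse logged-out writes and ask the server to double-check via assert=user."""
    params = dict(params)  # do not mutate the caller's dict
    if params.get("action") in WRITE_ACTIONS:
        if not logged_in_user:
            raise NotLoggedInError(f"refusing {params['action']!r} while logged out")
        params["assert"] = "user"
    return params


print(prepare_api_params({"action": "query", "titles": "Main Page"}, None))
print(prepare_api_params({"action": "edit", "title": "Sandbox"}, "BinBot"))
```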
I found an old enhancement request for adding file hashes to the user agent, so I have put some thoughts there.
https://bugzilla.wikimedia.org/show_bug.cgi?id=55016