Patches item #3603143, was opened at 2013-02-02 16:46
Message generated for change (Settings changed) made by yurochek
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3603143...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Ori Livneh (atdt)
Assigned to: Yuri Astrakhan (yurochek)
Summary: Make the user-agent string more descriptive
Initial Comment: The current default (and widely used) user-agent string for Pywikipedia bot is "PythonWikipediaBot/1.0". This is both inconsistent with the documentation (which claims it is "Pywikipediabot/1.0") and not as informative as it could be. This patch changes the default user-agent string format to "Pywikipediabot/1.0 (r<revId>; <scriptName>)", where 'revId' is the SVN revision of Pywikipedia and 'scriptName' is the name of the directory containing the currently executing script, followed by the script's file name. Here is a full example: "Pywikipediabot/1.0 (r11026; pywikipedia/wikipedia.py)"
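A minimal Python sketch of the proposed format, for illustration only; it is not the code from the patch. The function name `build_user_agent` is hypothetical, the SVN revision is simply passed in by the caller, and taking the running script's path from `sys.argv[0]` is an assumption.

```python
import os
import sys


def build_user_agent(rev_id, script_path=None):
    # Hypothetical helper: formats "Pywikipediabot/1.0 (r<revId>; <scriptName>)".
    # rev_id is the SVN revision; in this sketch the caller supplies it.
    if script_path is None:
        # Assumption: the running script's path is taken from sys.argv[0].
        script_path = sys.argv[0]
    path = os.path.abspath(script_path)
    parent_dir = os.path.basename(os.path.dirname(path))
    file_name = os.path.basename(path)
    # Keep only the containing directory and the file name,
    # e.g. "pywikipedia/wikipedia.py"; drop the directory if there is none.
    script_name = "/".join(p for p in (parent_dir, file_name) if p)
    return "Pywikipediabot/1.0 (r%s; %s)" % (rev_id, script_name)


if __name__ == "__main__":
    print(build_user_agent(11026, "/home/bot/pywikipedia/wikipedia.py"))
    # -> Pywikipediabot/1.0 (r11026; pywikipedia/wikipedia.py)
```

The path in the usage line is a made-up example; only the last directory and the file name reach the user-agent string, so the rest of the local path is not exposed.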
The name of the currently executing script could help developers and ops engineers at the Wikimedia Foundation pinpoint client implementation issues. For example, some implementations do not efficiently batch requests for multiple titles, but without a more descriptive user-agent string it is hard to know whom to notify or where to submit a patch.
----------------------------------------------------------------------
Comment By: Yuri Astrakhan (yurochek) Date: 2013-02-03 23:37
Message: merged
----------------------------------------------------------------------
Comment By: Ori Livneh (atdt) Date: 2013-02-03 15:59
Message: Well, OK. Updated patch.
----------------------------------------------------------------------
Comment By: Yuri Astrakhan (yurochek) Date: 2013-02-03 13:04
Message: Yes, this is what I think would be the best format.
Browsers do it this way for historical reasons, and the extra information sits inside the parentheses, so it all belongs to the first value. The RFC describes values as separated by spaces, and I think we should follow that, not the hacks that browsers have piled one on top of another. The "official" RFC position is for values to be separated by spaces, with the version following a slash.
Example: Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50
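To make the RFC point concrete: RFC 2616 (section 14.43, using the `product` rule from section 3.8) defines the User-Agent value as one or more product tokens (`name` or `name/version`) and parenthesized comments, separated by whitespace. The hypothetical tokenizer below is a simplified sketch along those lines; it also splits each comment on semicolons (the browser convention discussed in this thread), ignores quoted-pairs, and leaves nested comments as raw text inside the outer one.

```python
def split_user_agent(ua):
    # Hypothetical, simplified splitter for a User-Agent value.
    # Returns ("product", name, version_or_None) and ("comment", [items]) tuples.
    parts = []
    i, n = 0, len(ua)
    while i < n:
        if ua[i].isspace():
            i += 1
        elif ua[i] == "(":
            # Scan to the matching ")" so nested parentheses stay inside.
            depth, j = 0, i
            while j < n:
                if ua[j] == "(":
                    depth += 1
                elif ua[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            body = ua[i + 1:j]
            # Browser convention: semicolon-separated items inside the comment.
            items = [item.strip() for item in body.split(";") if item.strip()]
            parts.append(("comment", items))
            i = j + 1
        else:
            # A product token runs until whitespace or the start of a comment.
            j = i
            while j < n and not ua[j].isspace() and ua[j] != "(":
                j += 1
            name, _, version = ua[i:j].partition("/")
            parts.append(("product", name, version or None))
            i = j
    return parts


if __name__ == "__main__":
    for part in split_user_agent("Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) "
                                 "Gecko/20061208 Firefox/2.0.0 Opera 9.50"):
        print(part)
```

Applied to the example above, the trailing `Opera 9.50` comes out as two separate products, `Opera` and `9.50`, because nothing joins them with a slash.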
----------------------------------------------------------------------
Comment By: Ori Livneh (atdt) Date: 2013-02-03 00:31
Message: Is this what you have in mind?
mylib-myscript.py/r1234 Pywikipediabot/1.0
It seems odd to tack the SVN revision ID of Pywikipediabot onto mylib-myscript.py, which would be versioned differently.
Detailed implementation information in parens, with items separated by semicolon, has been the norm since around Netscape 2.0, and is the current practice of all major browsers:
http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
----------------------------------------------------------------------
Comment By: Yuri Astrakhan (yurochek) Date: 2013-02-02 23:08
Message: Sorry, per the spec there shouldn't be a semicolon, only a space. Also, since the patch includes the parent directory, the '/' between the directory and the file name should probably be replaced with a '-'.
----------------------------------------------------------------------
Comment By: Yuri Astrakhan (yurochek) Date: 2013-02-02 23:04
Message: Per http://tools.ietf.org/html/rfc2616#section-14.43 I think the UA should be "<scriptname>/<revNumber>; Pywikipediabot/1.0"
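A minimal sketch of the format proposed here, as amended in the later comment (a space instead of the semicolon, and '-' in place of the '/' between the parent directory and the file name). The function name is hypothetical, and the 'r' prefix on the revision follows the `mylib-myscript.py/r1234` example elsewhere in the thread. The '/' has to go because RFC 2616 lists '/' (and '\') among the separator characters that may not appear inside a token.

```python
def build_product_style_user_agent(script_name, rev_id):
    # Hypothetical helper for the "<scriptname>/<revNumber> Pywikipediabot/1.0"
    # proposal. "/" and "\" are not allowed inside an RFC 2616 token, so the
    # separator between parent directory and file name becomes "-".
    product_name = script_name.replace("\\", "/").replace("/", "-")
    # Products are separated by a single space, not a semicolon.
    return "%s/r%s Pywikipediabot/1.0" % (product_name, rev_id)


if __name__ == "__main__":
    print(build_product_style_user_agent("mylib/myscript.py", 1234))
    # -> mylib-myscript.py/r1234 Pywikipediabot/1.0
```

As the reply in the thread points out, the revision number in this layout is Pywikipedia's, yet it appears as the version of the script's product token.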
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org