Patches item #3603143, was opened at 2013-02-02 16:46
Message generated for change (Comment added) made by yurochek
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=360314…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: Accepted
Priority: 5
Private: No
Submitted By: Ori Livneh (atdt)
Assigned to: Yuri Astrakhan (yurochek)
Summary:
Make the user-agent string more descriptive
Initial Comment:
The current default (and widely-used) user-agent string for Pywikipedia bot is
"PythonWikipediaBot/1.0". This is both inconsistent with the documentation
(which claims it is "Pywikipediabot/1.0") and not as informative as it could be.
This patch changes the default user-agent string format to "Pywikipediabot/1.0
(r<revId>; <scriptName>)", where 'revId' is the SVN revision of
Pywikipedia and 'scriptName' is the tail path component and file name of the
currently executing script. Here is a full example: "Pywikipediabot/1.0 (r11026;
pywikipedia/wikipedia.py)"
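
A minimal sketch of how a string in the format described above could be assembled. This is not the code from the attached patch: the helper names `script_tail` and `build_user_agent` are hypothetical, and the revision number is assumed to be passed in as an integer rather than read from SVN metadata.

```python
import os


def script_tail(script_path):
    """Return the parent directory and file name of a script path,
    joined with '/', e.g. 'pywikipedia/wikipedia.py'."""
    directory, filename = os.path.split(os.path.normpath(script_path))
    parent = os.path.basename(directory)
    # A bare file name has no parent directory; skip the empty part.
    return "/".join(part for part in (parent, filename) if part)


def build_user_agent(script_path, revision):
    """Format the user-agent as 'Pywikipediabot/1.0 (r<revId>; <scriptName>)'.

    `revision` is assumed to be the SVN revision as an int (hypothetical input).
    """
    return "Pywikipediabot/1.0 (r%d; %s)" % (revision, script_tail(script_path))


if __name__ == "__main__":
    print(build_user_agent("/home/bot/pywikipedia/wikipedia.py", 11026))
```

In a real bot the script path would presumably come from something like `sys.argv[0]`; it is a plain argument here so the sketch can be run on its own.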
The name of the currently executing script could help developers and ops engineers at the
Wikimedia Foundation pinpoint client implementation issues. For example, some
implementations do not efficiently batch requests for multiple titles, but without a more
descriptive user-agent string it is hard to know whom to notify or where to submit a
patch.
----------------------------------------------------------------------
Comment By: Yuri Astrakhan (yurochek)
Date: 2013-02-03 23:37
Message:
merged
----------------------------------------------------------------------
Comment By: Ori Livneh (atdt)
Date: 2013-02-03 15:59
Message:
Well, OK. Updated patch.
----------------------------------------------------------------------
Comment By: Yuri Astrakhan (yurochek)
Date: 2013-02-03 13:04
Message:
Yes, this is what I think would be the best format.
Browsers do it for historical reasons, and the value is inside the
parens, so it is all part of the first value. The RFC describes the
values as separated by spaces, and I think we should follow that, not the
hacks that browsers introduce one on top of the other. The "official" RFC
position is for values to be separated by spaces, with the version
following the slash.
Example:
Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50
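
The space-separated product-token form discussed in this thread ("mylib-myscript.py/r1234 Pywikipediabot/1.0") could be sketched as follows. This is an illustration only, not the merged code: `product_token` and `rfc_user_agent` are hypothetical names, and the rule of replacing disallowed characters with '-' is an assumption based on the suggestion to use '-' in place of the parent-directory separator.

```python
import re

# Separator characters and whitespace, which RFC 2616 does not allow
# inside a token. A path separator such as '/' falls in this set.
_NON_TOKEN = re.compile(r'[()<>@,;:\\"/\[\]?={}\s]+')


def product_token(name, version):
    """Build one 'name/version' product token, replacing any run of
    disallowed characters in either part with a single '-'."""
    safe_name = _NON_TOKEN.sub("-", name).strip("-")
    safe_version = _NON_TOKEN.sub("-", version).strip("-")
    return "%s/%s" % (safe_name, safe_version)


def rfc_user_agent(script_name, revision):
    """Join the script token and the framework token with a single space.

    `script_name` may include a parent directory (e.g. 'mylib/myscript.py');
    `revision` is assumed to be an int (hypothetical input).
    """
    tokens = [
        product_token(script_name, "r%d" % revision),
        product_token("Pywikipediabot", "1.0"),
    ]
    return " ".join(tokens)


if __name__ == "__main__":
    print(rfc_user_agent("mylib/myscript.py", 1234))
```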
----------------------------------------------------------------------
Comment By: Ori Livneh (atdt)
Date: 2013-02-03 00:31
Message:
Is this what you have in mind?
mylib-myscript.py/r1234 Pywikipediabot/1.0
It seems odd to tack the SVN revision ID of Pywikipediabot to
mylib-myscript.py, which would be versioned differently.
Detailed implementation information in parens, with items separated by
semicolon, has been the norm since around Netscape 2.0, and is the current
practice of all major browsers:
http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
----------------------------------------------------------------------
Comment By: Yuri Astrakhan (yurochek)
Date: 2013-02-02 23:08
Message:
Sorry, there shouldn't be a semicolon per spec - only a space. Also, since
the patch includes the parent dir, the path separator should probably be
replaced with a '-'.
----------------------------------------------------------------------
Comment By: Yuri Astrakhan (yurochek)
Date: 2013-02-02 23:04
Message:
Per
http://tools.ietf.org/html/rfc2616#section-14.43 I think the UA should
be "<scriptname>/<revNumber>; Pywikipediabot/1.0"
----------------------------------------------------------------------