zhuyifei1999 created this task. zhuyifei1999 added subscribers: zhuyifei1999, XZise. zhuyifei1999 added projects: pywikibot-core, Pywikibot-General. Restricted Application added subscribers: Aklapper, pywikipedia-bugs.
TASK DESCRIPTION On an upload of "File:Scanwings, OH-BCX, Beechcraft C90 King Air (17512831479).jpg" to Commons:
Using HEAD, upload fail with a loophole of
``` ERROR: Traceback (most recent call last): File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1555, in submit body=body, headers=headers) File "/data/project/yifeibot/pywikibot/pywikibot/tools/__init__.py", line 950, in wrapper return obj(*__args, **__kw) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 269, in request r = fetch(baseuri, method, body, headers, **kwargs) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 370, in fetch error_handling_callback(request) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 287, in error_handling_callback raise request.data UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)
WARNING: Waiting 5 seconds before retrying. ```
After debugging, I found all versions after 1e54a7d6886d56a21101900025038e25bab5ad03 has the same error, while version before raises an (expected) exception that is handled in botscript:
``` KeyError: u'bad-prefix' ``` (raised in https://github.com/wikimedia/mediawiki/blob/84899f991ff44f36b7359f0bbf8de295...)
I agree that some errors should be prevented from uploading. But why is there a UnicodeDecodeError leading to a loophole?
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999 Cc: XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
XZise added a comment.
Unfortunately HEAD is not very specific. The two most current versions (I've checked yet) have different line numbers. What does `python pwb.py version` tell?
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
XZise added a comment.
Okay what command did you use to get the exception? And is it possible to run it in Python 3 maybe? The traceback itself isn't really helpful but afaics at some point in the HTTP layer there was the UnicodeDecodeError and I don't think it's related to the MediaWiki server itself. The problem is that I can't see where the error originally came from and at least usually does Python 3 provide you all exceptions that lead to an exception so this might help.
Python 3.4.3 (default, Mar 10 2015, 13:05:12) [GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux Type "help", "copyright", "credits" or "license" for more information.
try:
... int('a') ... except Exception as e: ... ex = e ...
raise ex
Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 2, in <module> ValueError: invalid literal for int() with base 10: 'a'
Python 2.7.8 (default, Aug 28 2014, 16:47:22) [GCC 4.8.3 20140624 (Red Hat 4.8.3-1)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
try:
... int('a') ... except Exception as e: ... ex = e ...
raise ex
Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: invalid literal for int() with base 10: 'a'
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
zhuyifei1999 added a comment.
The command is site.upload(imagePage, source_filename=path, comment=self.summary, ignore_warnings=ignore) where imagePage is a pywikibot.Page, path is an ascii string, self.summary is an ascii string ignore is False
A KeyboardInterrupt during the wait shows:
WARNING: Waiting 40 seconds before retrying. ^CTraceback (most recent call last): File "flickrAZ.py", line 194, in <module> main() File "flickrAZ.py", line 190, in main bot.run() File "flickrAZ.py", line 178, in run self.parsePages() File "flickrAZ.py", line 72, in parsePages self.parsePhoto(photo) File "flickrAZ.py", line 133, in parsePhoto self.upload(imagePage, fileURL, name, False) File "flickrAZ.py", line 153, in upload site.upload(imagePage, source_filename=path, comment=self.summary, ignore_warnings=ignore) File "/data/project/yifeibot/pywikibot/pywikibot/tools/__init__.py", line 950, in wrapper return obj(*__args, **__kw) File "/data/project/yifeibot/pywikibot/pywikibot/site.py", line 4969, in upload result = req.submit() File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1579, in submit self.wait() File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1710, in wait time.sleep(self.retry_wait) KeyboardInterrupt
As for Python 3, it gives "ImportError: No module named 'urllib2'". I'm not sure if I can run it
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999 Cc: XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
zhuyifei1999 closed this task as "Resolved". zhuyifei1999 claimed this task. zhuyifei1999 added a comment.
Never mind, seems fixed in the current HEAD (not the one a few hours ago) Probably fixed in https://phabricator.wikimedia.org/rPWBCd798da349087d231d9dad203cd6b48a1aea03...
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999 Cc: XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
XZise added a comment.
Uh … great that Python 3 doesn't work :( Do you mean with “ASCII string” that `the_string.encode('ascii')` does not fail? Also your other traceback isn't very helpful either as there is no HTTP layer present (it's in another thread) :( .
And I don't think https://phabricator.wikimedia.org/rPWBCd798da349087d231d9dad203cd6b48a1aea03... fixed your issue (I at least couldn't explain why).
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999, XZise Cc: XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
zhuyifei1999 reopened this task as "Open". zhuyifei1999 added a comment.
Sorry, I'm wrong again. Running after "git checkout https://phabricator.wikimedia.org/rPWBCd798da349087d231d9dad203cd6b48a1aea03..." gets the error again.
As for "ASCII string" I meant all characters are valid ascii characters, and the_string.encode('ascii') does not fail, and there is no u flag before the string.
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999 Cc: XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
zhuyifei1999 added a comment.
Ok I added a traceback.print_exc() to https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/comms/thre... and got this:
Traceback (most recent call last): File "/data/project/yifeibot/pywikibot/pywikibot/comms/threadedhttp.py", line 221, in request max_redirects, connection_type File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1569, in request (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey) File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1316, in _request (response, content) = self._conn_request(conn, request_uri, method, body, headers) File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1252, in _conn_request conn.request(method, request_uri, body, headers) File "/usr/lib/python2.7/httplib.py", line 973, in request self._send_request(method, url, body, headers) File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request self.endheaders(body) File "/usr/lib/python2.7/httplib.py", line 969, in endheaders self._send_output(message_body) File "/usr/lib/python2.7/httplib.py", line 827, in _send_output msg += message_body UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128) ERROR: Traceback (most recent call last): File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1556, in submit body=body, headers=headers) File "/data/project/yifeibot/pywikibot/pywikibot/tools/__init__.py", line 1094, in wrapper return obj(*__args, **__kw) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 279, in request r = fetch(baseuri, method, body, headers, **kwargs) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 381, in fetch error_handling_callback(request) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 297, in error_handling_callback raise request.data UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)
WARNING: Waiting 5 seconds before retrying.
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999 Cc: XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
valhallasw added a subscriber: valhallasw. valhallasw added a comment.
@zhuyifei1999, please don't remove comments. Clarifying later that the issue wasn't resolved is clear enough.
Could you try adding
import httplib2 httplib2.debuglevel = 4
before the `site.upload()` call? That should make httplib2 output some more detailed information.
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999, valhallasw Cc: valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
zhuyifei1999 added a comment.
@valhallasw do you mean there's a log file or something? I find no change to stdout or stderr with that code added.
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999 Cc: valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
valhallasw added a comment.
It should log to stdout, yes. For example:
import httplib2 httplib2.debuglevel = 4 import pywikibot pywikibot.Page(pywikibot.Site(), 'test')
connect: (en.wikipedia.org, 443) send: u'GET /w/api.php?maxlag=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg HTTP/1.1\r\nHost: en.wikipedia.org\r\nconnection: keep-alive\r\ncontent-type: application/x-www-form-urlencoded\r\naccept-encoding: gzip, deflate\r\nuser-agent: (wikipedia:en; User:Valhallasw) Pywikibot/2.0b3 (g5524) httplib2/0.9 Python/2.7.6.final.0\r\n\r\n' reply: 'HTTP/1.1 200 OK\r\n' ... etc.
but it's possible the exception occurs earlier.
Can you provide the exact values you pass to
site.upload(imagePage, source_filename=path, comment=self.summary, ignore_warnings=ignore)
and the original file? That could help in reproducing the issue.
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999, valhallasw Cc: valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
zhuyifei1999 added a comment.
connect: (commons.wikimedia.org, 443) send: u'GET /w/api.php?maxlag=5&format=json&rawcontinue=&meta=userinfo&action=query&uiprop=blockinfo%7Cgroups%7Chasmsg%7Crights HTTP/1.1\r\nHost: commons.wikimedia.org\r\ncontent-type: application/x-www-form-urlencoded\r\nconnection: keep-alive\r\ncookie: centralauth_Token=78b8f029c49c2f259939a0b5dadd6c8a; centralauth_User=YiFeiBot; forceHTTPS=1; commonswikiUserID=2915654; forceHTTPS=true; commonswikiUserName=YiFeiBot\r\naccept-encoding: gzip, deflate\r\nuser-agent: flickrAZ (commons:commons; User:YiFeiBot) Pywikibot/2.0b3 (g5523) httplib2/0.8 Python/2.7.6.final.0\r\n\r\n' reply: 'HTTP/1.1 200 OK\r\n' header: Server: nginx/1.6.2 header: Date: Sun, 17 May 2015 01:11:33 GMT header: Content-Type: application/json; charset=utf-8 header: Content-Length: 465 header: Connection: keep-alive header: X-Powered-By: HHVM/3.6.1 header: X-Content-Type-Options: nosniff header: Cache-control: private, must-revalidate, max-age=0 header: X-Frame-Options: SAMEORIGIN header: Content-Encoding: gzip header: Vary: Accept-Encoding header: Set-Cookie: commonswikiSession=244c342c5006ef116c903f7b68678c79; path=/; secure; httponly header: X-Varnish: 968174735, 3348394954 header: Via: 1.1 varnish, 1.1 varnish header: Accept-Ranges: bytes header: Age: 0 header: X-Cache: cp1054 miss (0), cp1068 frontend miss (0) header: Set-Cookie: GeoIP=::::v4; Path=/; Domain=.wikimedia.org header: X-Analytics: https=1 header: Set-Cookie: WMF-Last-Access=17-May-2015;Path=/;HttpOnly;Expires=Thu, 18 Jun 2015 00:00:00 GMT Page 1 Ryanair, EI-DWP, Boeing 737-8AS (17737230072).jpg send: u'GET /w/api.php?inprop=protection&titles=File%3ARyanair%2C+EI-DWP%2C+Boeing+737-8AS+%2817737230072%29.jpg&continue=&format=json&prop=info&meta=userinfo&indexpageids=&action=query&maxlag=5&uiprop=blockinfo%7Chasmsg HTTP/1.1\r\nHost: commons.wikimedia.org\r\ncontent-type: application/x-www-form-urlencoded\r\nconnection: keep-alive\r\ncookie: GeoIP=::::v4; centralauth_Token=78b8f029c49c2f259939a0b5dadd6c8a; centralauth_User=YiFeiBot; forceHTTPS=1; commonswikiSession=244c342c5006ef116c903f7b68678c79; commonswikiUserID=2915654; forceHTTPS=true; WMF-Last-Access=17-May-2015; commonswikiUserName=YiFeiBot\r\naccept-encoding: gzip, deflate\r\nuser-agent: flickrAZ (commons:commons; User:YiFeiBot) Pywikibot/2.0b3 (g5523) httplib2/0.8 Python/2.7.6.final.0\r\n\r\n' reply: 'HTTP/1.1 200 OK\r\n' header: Server: nginx/1.6.2 header: Date: Sun, 17 May 2015 01:11:33 GMT header: Content-Type: application/json; charset=utf-8 header: Content-Length: 248 header: Connection: keep-alive header: X-Powered-By: HHVM/3.6.1 header: X-Content-Type-Options: nosniff header: Cache-control: private, must-revalidate, max-age=0 header: X-Frame-Options: SAMEORIGIN header: Content-Encoding: gzip header: Vary: Accept-Encoding header: X-Varnish: 66566441, 3348396553 header: Via: 1.1 varnish, 1.1 varnish header: Accept-Ranges: bytes header: Age: 0 header: X-Cache: cp1065 miss (0), cp1068 frontend miss (0) header: X-Analytics: https=1;WMF-Last-Access=17-May-2015 send: u'GET /w/api.php?maxlag=5&format=json&rawcontinue=&meta=tokens%7Cuserinfo&action=query&type=csrf%7Ccsrf%7Cdeleteglobalaccount%7Cpatrol%7Crollback%7Csetglobalaccountstatus%7Cuserrights%7Cwatch&uiprop=blockinfo%7Chasmsg HTTP/1.1\r\nHost: commons.wikimedia.org\r\ncontent-type: application/x-www-form-urlencoded\r\nconnection: keep-alive\r\ncookie: GeoIP=::::v4; centralauth_Token=78b8f029c49c2f259939a0b5dadd6c8a; centralauth_User=YiFeiBot; forceHTTPS=1; commonswikiSession=244c342c5006ef116c903f7b68678c79; commonswikiUserID=2915654; forceHTTPS=true; WMF-Last-Access=17-May-2015; commonswikiUserName=YiFeiBot\r\naccept-encoding: gzip, deflate\r\nuser-agent: flickrAZ (commons:commons; User:YiFeiBot) Pywikibot/2.0b3 (g5523) httplib2/0.8 Python/2.7.6.final.0\r\n\r\n' reply: 'HTTP/1.1 200 OK\r\n' header: Server: nginx/1.6.2 header: Date: Sun, 17 May 2015 01:11:34 GMT header: Content-Type: application/json; charset=utf-8 header: Content-Length: 321 header: Connection: keep-alive header: X-Powered-By: HHVM/3.6.1 header: X-Content-Type-Options: nosniff header: Cache-control: private, must-revalidate, max-age=0 header: X-Frame-Options: SAMEORIGIN header: Content-Encoding: gzip header: Vary: Accept-Encoding header: X-Varnish: 968176035, 3348398412 header: Via: 1.1 varnish, 1.1 varnish header: Accept-Ranges: bytes header: Age: 0 header: X-Cache: cp1054 miss (0), cp1068 frontend miss (0) header: X-Analytics: https=1;WMF-Last-Access=17-May-2015 Sleeping for 7.5 seconds, 2015-05-17 01:11:34 ERROR: Traceback (most recent call last): File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1556, in submit body=body, headers=headers) File "/data/project/yifeibot/pywikibot/pywikibot/tools/__init__.py", line 1094, in wrapper return obj(*__args, **__kw) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 279, in request r = fetch(baseuri, method, body, headers, **kwargs) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 381, in fetch error_handling_callback(request) File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 297, in error_handling_callback raise request.data UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1850: ordinal not in range(128)
I believe the UnicodeDecodeError happened before the upload request is really sent.
The exact values is:
site.upload(pywikibot.FilePage(site, "File:Ryanair, EI-DWP, Boeing 737-8AS (17737230072).jpg"), source_filename="temp/fli.tmp", comment="[[Commons:Bots|Bot]]: Uploading files from Flickr", ignore_warnings=False)
git diff shows nothing and git status shows HEAD detached at https://phabricator.wikimedia.org/rPWBCd798da349087d231d9dad203cd6b48a1aea03...
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999 Cc: valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
valhallasw added a comment.
I tried the following...
# this image is https://flic.kr/p/t2nWxA wget https://farm9.staticflickr.com/8880/17737230072_d678b9f089_o_d.jpg
# then python pwb.py shell
and then in the python shell
import pywikibot site = pywikibot.Site('commons', 'commons') site.login() # site.upload does not do this itself for some reason site.upload(pywikibot.FilePage(site, "File:Ryanair, EI-DWP, Boeing 737-8AS (17737230072).jpg"), source_filename="17737230072_d678b9f089_o_d.jpg", comment="Testing [[phab:T99366]] / [[Commons:Bots|Bot]]: Uploading files from Flickr", ignore_warnings=False)
which allows me to reproduce the issue -- in Python 2, at least. In Python 3, the upload magically works: https://commons.wikimedia.org/wiki/File:Ryanair,_EI-DWP,_Boeing_737-8AS_(177...
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999, valhallasw Cc: valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
valhallasw added a comment.
There are several issues. The requests has several unicode parts, which causes the entire request to be cast to a unicode string. Of course, this fails once we want to send binary data (namely, the file contents). At least the following ones are wrong:
- **u'POST /w/api.php HTTP/1.1'** - 'Host: commons.wikimedia.org' - 'Content-Length: 743405' - 'accept-encoding: gzip, deflate' - **u'user-agent: test (commons:commons; User:Valhallasw) Pywikibot/2.0b3 (g5524) httplib2/0.9 Python/2.7.6.final.0'** - **u'connection: keep-alive'** - 'cookie: (...)' - 'content-type: multipart/form-data; boundary="===============1715999916774184062=="' - 'mime-version: 1.0'
and this may have something to do with the introduction of unicode_literals at some place.
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999, valhallasw Cc: valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
XZise added a comment.
I'll take a closer look at this later but it seems that we should encode all header entries to ASCII before we are sending it away. I guess in Python 3 httplib2 is doing that for us.
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999, XZise Cc: valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
gerritbot added a subscriber: gerritbot. gerritbot added a comment.
Change 211522 had a related patch set uploaded (by Merlijn van Deen): Make sure all comms code works with native str
https://gerrit.wikimedia.org/r/211522
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999, gerritbot Cc: gerritbot, valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
gerritbot added a project: Patch-For-Review.
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999, gerritbot Cc: gerritbot, valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
zhuyifei1999 added a comment.
Any updates?
TASK DETAIL https://phabricator.wikimedia.org/T99366
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: zhuyifei1999 Cc: gerritbot, valhallasw, XZise, Aklapper, zhuyifei1999, jayvdb, pywikipedia-bugs
pywikipedia-bugs@lists.wikimedia.org