https://bugzilla.wikimedia.org/show_bug.cgi?id=73661
Bug ID: 73661
Summary: Uploads don't allow non-ASCII characters in filename
Product: Pywikibot
Version: core-(2.0)
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs(a)lists.wikimedia.org
Reporter: CommodoreFabianus(a)gmx.de
Web browser: ---
Mobile Platform: ---
Depending on the version used, either the local file name or the target page
name on the wiki may not contain non-ASCII characters. This changed in
Ib751ee3f4074a60f3b53b0afe3cc2dfc3e17b2f7 in pwb 2.0: versions prior to that
change don't work with non-ASCII local filenames, and versions that include it
don't work with non-ASCII wiki page names.
The problem is simply how the 'filename' value in the header of the file/chunk
entry is encoded (not to be confused with the separate 'filename' entry in the
MIME request). For example:
Content-Type: image/jpeg
MIME-Version: 1.0
Content-disposition: form-data; name="file";
filename*=utf-8''%C3%9C.jpg
Content-Transfer-Encoding: binary
[… binary data …]
This is the RFC 2231-compliant encoding of a non-ASCII character, which Python 3
uses by default. Python 2 instead encodes the complete header value as RFC 2047
encoded-words (this example may not represent exactly the same text as above,
but something similar):
Content-disposition: =?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?=
=?utf-8?b?IsOcMi5qcGci?=
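Both encodings can be reproduced with Python 3's standard `email` package. This is a minimal sketch, not Pywikibot's actual upload code: it assumes the file part's header is built with `Message.add_header()`, which switches a non-ASCII parameter to the RFC 2231 `filename*=` form, and it decodes the Python 2 sample above with `email.header.decode_header()`, since that sample consists of RFC 2047 encoded-words. Decoding shows the sample carries `filename="Ü2.jpg"`, i.e. a similar but not identical name.

```python
from email.header import decode_header
from email.message import Message

# Python 3: add_header() keeps ASCII parameters quoted, but writes a
# non-ASCII parameter as RFC 2231 filename*=charset'language'%XX...
part = Message()
part.add_header("Content-disposition", "form-data", name="file", filename="Ü.jpg")
header_py3 = part["Content-disposition"]
print(header_py3)  # form-data; name="file"; filename*=utf-8''%C3%9C.jpg

# The email package itself can decode the RFC 2231 value again.
print(part.get_filename())  # Ü.jpg

# Python 2 style: the whole header value as RFC 2047 "B" encoded-words
# (the two folded lines from the report, joined by a space).
header_py2 = ("=?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?= "
              "=?utf-8?b?IsOcMi5qcGci?=")
# decode_header() returns (bytes, charset) pairs; join and decode them.
decoded = b"".join(chunk for chunk, _charset in decode_header(header_py2)).decode("utf-8")
print(decoded)  # form-data; name="file"; filename="Ü2.jpg"
```

Neither form leaves a plain `filename="..."` parameter in the header, which fits the "include a filename in the Content-Disposition header" error below.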
Neither form is accepted by the MediaWiki server. With Python 3 the request is
answered with:
badupload_file: File upload param file is not a file upload; be sure to use
multipart/form-data for your POST and include a filename in the
Content-Disposition header.
With Python 2 it is answered with:
missingparam: One of the parameters filekey, file, url, statuskey is required
It is possible to leave the value as raw UTF-8, although as far as I can see
that is not compliant with the MIME RFCs, which say that headers may only
contain US-ASCII characters.
Unfortunately I'm not sure what MediaWiki does with this, so I don't know if
there is a better way, especially as Python 3 doesn't support 'bytes' in the
header, and otherwise it's not possible to get the value into the header
without it being re-encoded.
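One way to try the raw UTF-8 option is to assemble the multipart body by hand as bytes, so that no header re-encoding can take place. The sketch below is hypothetical: `build_multipart`, its parameters and the boundary value are illustrative names, not part of Pywikibot or the MediaWiki API, and the report leaves open whether MediaWiki actually accepts a raw UTF-8 filename.

```python
def build_multipart(fields, file_field, filename, content_type, data, boundary):
    """Build a multipart/form-data body as bytes (hypothetical helper).

    The file part's filename is written as raw UTF-8 instead of being
    RFC 2231/2047 encoded, i.e. the non-RFC-compliant option described above.
    """
    if any(ch in filename for ch in '"\r\n'):
        # Keep the sketch simple: no escaping of quotes or line breaks.
        raise ValueError("filename must not contain quotes or line breaks")
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    # Ordinary text fields (e.g. the upload request's own parameters).
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    # The file part: the header line is turned into UTF-8 bytes unchanged.
    lines += [
        delimiter,
        (f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"').encode("utf-8"),
        b"Content-Type: " + content_type.encode("ascii"),
        b"",
        data,
        delimiter + b"--",  # closing delimiter
        b"",                # trailing CRLF after the closing delimiter
    ]
    return crlf.join(lines)
```

For example, `build_multipart({"filename": "Ü.jpg"}, "file", "Ü.jpg", "image/jpeg", jpeg_bytes, "XYZ")` would be sent with the request header `Content-Type: multipart/form-data; boundary=XYZ`, using the same boundary string.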