Dear All,
Since switching de.wikipedia to UTF-8, a problem with image uploads from Mac OS X has surfaced. The Mac OS X filesystem, AFAIK unlike most others, stores filenames in decomposed Unicode.
As an example, consider a file named "ü" (small u-umlaut).

The usual (composed) form is a single Unicode code point:
  U+00FC LATIN SMALL LETTER U WITH DIAERESIS
This is 0xC3 0xBC in UTF-8, or "%C3%BC" in a URL.

Mac OS X instead uses this (decomposed) Unicode code point sequence:
  U+0075 LATIN SMALL LETTER U
  U+0308 COMBINING DIAERESIS
This is 0x75 0xCC 0x88 in UTF-8, or "u%CC%88" in a URL.
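For anyone who wants to reproduce the byte sequences above, here is a small Python sketch. It only uses the standard library; the variable names are made up for illustration and have nothing to do with the MediaWiki code.

```python
import unicodedata
from urllib.parse import quote

# The two spellings of "ü" described above.
composed = "\u00fc"      # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # U+0075 followed by U+0308 COMBINING DIAERESIS


def hex_bytes(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"0x{b:02X}" for b in text.encode("utf-8"))


composed_hex = hex_bytes(composed)
decomposed_hex = hex_bytes(decomposed)

# Percent-encoded forms as they would appear in a URL.
composed_url = quote(composed)
decomposed_url = quote(decomposed)

# The strings render the same but do not compare equal...
same_as_typed = composed == decomposed

# ...until both are brought to the same normalization form.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

print(composed_hex, composed_url)
print(decomposed_hex, decomposed_url)
print(same_as_typed, same_after_nfc)
```

The plain comparison is the one that matters here: a lookup by the name as typed is a byte-for-byte comparison, so it misses the decomposed name.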
Images uploaded from a Mac therefore get these unusual filenames, which in most cases look identical to the composed form on screen. Referencing such an image fails if you simply type the image name as you see it. Copying and pasting the "Mac" umlaut filename works, but makes everything even less transparent.
Example image: http://de.wikipedia.org/wiki/Bild:To%CC%88o%CC%88a%CC%88a%CC%88st.jpg
German discussion Page: de:Wikipedia_Diskussion:UTF8-Probleme#Umlaute_in_Upload_Dateinamen_bei_Mac_OS_X
The most foolproof solution, it seems to me, is for the server software to normalize incoming filenames to NFC (composed) form.
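A minimal sketch of that idea, in Python rather than the PHP the wiki software is written in. The function name normalize_upload_name is hypothetical, and this is not the actual fix, just an illustration of NFC normalization applied to the example filename above.

```python
import unicodedata


def normalize_upload_name(filename: str) -> str:
    """Return filename in NFC (composed) form.

    Hypothetical helper: a server could call something like this
    on every incoming upload name before storing or comparing it.
    """
    return unicodedata.normalize("NFC", filename)


# The example image name as a Mac would send it (decomposed umlauts).
mac_name = "To\u0308o\u0308a\u0308a\u0308st.jpg"

stored_name = normalize_upload_name(mac_name)

# Each "letter + U+0308" pair collapses into one code point,
# so the normalized name is four code points shorter.
print(len(mac_name), len(stored_name))

# Normalizing an already composed name leaves it unchanged.
print(normalize_upload_name(stored_name) == stored_name)
```

Normalizing on input means names typed on any other system match the stored name directly, with no special handling at lookup time.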
Regards, Peter Jacobi
hello peter.
Please see http://bugzilla.wikipedia.org/show_bug.cgi?id=215, which i filed earlier today. If you can contribute more details, please do it there.
daniel
Peter Jacobi wrote:
Since switching de.wikipedia to UTF-8, a problem with image uploads from Mac OS X has surfaced. The Mac OS X filesystem, AFAIK unlike most others, stores filenames in decomposed Unicode.
This seems to be a problem specific to how Safari handles the names on upload; other browsers normalise the name properly when sending, but of course we should be more careful with input.
Follow-ups to http://bugzilla.wikipedia.org/215 please.
-- brion vibber (brion @ pobox.com)
Does it work when using UFS as opposed to HFS+?
On Wed, 25 Aug 2004 08:13:10 +0000 (UTC), Peter Jacobi peter_jacobi@gmx.net wrote:
Since switching de.wikipedia to UTF-8, a problem with image uploads from Mac OS X has surfaced. The Mac OS X filesystem, AFAIK unlike most others, stores filenames in decomposed Unicode.
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l