Dear All,
Since de.wikipedia switched to UTF-8, a problem with image uploads
from Mac OS X has surfaced. The Mac OS X filesystem, unlike most
others as far as I know, stores filenames in decomposed Unicode.
As an example, take a file named "ü" (small u-umlaut).
The usual representation is the single Unicode codepoint:
U+00FC LATIN SMALL LETTER U WITH DIAERESIS
This is
0xC3 0xBC in UTF-8
"%C3%BC" as URL
Mac OS X instead uses this Unicode codepoint sequence:
U+0075 LATIN SMALL LETTER U
U+0308 COMBINING DIAERESIS
0x75 0xCC 0x88 in UTF-8
"u%CC%88" as URL
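
The two encodings above can be reproduced with a short Python sketch.
Python is used only for illustration here (it is not the wiki
software's language), and the helper name utf8_hex is made up for
this example:

```python
from urllib.parse import quote

composed = "\u00fc"      # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # U+0075 followed by U+0308 COMBINING DIAERESIS


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


for label, name in (("composed", composed), ("decomposed", decomposed)):
    # quote() percent-encodes the UTF-8 bytes of non-ASCII characters
    print(label, utf8_hex(name), quote(name))

# Both strings render as the same glyph but do not compare equal
print(composed == decomposed)
```

The final comparison prints False, which is why a name typed on
another system does not match a name uploaded from a Mac.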
Images uploaded from a Mac therefore get these unusual filenames,
which in most cases cannot be told apart from the composed form
by sight. Referencing such an image fails if you simply type its
name as you see it. Copying and pasting the "Mac" umlaut filename
works, but makes everything even less transparent.
Example image:
http://de.wikipedia.org/wiki/Bild:To%CC%88o%CC%88a%CC%88a%CC%88st.jpg
German discussion page:
de:Wikipedia_Diskussion:UTF8-Probleme#Umlaute_in_Upload_Dateinamen_bei_Mac_OS_X
The most foolproof solution, it seems to me, is to have the server
software normalize incoming filenames to NFC (composed) form.
Regards,
Peter Jacobi