jdd wrote:
> Fernando Correia wrote:
>> Windows doesn't have any problem handling special characters in file names.
>
> Wrong.
> Windows has many problems; it uses special codes for some characters, as does the Joliet CD/DVD filesystem. This is easy to see when reading, from Windows, any file written on a strictly UTF-8-compliant Unix system.
(If you configure your mount options properly on the Unix/Linux side, you won't have that problem!)
The problem is that Windows has a kind of weird schizophrenic approach to character sets.
Part of the system works in pure, total Unicode, speaking and storing UTF-16 everywhere. This is the Unicode or "wide character" interface.
Part of the system works in a language- or system-dependent second encoding which may be 8-bit or variable length. This is the (not very accurately named) "ANSI" interface.
(And then just to be a jerk, part of the system works in *another* language- or system-dependent *third* encoding, 8-bit or variable length, which is the "OEM" charset. This is used in console-mode terminals and the DOS-compatible 8.3 filenames on FAT volumes.)
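To make the three encodings concrete, here is a minimal Python sketch that encodes one filename each way. It assumes a Western European system, where the "ANSI" code page is cp1252 and the "OEM" code page is cp437; other locales use different code pages, so treat both choices as assumptions.

```python
# Encode one filename through each of the three Windows-side encodings.
# Assumption: Western European locale, where the "ANSI" code page is
# cp1252 and the "OEM" code page is cp437. Other locales differ.
name = "Café.jpg"

wide = name.encode("utf-16-le")  # Unicode ("wide character") interface: UTF-16
ansi = name.encode("cp1252")     # "ANSI" interface
oem = name.encode("cp437")       # "OEM" interface (console, 8.3 names)

for label, data in (("UTF-16", wide), ("ANSI", ansi), ("OEM", oem)):
    print(f"{label:7} {data.hex(' ')}")
```

The same character "é" comes out as three different byte sequences (e9 00, e9 and 82), which is why a name written through one interface can look wrong when read back through another.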
Now, for better or for worse, if you use the (Unix-derived) C standard library, as most ports of Unix apps probably do, it seems to prefer the ANSI (or maybe OEM?) encoding.
MediaWiki generally assumes you're running on a modern Unix and speaks UTF-8 everywhere, including with the filesystem. That assumption breaks on Windows, where filenames on the filesystem *as seen from PHP* are accessed through some kind of horrid "ANSI" (or OEM?)-to-Unicode translation layer.
This means you basically get gibberish, since MediaWiki and the web server see different versions of the filename.
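A quick way to see the kind of gibberish this produces is to take the UTF-8 bytes MediaWiki would use for a filename and decode them as if they were in the ANSI code page. This Python sketch assumes cp1252 as the ANSI code page; it only simulates the mismatch and does not call any Windows API.

```python
# Simulate the mismatch: MediaWiki builds the filename as UTF-8 bytes,
# but the filesystem layer interprets those bytes in the "ANSI" code page.
# Assumption: the ANSI code page is cp1252 (Western European Windows).
title = "Café.jpg"

utf8_bytes = title.encode("utf-8")     # what MediaWiki hands to PHP
as_seen = utf8_bytes.decode("cp1252")  # what the ANSI layer thinks it got

print(as_seen)  # CafÃ©.jpg
```

Under this assumption the file would end up on disk as "CafÃ©.jpg", so a web server looking for "Café.jpg" through the Unicode interface would not find it.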
A planned change to the file storage scheme will make this issue obsolete, since files will be stored under nice, ASCII-clean alphanumeric hash keys, but it may be another major version or two before that happens.
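As a hypothetical sketch only (the hash function, the hex format and the helper name are assumptions, not the planned MediaWiki scheme), an ASCII-clean alphanumeric storage key can be derived by hashing the UTF-8 bytes of the title:

```python
import hashlib


def storage_key(title: str) -> str:
    """Hypothetical helper: map any title to an ASCII-only storage name.

    Assumption: SHA-1 over the UTF-8 bytes, rendered as lowercase hex.
    The result uses only 0-9 and a-f, which UTF-16, the ANSI and OEM
    code pages, and UTF-8 all represent unambiguously.
    """
    return hashlib.sha1(title.encode("utf-8")).hexdigest()


key = storage_key("Café.jpg")
print(key, len(key))
```

Because the on-disk name never contains non-ASCII characters, the translation layer has nothing to mangle; the original title would have to be kept elsewhere (for example in the database) to map keys back to titles.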
If someone happens to know a convenient way to tell the system "my process speaks UTF-8, let me use the damn Unicode filenames" that'd be super. Otherwise... hack in a check for non-ASCII chars? *shrug*
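The fallback suggested above, a check for non-ASCII characters, could look something like this minimal Python sketch. The function name is hypothetical, and what to do when the check fires (reject the upload, rename the file, or hash it) is left open.

```python
def has_non_ascii(filename: str) -> bool:
    """Hypothetical helper: True if any character is outside 7-bit ASCII.

    Such names are the ones at risk of being mangled by the ANSI/OEM
    translation layer on Windows. Pure-ASCII names avoid this particular
    encoding problem (though Windows still forbids some ASCII characters,
    such as ':' and '?', in filenames).
    """
    return any(ord(ch) > 0x7F for ch in filename)


for name in ("Cafe.jpg", "Café.jpg", "Москва.png"):
    print(name, has_non_ascii(name))
```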
-- 
brion vibber (brion @ pobox.com / brion @ wikimedia.org)