-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
jdd wrote:
> Fernando Correia wrote:
>> Windows doesn't have any problem handling
>> special characters in file names.
>
> wrong.
> Windows has many problems, using special codes for some
> characters, as does the Joliet CD/DVD file system. This is easy to see
> when reading from Windows any file written under a strictly
> UTF-8 compliant Unix system.

(If you configure your mount options properly on the Unix/Linux side you
won't have that problem!)
The problem is that Windows has a kind of weird schizophrenic approach
to character sets.
Part of the system works in pure, total Unicode, speaking and storing
UTF-16 everywhere. This is the Unicode or "wide character" interface.
Part of the system works in a language- or system-dependent second
encoding which may be 8-bit or variable length. This is the (not very
accurately named) "ANSI" interface.
(And then just to be a jerk, part of the system works in *another*
language- or system-dependent *third* encoding, 8-bit or variable
length, which is the "OEM" charset. This is used in console-mode
terminals and the DOS-compatible 8.3 filenames on FAT volumes.)
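
To make the three views concrete, here is a minimal Python sketch of how
one name comes out as different bytes in each of them. The filename is
made up, and the code pages (cp1252 for "ANSI", cp850 for OEM) are an
assumption for a Western European install; other locales use others.

```python
# Hypothetical filename. The code pages below are assumptions for a
# Western European Windows install, not something fixed by Windows.
name = "Caf\u00e9.png"  # "Café.png"

wide = name.encode("utf-16-le")  # what the Unicode ("wide") interface stores
ansi = name.encode("cp1252")     # assumed "ANSI" code page
oem = name.encode("cp850")       # assumed OEM code page
utf8 = name.encode("utf-8")      # what a UTF-8-speaking app would send

# Print each encoding as hex so the differing bytes are easy to spot.
for label, data in [("UTF-16", wide), ("ANSI", ansi), ("OEM", oem), ("UTF-8", utf8)]:
    print(label.ljust(7), data.hex())
```

The ASCII letters match in the three 8-bit forms; only the accented
character differs, which is why pure-ASCII names never show the problem.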
Now, for better or for worse, if you use the (Unix-derived) C standard
library, as most ports of Unix apps probably do, it seems to prefer
the ANSI (or maybe OEM?) encoding of things.
MediaWiki generally assumes you're running on a modern Unix and speaks
UTF-8 everywhere, including with the filesystem. That assumption breaks
on Windows, where filenames on the filesystem *as seen from PHP* are
accessed through some kind of horrid "ANSI" (or OEM?)-to-Unicode
translation layer.
This means you basically get gibberish, since MediaWiki and the web
server see different versions of the filename.
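
A rough illustration of the gibberish, as a Python sketch: the app hands
over UTF-8 bytes and the translation layer reads them as if they were in
the "ANSI" code page. The helper name, the filename and the choice of
cp1252 are all assumptions made for the example.

```python
def as_seen_by_ansi_layer(name: str, codepage: str = "cp1252") -> str:
    """Return the name an assumed 'ANSI' layer would see for UTF-8 input.

    Hypothetical helper: encodes the name as UTF-8 (what the app sends)
    and decodes those bytes with the given code page (what the layer
    assumes). Bytes the code page cannot map are replaced with U+FFFD.
    """
    sent = name.encode("utf-8")
    return sent.decode(codepage, errors="replace")


name = "Caf\u00e9.png"  # "Café.png", a made-up filename
seen = as_seen_by_ansi_layer(name)
print(seen)
```

The single accented character turns into two unrelated ones, so the name
on disk no longer matches the name the application asks for.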
A planned change to the file storage scheme will make this issue
obsolete, as files will be stored under nice, ASCII-clean
alphanumeric hash keys, but it might be another major version or two
before that gets done.
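
As a sketch of the idea only (the actual scheme isn't settled here, and
SHA-1 over the UTF-8 name is purely an assumption for illustration), a
hash key sidesteps the encoding question because its output is always
plain ASCII:

```python
import hashlib


def storage_key(title: str) -> str:
    """Map any Unicode title to an ASCII-only hex key.

    Hypothetical helper: the hash function (SHA-1) and the input
    encoding (UTF-8) are assumptions, not the planned design.
    """
    return hashlib.sha1(title.encode("utf-8")).hexdigest()


key = storage_key("Caf\u00e9.png")  # made-up title
print(key)
```

Whatever the title contains, the key is 40 characters from 0-9 and a-f,
so it reads the same through the Unicode, ANSI and OEM interfaces.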
If someone happens to know a convenient way to tell the system "my
process speaks UTF-8, let me use the damn Unicode filenames" that'd be
super. Otherwise... hack in a check for non-ASCII chars? *shrug*
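
The non-ASCII check could be as small as this Python sketch (the function
name is made up, and where such a check would hook into the upload path
is left open):

```python
def is_ascii_safe(filename: str) -> bool:
    """Return True if every character in the filename is plain ASCII."""
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Made-up names: the first would pass, the second would be refused.
for candidate in ["Cafe.png", "Caf\u00e9.png"]:
    print(candidate.encode("unicode_escape").decode("ascii"), is_ascii_safe(candidate))
```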
- -- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFsQUswRnhpk1wk44RAu+nAJ9Ph4Pd2hTejpMmRrrYUU21WBjJBQCeLK43
m9V/59LLt+dA+oMfftRGyWg=
=ZfNo
-----END PGP SIGNATURE-----