Oh well, I'll think of a hack then. Meanwhile I hope all my users read the
"don't upload special chars" message. Thanks to everyone who helped!
2007/1/19, Brion Vibber <brion(a)pobox.com>:
-----BEGIN PGP SIGNED MESSAGE-----
Fernando Correia wrote:
> Windows doesn't have any problem handling special characters in file
Windows has many problems: it uses special codes for some
characters, as does the Joliet CD/DVD file system. This is easy to see
when reading, from Windows, any file written under a strictly
UTF-8-compliant Unix system.
(If you configure your mount options properly on the Unix/Linux side you
won't have that problem!)
The problem is that Windows has a kind of weird schizophrenic approach
to character sets.
Part of the system works in pure, total Unicode, speaking and storing
UTF-16 everywhere. This is the Unicode or "wide character" interface.
Part of the system works in a language- or system-dependent second
encoding which may be 8-bit or variable length. This is the (not very
accurately named) "ANSI" interface.
(And then just to be a jerk, part of the system works in *another*
language- or system-dependent *third* encoding, 8-bit or variable
length, which is the "OEM" charset. This is used in console-mode
terminals and the DOS-compatible 8.3 filenames on FAT volumes.)
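
To make the three interfaces concrete, here is a rough sketch in Python
(used purely for illustration). The code pages cp1252 and cp850 are
assumptions standing in for a Western European "ANSI" and "OEM" code page;
the real ones depend on how the Windows system is configured.

```python
# Illustration only: cp1252 and cp850 are assumed stand-ins for the
# "ANSI" and "OEM" code pages; the real ones depend on the Windows locale.
ENCODINGS = {
    "Unicode (UTF-16LE)": "utf-16-le",
    "ANSI (assumed cp1252)": "cp1252",
    "OEM (assumed cp850)": "cp850",
}


def encode_all(name):
    """Return the byte form of `name` under each encoding listed above."""
    return {label: name.encode(codec) for label, codec in ENCODINGS.items()}


if __name__ == "__main__":
    # Print the bytes of a single accented character in each encoding.
    for label, raw in encode_all("é").items():
        print(f"{label:24} {raw.hex()}")
```

The same character ends up as different bytes in each interface, which is
why a filename written through one and read through another can come out
wrong.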
Now, for better or for worse, if you use the (Unix-derived) C standard
library, like most ports of Unix apps probably do, it seems to prefer
using the ANSI (or maybe OEM?) encoding of things.
MediaWiki generally assumes you're running on a modern Unix and speaks
UTF-8 everywhere, including with the filesystem. That assumption breaks
on Windows, where filenames on the filesystem *as seen from PHP* are
accessed through some kind of horrid "ANSI" (or OEM?)-to-Unicode conversion.
This means you basically get gibberish, since MediaWiki and the web
server see different versions of the filename.
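
The mismatch can be reproduced in a few lines of Python (again only an
illustration, not MediaWiki code). This assumes the "ANSI" code page is
cp1252, and the helper name is hypothetical.

```python
def as_seen_through_ansi(name, ansi_codec="cp1252"):
    """Encode `name` as UTF-8, then decode those bytes as if they were ANSI.

    `ansi_codec` is an assumption; the actual code page varies per system.
    Bytes that are undefined in the code page become U+FFFD.
    """
    utf8_bytes = name.encode("utf-8")
    return utf8_bytes.decode(ansi_codec, errors="replace")


if __name__ == "__main__":
    # The UTF-8 name one side intends versus what the other side reads.
    print(as_seen_through_ansi("Café.png"))
```

Under that assumption "Café.png" comes back as "CafÃ©.png", because the
two UTF-8 bytes of "é" are each read as a separate cp1252 character.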
A planned change to the file storage scheme will make this issue
obsolete as file storage will be done with nice, ASCII-clean
alphanumeric hash keys, but that might be another major version or two
before it gets done.
If someone happens to know a convenient way to tell the system "my
process speaks UTF-8, let me use the damn Unicode filenames" that'd be
super. Otherwise... hack in a check for non-ASCII chars? *shrug*
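
A minimal sketch of that kind of check, in Python for illustration only
(MediaWiki itself is PHP, and the function name here is hypothetical):

```python
def has_non_ascii(filename):
    """Return True if any character in `filename` is outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in filename)


if __name__ == "__main__":
    # An upload handler could refuse names for which this returns True.
    for candidate in ("Cafe.png", "Café.png"):
        verdict = "reject" if has_non_ascii(candidate) else "accept"
        print(candidate, verdict)
```

Rejecting such names at upload time sidesteps the encoding mismatch
entirely, at the cost of forcing users to rename their files.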
- -- brion vibber (brion @ pobox.com
/ brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v220.127.116.11 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
-----END PGP SIGNATURE-----
MediaWiki-l mailing list