[Mediawiki-l] Special characters on uploaded files

Stuardo Herrera stuardoherrera at gmail.com
Fri Jan 19 18:07:51 UTC 2007

Oh well, I'll think of a hack then. Meanwhile I hope all my users read the
"don't upload special chars" message. Thanks to everyone who helped!

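For what it's worth, the "check for non-ASCII chars" hack suggested in the quoted message below could look roughly like this. It is only a sketch in Python, not MediaWiki's PHP: the function names are hypothetical, and where such a check would hook into the upload code is not shown.

```python
def has_non_ascii(filename: str) -> bool:
    """Return True if the name contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in filename)


def check_upload_name(filename: str) -> None:
    """Reject an upload whose name is not plain ASCII (hypothetical helper)."""
    if has_non_ascii(filename):
        raise ValueError(f"don't upload special chars: {filename!r}")
```

Calling check_upload_name() before the file is written would turn the "please read the message" policy into a hard error instead of a garbled filename.
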
2007/1/19, Brion Vibber <brion at pobox.com>:
> jdd wrote:
> > Fernando Correia wrote:
> >> Windows doesn't have any problem handling special characters in file
> >> names.
> >
> > wrong.
> >
> > Windows has many problems: it uses special codes for some
> > characters, as does the Joliet CD/DVD file system. This is easy to see
> > when reading, from Windows, any file written on a strictly
> > UTF-8 compliant Unix system.
>
> (If you configure your mount options properly on the Unix/Linux side you
> won't have that problem!)
>
> The problem is that Windows has a kind of weird schizophrenic approach
> to character sets.
>
> Part of the system works in pure, total Unicode, speaking and storing
> UTF-16 everywhere. This is the Unicode or "wide character" interface.
>
> Part of the system works in a language- or system-dependent second
> encoding which may be 8-bit or variable length. This is the (not very
> accurately named) "ANSI" interface.
>
> (And then just to be a jerk, part of the system works in *another*
> language- or system-dependent *third* encoding, 8-bit or variable
> length, which is the "OEM" charset. This is used in console-mode
> terminals and the DOS-compatible 8.3 filenames on FAT volumes.)
>
> Now, for better or for worse, if you use the (Unix-derived) C standard
> library, like most ports of Unix apps probably do, it seems to prefer
> using the ANSI (or maybe OEM?) encoding of things.
>
> MediaWiki generally assumes you're running on a modern Unix and speaks
> UTF-8 everywhere, including with the filesystem. That assumption breaks
> on Windows, where filenames on the filesystem *as seen from PHP* are
> accessed through some kind of horrid "ANSI" (or OEM?)-to-Unicode
> translation layer.
>
> This means you basically get gibberish, since MediaWiki and the web
> server see different versions of the filename.
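
To make the mismatch described above concrete, here is a small Python sketch of one plausible way the gibberish arises. It assumes code page 1252 for the "ANSI" side and code page 437 for the "OEM" side; those are only common Western/US defaults, and the real values depend on the system locale. The example filename is made up.

```python
name = "Espa\u00f1a.png"  # "España.png", a made-up example filename

# The same name as each of the interfaces described above would store it.
utf8_bytes = name.encode("utf-8")      # what MediaWiki hands to the filesystem
wide_bytes = name.encode("utf-16-le")  # the "wide character" interface
ansi_bytes = name.encode("cp1252")     # assumed "ANSI" code page
oem_bytes = name.encode("cp437")       # assumed "OEM" code page

# If a layer that expects "ANSI" bytes is handed UTF-8 bytes instead,
# each byte of the multi-byte sequence is read as a separate character.
seen_by_ansi_layer = utf8_bytes.decode("cp1252")

print(seen_by_ansi_layer)
```

The single non-ASCII letter comes out as two unrelated characters, so the name one side wrote no longer matches the name the other side looks up.
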
>
> A planned change to the file storage scheme will make this issue
> obsolete, as files will be stored under nice, ASCII-clean
> alphanumeric hash keys, but it might be another major version or two
> before that gets done.
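
The hash-key idea above can be sketched in a few lines of Python. The message does not say which hash would be used or whether it would be computed from the file name or the file contents, so SHA-1 over the UTF-8 title is purely an assumption here, and storage_key is a hypothetical name.

```python
import hashlib


def storage_key(title: str) -> str:
    """Map any Unicode title to an ASCII-only hex key (assumed scheme)."""
    # Hash the UTF-8 bytes of the title; the hex digest uses only 0-9 and a-f.
    return hashlib.sha1(title.encode("utf-8")).hexdigest()
```

Whatever characters the title contains, the name that reaches the filesystem is plain ASCII, so the code page in use no longer matters. The original title would have to be kept elsewhere (for example in the database) to map back from the key.
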
>
> If someone happens to know a convenient way to tell the system "my
> process speaks UTF-8, let me use the damn Unicode filenames" that'd be
> super. Otherwise... hack in a check for non-ASCII chars? *shrug*
>
> -- brion vibber (brion @ pobox.com / brion @ wikimedia.org)

:::Stuardo Herrera:::
