[Mediawiki-l] Special characters on uploaded files

Stuardo Herrera stuardoherrera at gmail.com
Fri Jan 19 18:07:51 UTC 2007


Oh well, I'll think of a hack then. Meanwhile, I hope all my users read the
"don't upload special chars" message. Thanks to everyone who helped!
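
For reference, the "check for non-ASCII chars" hack suggested in the quoted message could be sketched roughly like this. It is only an illustration in Python, not MediaWiki code: the function names `has_non_ascii` and `check_upload_name` are made up, and a real check would have to live in the wiki's upload handling.

```python
def has_non_ascii(filename: str) -> bool:
    """Return True if the name contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in filename)


def check_upload_name(filename: str) -> str:
    """Return the name unchanged, or raise if it contains non-ASCII characters.

    Hypothetical helper: rejects names that might not round-trip through a
    filesystem API that does not speak UTF-8.
    """
    if has_non_ascii(filename):
        raise ValueError(f"non-ASCII characters in upload name: {filename!r}")
    return filename
```

With this sketch, `check_upload_name("photo.png")` passes the name through, while `check_upload_name("Año.png")` raises a `ValueError` that the upload form could turn into a friendlier message.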

2007/1/19, Brion Vibber <brion at pobox.com>:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> jdd wrote:
> > Fernando Correia wrote:
> >> Windows doesn't have any problem handling special characters in file
> >> names.
> >
> > wrong.
> >
> > Windows has many problems: it uses special codes for some
> > characters, as does the Joliet CD/DVD file system. This is easy to see
> > when reading, from Windows, any file written on a strictly
> > UTF-8-compliant Unix system.
>
> (If you configure your mount options properly on the Unix/Linux side you
> won't have that problem!)
>
>
> The problem is that Windows has a kind of weird schizophrenic approach
> to character sets.
>
> Part of the system works in pure, total Unicode, speaking and storing
> UTF-16 everywhere. This is the Unicode or "wide character" interface.
>
> Part of the system works in a language- or system-dependent second
> encoding which may be 8-bit or variable length. This is the (not very
> accurately named) "ANSI" interface.
>
> (And then just to be a jerk, part of the system works in *another*
> language- or system-dependent *third* encoding, 8-bit or variable
> length, which is the "OEM" charset. This is used in console-mode
> terminals and the DOS-compatible 8.3 filenames on FAT volumes.)
>
>
> Now, for better or for worse, if you use the (Unix-derived) C standard
> library, like most ports of Unix apps probably do, it seems to prefer
> using the ANSI (or maybe OEM?) encoding of things.
>
> MediaWiki generally assumes you're running on a modern Unix and speaks
> UTF-8 everywhere, including with the filesystem. That assumption breaks
> on Windows, where filenames on the filesystem *as seen from PHP* are
> accessed through some kind of horrid "ANSI" (or OEM?)-to-Unicode
> translation layer.
>
> This means you basically get gibberish, since MediaWiki and the web
> server see different versions of the filename.
>
>
> A planned change to the file storage scheme will make this issue
> obsolete, as file storage will be done with nice, ASCII-clean
> alphanumeric hash keys, but it might be another major version or two
> before that gets done.
>
>
> If someone happens to know a convenient way to tell the system "my
> process speaks UTF-8, let me use the damn Unicode filenames" that'd be
> super. Otherwise... hack in a check for non-ASCII chars? *shrug*
>
> - -- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFFsQUswRnhpk1wk44RAu+nAJ9Ph4Pd2hTejpMmRrrYUU21WBjJBQCeLK43
> m9V/59LLt+dA+oMfftRGyWg=
> =ZfNo
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> MediaWiki-l mailing list
> MediaWiki-l at lists.wikimedia.org
> http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
>

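To make the mismatch described in the quoted message concrete, here is a small Python sketch of one filename in the three kinds of encodings mentioned (wide/UTF-16, "ANSI", "OEM"), and of what happens when UTF-8 bytes are read back through an "ANSI" code page. The code pages cp1252 and cp850 are only assumptions for a Western European system; the actual pages depend on the Windows locale.

```python
name = "Año.png"

# The same name as each interface might store it. cp1252 and cp850 are
# assumed examples of an "ANSI" and an "OEM" code page respectively.
wide = name.encode("utf-16-le")  # Unicode ("wide character") interface
ansi = name.encode("cp1252")     # "ANSI" interface
oem = name.encode("cp850")       # "OEM" charset (console, 8.3 names)
utf8 = name.encode("utf-8")      # what a UTF-8 application writes


def seen_through_ansi(raw: bytes, codepage: str = "cp1252") -> str:
    """Decode raw filename bytes as if they were in the given code page."""
    return raw.decode(codepage, errors="replace")


# UTF-8 bytes misread as cp1252: the two bytes of "ñ" become two characters.
garbled = seen_through_ansi(utf8)
print(garbled)
```

The point is only that the application and the filesystem layer end up with different strings for the same file, which matches the gibberish described above.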

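The planned "ASCII-clean alphanumeric hash keys" mentioned in the quoted message could look something like the sketch below. This is a guess at the general idea, not the actual MediaWiki design: the choice of SHA-1, the function name `storage_key`, and keeping the extension are all assumptions made for illustration.

```python
import hashlib
import os.path


def storage_key(filename: str) -> str:
    """Derive an ASCII-only on-disk name from an arbitrary Unicode filename.

    Hypothetical scheme: SHA-1 of the UTF-8 name, plus the original
    extension when that extension is plain ASCII alphanumeric.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    _, ext = os.path.splitext(filename)
    suffix = ext[1:]
    if suffix.isascii() and suffix.isalnum():
        return digest + "." + suffix.lower()
    return digest


key = storage_key("Año.png")
print(key)
```

Since the key contains only hex digits and an ASCII extension, it reads the same through any of the Windows interfaces; the original Unicode name would then have to be kept elsewhere, for example in the database.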

-- 
:::Stuardo Herrera:::
http://stuardo.wordpress.com
http://php.develsystems.com

