Hello everyone!
I'm uploading some PDF files that have special characters in their names to my MediaWiki site, but when I try to open them, I get a Not Found page. I have noticed that when I upload them, the special characters in their names are replaced with other characters on the server, and that's why MediaWiki can't find them. Is there a way or an extension to avoid this? Thank you!
Stuardo Herrera wrote:
I'm uploading some PDF files that have special characters in their names to my MediaWiki site, but when I try to open them, I get a Not Found page. I have noticed that when I upload them, the special characters in their names are replaced with other characters on the server, and that's why MediaWiki can't find them. Is there a way or an extension to avoid this? Thank you!
Is your server running on Microsoft Windows? In this case file uploads with non-ASCII characters will not work correctly.
On other operating systems you should not have this difficulty, so more details would be welcome.
-- brion vibber (brion @ pobox.com)
Thanks, Brion. Yep, IIS 6 on Windows 2003 Server. Is there a solution, or should I just tell my users not to upload files with special characters in their names? Thank you again :)
2007/1/18, Brion Vibber brion@pobox.com:
Is your server running on Microsoft Windows? In this case file uploads with non-ASCII characters will not work correctly.
On other operating systems you should not have this difficulty, so more details would be welcome.
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
I also had this issue. For now I put a warning on the upload page not to use accented characters.
This is not a very good solution...
Windows doesn't have any problem handling special characters in file names. But either MediaWiki or PHP is mangling the file name somehow before writing it, and the mangling used to create the file on the file system is different from the one used later in the URL. That's why the file is not found.
Possible solutions would be:
a) Find out what is changing the file name (MediaWiki or PHP) and try to disable it.
b) Add some code to the upload form that, when running under Windows, suggests a safe file name.
I might try to do this in the future if there is a chance the patch would be accepted for MediaWiki.
Fernando Correia wrote:
Windows doesn't have any problem handling special characters in file names.
Wrong.
Windows has many problems: it uses special codes for some characters, as does the Joliet CD/DVD file system. This is easy to see when reading, from Windows, any file written under a strictly UTF-8-compliant Unix system.
jdd
(If you configure your mount options properly on the Unix/Linux side you won't have that problem!)
The problem is that Windows has a kind of weird schizophrenic approach to character sets.
Part of the system works in pure, total Unicode, speaking and storing UTF-16 everywhere. This is the Unicode or "wide character" interface.
Part of the system works in a language- or system-dependent second encoding which may be 8-bit or variable length. This is the (not very accurately named) "ANSI" interface.
(And then just to be a jerk, part of the system works in *another* language- or system-dependent *third* encoding, 8-bit or variable length, which is the "OEM" charset. This is used in console-mode terminals and the DOS-compatible 8.3 filenames on FAT volumes.)
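To make the three interfaces a bit more concrete, here is a small Python sketch (illustration only, not MediaWiki code) that encodes one accented name the way each interface might see it. cp1252 and cp850 are assumptions standing in for the "ANSI" and "OEM" code pages of a Western European install; the real ones depend on the system locale.

```python
# One filename, several byte representations.
# cp1252 and cp850 are assumed stand-ins for the "ANSI" and "OEM" code pages;
# an actual Windows system may be configured with different ones.
name = "caf\u00e9.pdf"  # "café.pdf"

encodings = {
    "wide (UTF-16LE)": "utf-16-le",
    "ANSI (cp1252, assumed)": "cp1252",
    "OEM (cp850, assumed)": "cp850",
    "UTF-8": "utf-8",
}

# Encode the same text once per codec.
encoded = {label: name.encode(codec) for label, codec in encodings.items()}

for label, raw in encoded.items():
    print(f"{label:24} {raw.hex(' ')}")
```

The accented letter comes out as a different byte sequence in every row, so a name written through one interface and read back through another need not match.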
Now, for better or for worse, if you use the (Unix-derived) C standard library, like most ports of Unix apps probably do, it seems to prefer using the ANSI (or maybe OEM?) encoding of things.
MediaWiki generally assumes you're running on a modern Unix and speaks UTF-8 everywhere, including with the filesystem. That assumption breaks on Windows, where filenames on the filesystem *as seen from PHP* are accessed through some kind of horrid "ANSI" (or OEM?)-to-Unicode translation layer.
This means you basically get gibberish, since MediaWiki and the web server see different versions of the filename.
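The mismatch can be reproduced outside MediaWiki. In this Python sketch the UTF-8 bytes of a name are reinterpreted through cp1252, which is only an assumed "ANSI" code page; it approximates what happens when a byte-oriented file API is handed UTF-8 on Windows. The function name is made up for the example.

```python
def as_seen_through_ansi(name: str, ansi_codec: str = "cp1252") -> str:
    """Return the text that results when the UTF-8 bytes of `name`
    are misread as an 8-bit code page (cp1252 is an assumption)."""
    # errors="replace" avoids an exception for the few byte values
    # that cp1252 leaves undefined.
    return name.encode("utf-8").decode(ansi_codec, errors="replace")

requested = "caf\u00e9.pdf"           # the name the wiki asks for
stored = as_seen_through_ansi(requested)  # the name that ends up on disk
print(requested, "->", stored)
```

The two strings differ, so a later lookup by the requested name does not find the stored file.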
A planned change to the file storage scheme will make this issue obsolete as file storage will be done with nice, ASCII-clean alphanumeric hash keys, but that might be another major version or two before it gets done.
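The message does not say how those hash keys would be derived. Purely as an illustration, hashing the UTF-8 form of the title gives an on-disk name that is ASCII whatever the title contains. The function name and the choice of SHA-1 here are assumptions, not MediaWiki's design.

```python
import hashlib

def storage_key(title: str, extension: str = "") -> str:
    """Derive an ASCII-only storage name from an arbitrary Unicode title.
    Hypothetical helper; SHA-1 is just one possible hash."""
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()
    return digest + extension

key = storage_key("R\u00e9sum\u00e9 fa\u00e7ade.pdf", ".pdf")
print(key)
```

The original title would then live only in the database, and the filesystem would never see a non-ASCII character.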
If someone happens to know a convenient way to tell the system "my process speaks UTF-8, let me use the damn Unicode filenames" that'd be super. Otherwise... hack in a check for non-ASCII chars? *shrug*
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
I agree that would be a terrific solution at the root of the problem. But it is a big change and may be too far in the future.
A quicker but effective solution could be some special processing on the post event of the upload file form, "cleaning" the file name. This could be conditional so it would not affect UNIX installations.
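A rough sketch of what such "cleaning" might look like, written in Python rather than MediaWiki's PHP and meant only to show the idea. All function names are hypothetical, and stripping accents via Unicode decomposition is one assumed policy among several possible ones.

```python
import sys
import unicodedata

def is_ascii_clean(filename: str) -> bool:
    """True if every character is plain ASCII."""
    return all(ord(ch) < 128 for ch in filename)

def suggest_safe_name(filename: str) -> str:
    """Drop accents and any other non-ASCII characters."""
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with "ignore" then discards whatever is not ASCII.
    decomposed = unicodedata.normalize("NFKD", filename)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def clean_upload_name(filename: str, platform: str = sys.platform) -> str:
    """Clean the name only on Windows, so Unix installations are unaffected."""
    if platform.startswith("win") and not is_ascii_clean(filename):
        return suggest_safe_name(filename)
    return filename

print(clean_upload_name("caf\u00e9.pdf", "win32"))
print(clean_upload_name("caf\u00e9.pdf", "linux"))
```

Characters without an ASCII decomposition are simply dropped, so two different names could collapse to the same result or to an empty one; a real patch would have to handle those cases.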
2007/1/19, Fernando Correia fernandoacorreia@gmail.com:
Brion, do you think such a patch would have a chance of being incorporated into MediaWiki?
Fernando Correia wrote:
Brion, do you think such a patch would have a chance of being incorporated into MediaWiki?
Could be done.
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
Oh well, I'll think of a hack then. Meanwhile I hope all my users read the "don't upload special chars" message. Thanks to everyone who helped!
2007/1/19, Brion Vibber brion@pobox.com:
If someone happens to know a convenient way to tell the system "my process speaks UTF-8, let me use the damn Unicode filenames" that'd be super. Otherwise... hack in a check for non-ASCII chars? *shrug*