On Wed, Jul 9, 2008 at 5:11 PM, vasilievvv@svn.wikimedia.org wrote:
Log Message:
- Forbid files with * and ? to be uploaded under Windows (it caused internal errors since such characters are illegal there)
It seems like it would be a better idea to be consistent across platforms here. Otherwise you're just going to cause trouble for portability; for instance, Windows users would be unable to easily use an image dump from Wikimedia, or other Unix-based MediaWiki installations.
However, we don't *really* have to use the same name in the filesystem as we use as a title. This seems to me like it would be better implemented by mangling the filename somehow. The invalid Windows/DOS characters are supposedly:
? [ ] / \ = + < > : ; " ,
Of those, I think the following are currently legal in image names (before your commit):
? \ = + : ; " ,
Each of these could be replaced in the filesystem by some character that Windows will accept, or some combination of them, which are invalid image names anyway. For instance, you could replace them with {question} {backslash} {equals} {plus} {colon} {semicolon} {quote} {comma}; these will work correctly because {} are illegal in page titles but legal in Windows filenames. (But they could push filenames over the filename length limit, so more creative substitutes might be a better idea.) This way the rules for image titles remain unchanged, which is nice because a lot of those characters are quite handy to have in titles.
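To make the mangling idea concrete, here is a minimal sketch in Python (not MediaWiki code). The function names mangle() and unmangle() and the PLACEHOLDERS table are hypothetical; the placeholder spellings are the ones suggested above. It assumes titles never contain { or }, which is what makes the mapping reversible.

```python
import re

# Hypothetical table: title character -> filesystem-safe placeholder.
# Relies on the assumption that "{" and "}" never occur in titles.
PLACEHOLDERS = {
    "?": "{question}",
    "\\": "{backslash}",
    "=": "{equals}",
    "+": "{plus}",
    ":": "{colon}",
    ";": "{semicolon}",
    '"': "{quote}",
    ",": "{comma}",
}
REVERSE = {placeholder: char for char, placeholder in PLACEHOLDERS.items()}
_PLACEHOLDER_RE = re.compile("|".join(re.escape(p) for p in REVERSE))


def mangle(title: str) -> str:
    """Turn a title into a name that avoids the listed characters."""
    return "".join(PLACEHOLDERS.get(ch, ch) for ch in title)


def unmangle(name: str) -> str:
    """Recover the original title from a mangled filesystem name."""
    return _PLACEHOLDER_RE.sub(lambda m: REVERSE[m.group(0)], name)
```

With this, mangle('What?.png') gives 'What{question}.png', and unmangle() reverses it. The sketch does nothing about the length problem mentioned above.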
(Googled sources actually conflict as to the exact list of prohibited characters. Some say * is prohibited, some don't mention it. Same for |. ^ is apparently supposed to be illegal in FAT, according to one source, and there are other restrictions, like no trailing space or period, and a list of reserved names like "com1" and "nul". Probably it varies across different versions, but it's a lot bigger than just ? and *, anyway.)
- Forbid files to be moved to invalid filenames
This might be more cleanly implemented by making invalid filenames invalid titles in the Image namespace. That would make things somewhat simpler by keeping things in more expected places. It also makes sense to prohibit image pages from existing when it's not possible for an image of that title to exist. (But projects will need to be checked for pages that will become invalid under this scheme, of course, perhaps using a maintenance script.)
+/**
+ * Checks filename for validity
+ * @param mixed $title Filename or title to check
+ */
+function wfIsValidFileName( $name ) {
Surely this shouldn't be a global function, but a static method of something? Or even a non-static method of something.
elseif( wfIsWindows() && ( in_string( '*', $name ) || in_string( '?', $name ) ) )
return false;
. . .
if( wfIsWindows() )
$filtered = preg_replace ( "/[*?]/", '-', $filtered );
Magic constants here. The list of blacklisted characters is scattered across multiple files, which is bad: the copies could become inconsistent over time.
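One way to avoid the duplication, sketched in Python rather than in MediaWiki's PHP: define the character list once and derive both the check and the filter from it. The names WINDOWS_ILLEGAL_CHARS, is_valid_file_name() and strip_illegal_chars() are hypothetical, and on_windows stands in for wfIsWindows().

```python
import re

# Hypothetical single definition of the characters the commit rejects.
WINDOWS_ILLEGAL_CHARS = "*?"
_ILLEGAL_RE = re.compile("[" + re.escape(WINDOWS_ILLEGAL_CHARS) + "]")


def is_valid_file_name(name: str, on_windows: bool) -> bool:
    """Validity check built from the shared character list."""
    return not (on_windows and _ILLEGAL_RE.search(name))


def strip_illegal_chars(name: str, on_windows: bool) -> str:
    """Filter built from the same list, so the two cannot drift apart."""
    return _ILLEGAL_RE.sub("-", name) if on_windows else name
```

Extending the list then means changing one constant only.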
Simetrical wrote:
However, we don't *really* have to use the same name in the filesystem as we use as a title. This seems to me like it would be better implemented by mangling the filename somehow. The invalid Windows/DOS characters are supposedly:
? [ ] / \ = + < > : ; " ,
Of those, I think the following are currently legal in image names (before your commit):
? \ = + : ; " ,
'\' would be excluded by wfBaseName(), and ':' is explicitly stripped in UploadForm::internalProcessUpload().
The others may have been allowed previously, though at least ?, ;, and " seem unwise. :)
My recommendation is to ditch the use of raw filesystem filenames -- which already fail on Windows due to the weird charset encoding system breaking any non-ASCII characters -- and allow media files to have any name in the database, while they're stored with a nice clean content hash on the filesystem (when a filesystem is used at all as backend).
This has been planned for a long time, but implementation has gotten stalled while other things get done. (Though we already store deleted files in this way, and it works pretty well.)
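As a rough illustration of content-hash storage (a Python sketch, not the scheme MediaWiki uses for deleted files): the on-disk name is derived from the file's bytes, so the title never touches the filesystem. The choice of SHA-1, the hex encoding, the two-level directory layout, and the function name hashed_storage_path() are all assumptions made for the example.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_storage_path(content: bytes, extension: str) -> PurePosixPath:
    """Build a storage path from the content hash instead of the title.

    Assumed layout: <first hex digit>/<first two hex digits>/<digest>.<ext>
    """
    digest = hashlib.sha1(content).hexdigest()
    return PurePosixPath(digest[0], digest[:2], f"{digest}.{extension}")
```

The resulting names contain only hex digits and a dot, so none of the character or charset problems discussed in this thread can arise; the title-to-hash mapping would live in the database.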
-- brion
Simetrical wrote:
However, we don't *really* have to use the same name in the filesystem as we use as a title. This seems to me like it would be better implemented by mangling the filename somehow. The invalid Windows/DOS characters are supposedly:
? [ ] / \ = + < > : ; " ,
[]=+;, are legal on Windows.
Of those, I think the following are currently legal in image names (before your commit):
? \ = + : ; " ,
(Googled sources actually conflict as to the exact list of prohibited characters. Some say * is prohibited, some don't mention it.
It is prohibited, because it is a wildcard.
Same for |.
It is prohibited too, because it is the pipe character.
^ is apparently supposed to be illegal in FAT, according to one source,
It is an escape character for the Windows shell, but legal in FAT. Perhaps only legal in VFAT?
and there are other restrictions, like no trailing space or period,
hmm, right. Although not really applicable for images which will have an extension appended.
and a list of reserved names like "com1" and "nul".
Strangely, not only are com1 and nul prohibited, but also nul.png or com1.jpg
Probably it varies across different versions, but it's a lot bigger than just ? and *, anyway.)
- Forbid files to be moved to invalid filenames
Instead of checking the filename against a list of bad characters, why not try to actually do it, and abort the rename if it can't be done? That way no special case will be missing, and it won't be that frequent anyway. You only need to avoid the slashes / \ (and : if wgUploadDir can be "")
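The "just try it" approach could look roughly like this in Python (a sketch, not MediaWiki code). The function name rename_if_possible() is hypothetical. It rejects only the path separators up front and lets the operating system decide everything else; for simplicity it always rejects ':' rather than only when the upload directory is empty.

```python
import os

# Only separators are screened; everything else is left to the OS.
FORBIDDEN_SEPARATORS = ("/", "\\", ":")


def rename_if_possible(directory: str, old_name: str, new_name: str) -> bool:
    """Attempt the rename and report failure instead of pre-validating."""
    if any(sep in new_name for sep in FORBIDDEN_SEPARATORS):
        return False
    source = os.path.join(directory, old_name)
    target = os.path.join(directory, new_name)
    if os.path.exists(target):
        # Avoid silently overwriting an existing file on Unix.
        return False
    try:
        os.rename(source, target)
    except OSError:
        # Whatever the filesystem objects to ends up here.
        return False
    return True
```

The caller would abort the move whenever this returns False.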
On Thu, Jul 10, 2008 at 4:34 PM, Platonides Platonides@gmail.com wrote:
Simetrical wrote:
However, we don't *really* have to use the same name in the filesystem as we use as a title. This seems to me like it would be better implemented by mangling the filename somehow. The invalid Windows/DOS characters are supposedly:
? [ ] / \ = + < > : ; " ,
[]=+;, are legal on Windows.
Of those, I think the following are currently legal in image names (before your commit):
? \ = + : ; " ,
(Googled sources actually conflict as to the exact list of prohibited characters. Some say * is prohibited, some don't mention it.
It is prohibited, because it is a wildcard.
Same for |.
It is prohibited too, because it is the pipe character.
Regardless of it being allowed or not, it sounds like a bad idea to me to allow ?*| in filenames.
and there are other restrictions, like no trailing space or period,
hmm, right. Although not really applicable for images which will have an extension appended.
But it still is something which wfStripIllegalFilenameChars should catch.
and a list of reserved names like "com1" and "nul".
Strangely, not only are com1 and nul prohibited, but also nul.png or com1.jpg
Uh... ok... so basically MediaWiki installations having com1.png files are not platform compatible?
On Thu, Jul 10, 2008 at 10:45 AM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
Regardless of it being allowed or not, it sounds like a bad idea to me to allow ?*| in filenames.
By that logic, shouldn't we ban ();!$ and so on for being shell characters as well? If people want to move files around manually, and want to use a command line instead of a GUI or a (non-shell) script, they can be careful with their escaping. But as Brion says, the plan is to eventually move to hash-based filenames anyway.
Uh... ok... so basically MediaWiki installations having com1.png files are not platform compatible?
Apparently, yeah . . . Unix's "everything but / or null" seems a lot more convenient here. Although it's kind of a pain when you get some unprintable binary gibberish for a filename by mistake. :)
Bryan Tong Minh wrote:
Uh... ok... so basically MediaWiki installations having com1.png files are not platform compatible?
Testing on my own wiki (running on Windows): uploading a file with a valid local name, but having MediaWiki change the name to Com1.png on upload, causes an internal error: Could not rename file "public/e/ee/Com1.png" to "public/archive/e/ee/20080711003730!Com1.png".
On a server not running Windows it works fine; the only thing you can't do is save the image from the site to a Windows PC without changing the name before saving. http://test.wikipedia.org/wiki/Image:Com1.png
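A check for the reserved-name case described above might be sketched like this in Python. The function name is hypothetical, and the list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is the commonly documented set of DOS device names; it may not be exhaustive for every Windows version.

```python
# Commonly documented DOS device names; compared case-insensitively.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_windows_name(filename: str) -> bool:
    """True if the part before the first dot is a reserved device name.

    The extension does not help: "nul.png" is as unusable as "nul".
    """
    stem = filename.split(".", 1)[0]
    return stem.rstrip(" ").upper() in RESERVED_DEVICE_NAMES
```

Under this check Com1.png and nul are rejected, while com10.png and console.png pass.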
On Thu, Jul 10, 2008 at 12:06 AM, Simetrical Simetrical+wikilist@gmail.com wrote:
However, we don't *really* have to use the same name in the filesystem as we use as a title. This seems to me like it would be better implemented by mangling the filename somehow. The invalid Windows/DOS characters are supposedly:
? [ ] / \ = + < > : ; " ,
Of those, I think the following are currently legal in image names (before your commit):
? \ = + : ; " ,
I was just reading the FAT specification and +,;=[] are valid characters for FAT drivers that support LFN (Long File Names) which is basically everything starting from Windows 95.
To be exact, under FAT a file name is allowed to contain any letters, digits or characters with code point above 127. Also the following characters are allowed: $%'-_@~`!(){}^#& For Windows 95 and above the characters mentioned above are allowed as well.
I don't know about NTFS though.
Bryan
For NTFS: in the POSIX namespace, a file name may contain any UTF-16 code unit (case sensitive) except U+0000 (NUL) and / (slash). In the Win32 namespace, it may contain any UTF-16 code unit (case insensitive) except U+0000 (NUL), / (slash), \ (backslash), : (colon), * (asterisk), ? (question mark), " (quote), < (less than), > (greater than) and | (pipe) [1,2]
Cheers! Siebrand
[1] http://en.wikipedia.org/wiki/NTFS [2] http://data.linux-ntfs.org/ntfsdoc.html.gz
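The Win32 namespace rule quoted above translates directly into a small check. This is a Python sketch with hypothetical names; it covers only the per-character rule, not reserved device names or trailing spaces and periods.

```python
# Characters excluded from the Win32 namespace, per the list above.
WIN32_FORBIDDEN_CHARS = frozenset('\x00/\\:*?"<>|')


def win32_forbidden_in(name: str) -> list:
    """Return the forbidden characters that occur in name, sorted."""
    return sorted(set(name) & WIN32_FORBIDDEN_CHARS)


def is_valid_win32_name(name: str) -> bool:
    """True if name contains none of the forbidden characters."""
    return not win32_forbidden_in(name)
```

Note that []=+;, all pass this check, consistent with the FAT long-file-name discussion earlier in the thread.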
-----Original message----- From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On behalf of Bryan Tong Minh Sent: Sunday, 13 July 2008 19:38 To: Wikimedia developers Subject: Re: [Wikitech-l] [MediaWiki-CVS] SVN: [37443] trunk/phase3
I was just reading the FAT specification and +,;=[] are valid characters for FAT drivers that support LFN (Long File Names) which is basically everything starting from Windows 95.
To be exact, under FAT a file name is allowed to contain any letters, digits or characters with code point above 127. Also the following characters are allowed: $%'-_@~`!(){}^#& For Windows 95 and above the characters mentioned above are allowed as well.
I don't know about NTFS though.