On Mon, Jul 20, 2009 at 6:20 AM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
I am not sure that the underscore is the most suitable character, because in MediaWiki it's interchangeable with the space character. The type of the document should be determined by its MIME type. If Google uses the web path "extension" (which is meaningless, by the way, because that's a virtual path) instead of the MIME type to determine whether the page should be indexed, that's an amazing bug in Google.
Maybe they don't retrieve the page in the first place, because they don't want to waste bandwidth and processing time fetching images. It would be rather wasteful to send dozens or hundreds of HEAD requests on every Flickr page (or whatever) just to confirm that all those things ending in a suffix universally accepted to designate images really *are* images.
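To make the suffix-only approach concrete, here is a minimal Python sketch of a hypothetical crawler heuristic (an assumption for illustration, not anything Google has documented). It guesses the type from the URL alone with the standard library's `mimetypes.guess_type`, sending no request at all, and shows why a MediaWiki file description page, which is really HTML, looks like an image to such a check:

```python
import mimetypes


def looks_like_image(url: str) -> bool:
    """Hypothetical crawler heuristic: judge by the URL suffix alone.

    Nothing is fetched, so no HEAD or GET request is spent, but the
    answer depends entirely on how the path happens to end.
    """
    guessed, _encoding = mimetypes.guess_type(url)
    return guessed is not None and guessed.startswith("image/")


# A MediaWiki file description page is an HTML page, yet its virtual
# path ends in ".png", so the suffix heuristic calls it an image.
print(looks_like_image("https://en.wikipedia.org/wiki/File:Example.png"))  # True
print(looks_like_image("https://en.wikipedia.org/wiki/Main_Page"))         # False
```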
On Mon, Jul 20, 2009 at 9:45 AM, Nikola Smolenski <smolensk@eunet.yu> wrote:
It's a necessary evil, however, because a number of servers serve incorrect MIME types.
Well, that would make no difference if you actually downloaded the content, or the first handful of bytes. It's easy to *very* reliably distinguish binary image data from HTML if you get to look at the first several bytes of the file.
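A rough Python sketch of that check, assuming we have only the first few bytes of the response and deliberately ignore whatever Content-Type the server declared. The signatures are the standard PNG, JPEG and GIF magic numbers; the HTML test is crude, and text-based formats such as SVG are not covered:

```python
# Standard "magic number" prefixes for common binary web image formats.
IMAGE_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff(head: bytes) -> str:
    """Classify a response from its first bytes, ignoring any declared MIME type."""
    for signature, mime in IMAGE_SIGNATURES:
        if head.startswith(signature):
            return mime
    # Crude HTML test: skip leading whitespace, compare case-insensitively.
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return "application/octet-stream"  # unknown; would need more checks


print(sniff(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))  # image/png
print(sniff(b"\n<!DOCTYPE html><html><head>"))       # text/html
```

Only a handful of bytes is needed per response, which is far cheaper than downloading the whole file.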
Anyway, I think the "right" way to do this would be to omit the suffix from the page name entirely, treating the format as an implementation detail. That way you could, for instance, upload an SVG over a PNG or a PNG over a JPEG, and have all users be automatically updated without manually changing the references. This does get a little confusing when you consider totally different types of media, though, like audio or video or PDF or whatnot. If NS_FILE (NS_IMAGE) weren't hardcoded in thirty million places both in code and templates, I might suggest different namespaces for different media types instead of one unified File: namespace, but that seems impractical at this point.
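A toy Python model of the extensionless idea, purely hypothetical: the `FilePage` class and the policy of refusing a change of top-level media type are assumptions for illustration, not MediaWiki code. References point at a title with no suffix, so uploading an SVG over a PNG changes what they resolve to without touching them. Refusing to replace an image with audio is just one possible answer to the "totally different types of media" problem:

```python
from dataclasses import dataclass, field


@dataclass
class FilePage:
    """Hypothetical file page whose title carries no format suffix."""
    title: str
    # Upload history as (MIME type, data) pairs; the last entry is current.
    versions: list = field(default_factory=list)

    def upload(self, mime: str, data: bytes) -> None:
        # Assumed policy: allow format changes within a top-level media
        # type (image/png -> image/svg+xml), refuse image -> audio, etc.
        if self.versions:
            old_major = self.versions[-1][0].split("/")[0]
            new_major = mime.split("/")[0]
            if old_major != new_major:
                raise ValueError(
                    f"{self.title}: cannot replace {old_major} with {new_major}"
                )
        self.versions.append((mime, data))

    @property
    def current_mime(self) -> str:
        return self.versions[-1][0]


# Pages are keyed by the suffix-free title that articles reference.
pages = {"File:Example": FilePage("File:Example")}
pages["File:Example"].upload("image/png", b"\x89PNG\r\n\x1a\n")
pages["File:Example"].upload("image/svg+xml", b"<svg/>")
print(pages["File:Example"].current_mime)  # image/svg+xml
```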