Re: [Wikitech-l] commons.wikimedia.org allowing directory indexes and web robots

20 Jul 2009


On Mon, Jul 20, 2009 at 6:20 AM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
...
> I am not sure that the underscore is the most suitable character,
> because in MediaWiki it's interchangeable with the space character. The
> type of the document should be determined by its mime-type. If Google
> uses the web path "extension" (which is meaningless, by the way, because
> that's a virtual path) instead of the mime-type to determine whether the
> page should be indexed, that's an amazing bug for Google.

Maybe they don't retrieve the page in the first place, because they
don't want to waste bandwidth and processing time getting images.  It
would be rather a waste to send dozens or hundreds of HEAD requests on
every Flickr page (or whatever) just to make sure that all those
things ending in a suffix universally accepted to designate images
really *are* images.

On Mon, Jul 20, 2009 at 9:45 AM, Nikola Smolenski <smolensk@eunet.yu> wrote:
...
> It's a necessary evil, however, because a number of servers serve
> incorrect mime types.

Well, that would make no difference if you actually downloaded the
content, or the first handful of bytes.  It's easy to *very* reliably
distinguish binary image data from HTML if you get to look at the
first several bytes of the file.
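
A minimal sketch of such a check, in Python. The function name `sniff`
and the signature table are illustrative assumptions, not anything taken
from MediaWiki or from a real crawler; it covers only PNG, JPEG and GIF
and a very crude HTML test, so treat it as a demonstration of the idea
rather than a complete sniffer.

```python
# Hypothetical helper: guess a content type from the first bytes of a body.
# The magic numbers below are the well-known PNG, JPEG and GIF signatures.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff(head: bytes) -> str:
    """Classify the leading bytes of a response as an image type, HTML, or unknown."""
    for magic, mime in IMAGE_SIGNATURES.items():
        if head.startswith(magic):
            return mime
    # Crude HTML check: skip leading whitespace and compare case-insensitively.
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return "unknown"
```

Only the first dozen or so bytes are needed, so a client could stop
reading (or use a ranged request) long before the whole file arrives.
It still costs one request per URL, which is the bandwidth concern
raised earlier.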
Anyway, I think the "right" way to do this would be to omit the suffix
from the page name entirely, treating the format as an implementation
detail.  That way you could, for instance, upload an SVG over a PNG or
a PNG over a JPEG, and have all users be automatically updated without
manually changing the references.  This does get a little confusing
when you consider totally different types of media, though, like audio
or video or PDF or whatnot.  If NS_FILE (NS_IMAGE) weren't hardcoded
in thirty million places both in code and templates, I might suggest
different namespaces for different media types instead of one unified
File: namespace, but that seems impractical at this point.
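
To make the suffix-free idea concrete, here is a rough sketch in Python.
Everything in it (`FileStore`, `FileRevision`, `upload`, `current`) is
hypothetical and has nothing to do with how MediaWiki actually stores
files; it only shows a title that carries no extension, with the format
recorded per revision.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileRevision:
    mime: str    # format is a property of the revision, not of the title
    data: bytes


@dataclass
class FileStore:
    # Maps a suffix-free title to its uploaded revisions, newest last.
    revisions: Dict[str, List[FileRevision]] = field(default_factory=dict)

    def upload(self, title: str, mime: str, data: bytes) -> None:
        """Add a new revision under the title, whatever its format."""
        self.revisions.setdefault(title, []).append(FileRevision(mime, data))

    def current(self, title: str) -> FileRevision:
        """Return the newest revision; pages referencing the title get this one."""
        return self.revisions[title][-1]


store = FileStore()
store.upload("Example logo", "image/png", b"\x89PNG\r\n\x1a\n")
store.upload("Example logo", "image/svg+xml", b"<svg/>")  # SVG uploaded over the PNG
```

A page that references "Example logo" never names a format, so it picks
up the SVG without being edited. The sketch says nothing about the
harder case of mixing unrelated media types under one title.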
