Does anyone know why Google doesn't index the description pages of images? If I do a Google search like "site:wikipedia.org inurl:image", I get only about 650 results, some of which are media files, but none are images (jpg/gif/png).
Descriptions can contain important information, so this is a bit of a pity, isn't it?
Well, it is for my site at least.
christof
Christof Damian wrote:
Does anyone know why Google doesn't index the description pages of images? If I do a Google search like "site:wikipedia.org inurl:image", I get only about 650 results, some of which are media files, but none are images (jpg/gif/png).
Probably Google is stupid enough to think '.jpg' at the end of a URL means the resource is itself an image.
"File extensions" are meaningless on URLs, and should not be relied upon. (Internet Explorer has some security problems related to this.)
-- brion vibber (brion @ pobox.com)
Quoting Brion Vibber, from the post of Sat, 02 Apr:
Probably Google is stupid enough to think '.jpg' at the end of a URL means the resource is itself an image.
When you are a bot that has to slurp up millions of pages a day, it's safe to assume that in 99.99% of cases a .jpg suffix will indeed lead you to an image. Requesting that URL just to see whether the header gives one MIME type or the other adds considerable overhead. Apart from MediaWiki and a few rare CMSs, I'd risk a guess that practically nobody uses such suffixes in the URL of a page that isn't an image.
On Apr 5, 2005 7:33 PM, Ira Abramov lists-MediaWiki-l@ira.abramov.org wrote:
When you are a bot that has to slurp up millions of pages a day, it's safe to assume that in 99.99% of cases a .jpg suffix will indeed lead you to an image. Requesting that URL just to see whether the header gives one MIME type or the other adds considerable overhead. Apart from MediaWiki and a few rare CMSs, I'd risk a guess that practically nobody uses such suffixes in the URL of a page that isn't an image.
Isn't that what the HTTP "HEAD" method is for? And shouldn't the Googlebots also be indexing images for the Google image search?
-- Jamie http://endeavour.zapto.org/astro73/
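For readers following along, the HEAD idea raised above can be sketched in Python roughly as follows. This is only an illustration, not how any real crawler works; the function names head_content_type and is_image are made up for this example, and only the standard library is assumed.

```python
import urllib.request


def head_content_type(url: str, timeout: float = 10.0) -> str:
    """Return the media type the server reports for url, without fetching the body."""
    # A HEAD request asks for the response headers only.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # get_content_type() drops parameters such as "; charset=utf-8".
        return response.headers.get_content_type()


def is_image(url: str) -> bool:
    """True only if the server itself says the resource is an image."""
    return head_content_type(url).startswith("image/")
```

With a check like this, a description page whose URL ends in ".jpg" but which is served as text/html would be treated as a page rather than as an image. The cost Ira mentions is still there: one extra round trip per URL, even though no body is transferred.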
On 5 Apr 2005, at 16:33, Ira Abramov wrote:
When you are a bot that has to slurp up millions of pages a day, it's safe to assume that in 99.99% of cases a .jpg suffix will indeed lead you to an image. Requesting that URL just to see whether the header gives one MIME type or the other adds considerable overhead.
What overhead? If you're loading it anyway, you should look at the MIME type. Otherwise, it's just lazy, sloppy programming.
A problem of greater concern is links that send an image with the proper MIME type *without* putting ".jpg" at the end of the URI. In that case, you do make more work for spiders, if you expect them to index your "hidden" images.
:::: Jan Steinman http://www.Bytesmiths.com/Item/794637
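Both cases discussed above (a ".jpg" URL that serves HTML, and an image served from a URL with no suffix) could be handled by a spider that trusts the Content-Type header of a response it has already loaded and uses the suffix only as a last resort. A minimal Python sketch under that assumption follows; the function name media_type and the example.org URLs are hypothetical.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlsplit


def media_type(url: str, content_type_header: Optional[str]) -> str:
    """Prefer the server's Content-Type; use the URL suffix only if the header is absent."""
    if content_type_header:
        # Drop parameters such as "; charset=UTF-8" and normalise case.
        return content_type_header.split(";", 1)[0].strip().lower()
    # Fallback: guess from the path suffix, which is only a hint.
    guessed, _encoding = mimetypes.guess_type(urlsplit(url).path)
    return guessed or "application/octet-stream"


# A description page whose URL ends in ".jpg" is still classified as HTML.
print(media_type("http://example.org/wiki/Image:Example.jpg", "text/html; charset=UTF-8"))
# An image served from a URL without any suffix is still classified as an image.
print(media_type("http://example.org/render?id=42", "image/png"))
```

Since the header arrives with the response anyway, this check adds no extra requests.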
mediawiki-l@lists.wikimedia.org