google and image pages

Christof Damian

2 Apr 2005

4:25 p.m.

does anyone know why google doesn't index description pages of images? if i do a google search like this: "site:wikipedia.org inurl:image" i just get about 650 results, some of which are media files, but none are images (jpg/gif/png).

descriptions can contain important information, so this is a bit of a pity, isn't it?

well, it's for my site at least.

christof

-- Christof Damian http://krass.com/ damian@krass.com

Brion Vibber

2 Apr

4:43 p.m.

New subject: [Mediawiki-l] google and image pages

Christof Damian wrote:

does anyone know why google doesn't index description pages of images? if i do a google search like this: "site:wikipedia.org inurl:image" i just get about 650 results, some of which are media files, but none are images (jpg/gif/png).

Probably Google is stupid enough to think '.jpg' at the end of a URL means the resource is itself an image.

"File extensions" are meaningless on URLs, and should not be relied upon. (Internet Explorer has some security problems related to this.)

-- brion vibber (brion @ pobox.com)

Ira Abramov

5 Apr

7:33 p.m.

New subject: [Mediawiki-l] Re: google and image pages

Quoting Brion Vibber, from the post of Sat, 02 Apr:

Christof Damian wrote:

does anyone know why google doesn't index description pages of images? if i do a google search like this: "site:wikipedia.org inurl:image" i just get about 650 results, some of which are media files, but none are images (jpg/gif/png).

Probably Google is stupid enough to think '.jpg' at the end of a URL means the resource is itself an image.

when you are a bot that has to slurp up millions of pages a day, it's safe to assume that in 99.99% of cases a jpg suffix will indeed lead you to an image. requesting that URL just to see whether the header gives one MIME type or the other adds considerable overhead. apart from mediawiki and a few rare CMSes, I'd risk a guess that practically nobody uses such suffixes in a URL.

-- One of the endless Ira Abramov http://ira.abramov.org/email/

Jamie Bliss

8:32 p.m.

New subject: [Mediawiki-l] Re: google and image pages

On Apr 5, 2005 7:33 PM, Ira Abramov lists-MediaWiki-l@ira.abramov.org wrote:

when you are a bot that has to slurp up millions of pages a day, it's safe to assume that in 99.99% of cases a jpg suffix will indeed lead you to an image. requesting that URL just to see whether the header gives one MIME type or the other adds considerable overhead. apart from mediawiki and a few rare CMSes, I'd risk a guess that practically nobody uses such suffixes in a URL.

Isn't that what the "HEAD" request is for? And shouldn't Googlebot also be indexing images for the Google image search?

-- Jamie ------------------------------------------------------------------- http://endeavour.zapto.org/astro73/ Thank you to JosephM for inviting me to Gmail! Have lots of invites. Gmail now had 2GB.
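
[Editor's sketch, not part of the original post.] The HEAD request mentioned above asks a server for the response headers only, so a crawler could read the Content-Type without downloading the body. The following is a minimal illustration under stated assumptions: the toy local server, the handler class, the helper names, and the path /wiki/Image:Example.jpg are all hypothetical, and nothing here describes how Googlebot actually works.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DescriptionPageHandler(BaseHTTPRequestHandler):
    """Hypothetical toy server: every path returns HTML, even one ending in .jpg."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()

    def do_HEAD(self):
        # HEAD: headers only, no body.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(b"<html><body>Image description page</body></html>")

    def log_message(self, *args):
        # Keep the demo quiet.
        pass


def head_content_type(host, port, path):
    """Send a HEAD request and return the Content-Type header (or "")."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # a HEAD response carries no body; this returns b""
        return resp.getheader("Content-Type", "")
    finally:
        conn.close()


def demo():
    """Start the toy server on a free local port and probe a '.jpg' URL."""
    server = HTTPServer(("127.0.0.1", 0), DescriptionPageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return head_content_type(
            "127.0.0.1", server.server_port, "/wiki/Image:Example.jpg"
        )
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the header of the toy description page, which starts with text/html despite the .jpg suffix. The cost Ira describes is still real: a HEAD request is one extra round trip per URL, even though no body is transferred.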

Jan Steinman

9 Apr

2:05 p.m.

New subject: [Mediawiki-l] Re: google and image pages

On 5 Apr 2005, at 16:33, Ira Abramov wrote:

when you are a bot that has to slurp up millions of pages a day, it's safe to assume that in 99.99% of cases a jpg suffix will indeed lead you to an image. requesting that URL just to see whether the header gives one MIME type or the other adds considerable overhead.

What overhead? If you're loading it anyway, you should look at the MIME type. Otherwise, it's just lazy, sloppy programming.

A problem of greater concern is links that send an image with the proper MIME type *without* putting ".jpg" at the end of the URI. In that case, you do make more work for spiders, if you expect them to index your "hidden" images.

:::: Getting a personal computer is sorta like getting married so you'll have someone to help you with all the problems you never would have had if you had never gotten married in the first place. :::: Jan Steinman http://www.Bytesmiths.com/Item/794637
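
[Editor's sketch, not part of the original post.] The two failure cases discussed in this thread (an HTML page behind a ".jpg" URL, and an image served without an image suffix) can both be caught by trusting the Content-Type header over the URL suffix once the response is in hand. The function names, the suffix list, and the example.org URLs below are assumptions made for illustration only.

```python
import posixpath
from urllib.parse import urlsplit

# Suffixes a naive crawler might treat as "this is an image" (assumed list).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".gif", ".png"}


def looks_like_image_by_suffix(url):
    """Guess from the URL path alone, ignoring any query string."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower() in IMAGE_SUFFIXES


def is_image_by_header(content_type):
    """Decide from the Content-Type header, dropping parameters like charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def classify(url, content_type):
    """Compare the suffix guess with what the server actually declared."""
    by_suffix = looks_like_image_by_suffix(url)
    by_header = is_image_by_header(content_type)
    if by_header and not by_suffix:
        return "hidden image"  # image served without an image suffix
    if by_suffix and not by_header:
        return "page with image-like URL"  # e.g. a wiki description page
    return "image" if by_header else "page"


if __name__ == "__main__":
    samples = [
        ("http://example.org/wiki/Image:Example.jpg", "text/html; charset=utf-8"),
        ("http://example.org/photo?id=42", "image/jpeg"),
        ("http://example.org/a.png", "image/png"),
        ("http://example.org/index.html", "text/html"),
    ]
    for url, content_type in samples:
        print(url, "->", classify(url, content_type))
```

The header check costs nothing extra when the response has already been fetched, which is the point made above; the suffix guess only helps when deciding whether to fetch at all.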

mediawiki-l@lists.wikimedia.org
