It might be useful to include a robots.txt file that'll tell search spiders not to bother with any of the active pages such as 'Edit'. While it isn't hard to make this kind of file, it could be useful to include one for the sake of giving people a starting place, since most wikis would have the same robots.txt file anyway. Most of my sites get hit by spiders several times a day, so keeping them from wasting time on pages they don't need to index cuts down on wasted server time.
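For example (just a sketch, and the paths are assumptions rather than anything a default install guarantees): if plain article views are served under a prefix like /wiki/ while edits, history and special pages still go through the script at /index.php, the whole file could be as small as:

    # robots.txt at the web root: keep spiders off the script URLs,
    # leave the plain /wiki/ article views crawlable
    User-agent: *
    Disallow: /index.php

The point is only that view URLs and 'active' URLs have to live under different prefixes for a rule like that to work.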
On Oct 26, 2004, at 4:04 PM, Michael wrote:
> It might be useful to include a robots.txt file that'll tell search spiders not to bother with any of the active pages such as 'Edit'.
This will be dependent on your server configuration. Note that robots.txt works on URL prefixes, so you need a reliable way of distinguishing plain view hits from other URLs.
(The meta tags already tell search engines not to index edit pages and other special pages, and not to continue spidering from them, but won't prevent the initial hit to load that page.)
-- brion vibber (brion @ pobox.com)
> This will be dependent on your server configuration. Note that robots.txt works on URL prefixes, so you need a reliable way of distinguishing plain view hits from other URLs.
I'd just make the example work if the wiki is in the root folder for the site. That'd be enough to give most people a starting place if nothing else. And I think you can distinguish the ones that need to be ignored by the '?' in the URL. Are there any pages that should be ignored that don't have the '?'?
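(One caveat on the '?' idea: the original robots.txt protocol only matches plain URL prefixes, so a rule keyed on the query string relies on wildcard patterns, a nonstandard extension that only some crawlers honor. For those that do, a sketch would be:

    # Nonstandard wildcard syntax; assumes plain views carry no query string
    User-agent: *
    Disallow: /*?

Crawlers that don't understand the wildcard treat that as a literal prefix no URL starts with, so they simply skip the rule.)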
> (The meta tags already tell search engines not to index edit pages and other special pages, and not to continue spidering from them, but won't prevent the initial hit to load that page.)
Many search engines ignore those meta tags.
On Oct 26, 2004, at 4:37 PM, Michael wrote:
>> This will be dependent on your server configuration. Note that robots.txt works on URL prefixes, so you need a reliable way of distinguishing plain view hits from other URLs.
> I'd just make the example work if the wiki is in the root folder for the site. That'd be enough to give most people a starting place if nothing else. And I think you can distinguish the ones that need to be ignored by the '?' in the URL. Are there any pages that should be ignored that don't have the '?'?
Every single page will have a ? if you're running without PATH_INFO support or rewrite rules.
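A rough sketch of the rewrite-rule route, with every path here an assumption rather than a default: give plain views their own prefix, then let robots.txt block only the script prefix.

    # .htaccess at the web root (mod_rewrite enabled):
    # serve article views as /wiki/Page_title
    RewriteEngine On
    RewriteRule ^wiki/(.*)$ /index.php?title=$1 [L,QSA]

    # LocalSettings.php: have MediaWiki generate /wiki/ links for plain views
    $wgArticlePath = "/wiki/$1";

    # robots.txt: edit/history/special URLs still go through index.php
    User-agent: *
    Disallow: /index.php

With something like that in place, every 'active' URL still starts with /index.php while views start with /wiki/, so a plain prefix rule is enough.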
>> (The meta tags already tell search engines not to index edit pages and other special pages, and not to continue spidering from them, but won't prevent the initial hit to load that page.)
> Many search engines ignore those meta tags.
Such as?
-- brion vibber (brion @ pobox.com)