Oh wow!
Will Google understand this? Or do I need to submit the other page?
-----Original Message-----
From: mediawiki-l-bounces@Wikimedia.org [mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of Arthur Guy
Sent: Tuesday, September 20, 2005 5:45 PM
To: 'MediaWiki announcements and site admin list'
Subject: RE: [Mediawiki-l] Re: Google Sitemaps
Your sitemap is working perfectly; the page you see is an index page linking to the actual pages containing the sitemap. http://www.350z-tech.com/wiki/extensions/googleSiteMap.php?page=0
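For readers unfamiliar with the layout described here, the following is a minimal sketch in Python (not the extension's actual PHP code) of a sitemap index that links to paged sitemaps. The example.com URL, the page count, the function name, and the sitemaps.org 0.9 namespace are assumptions for illustration; the Google schema in use in 2005 had a different namespace URI.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace, not necessarily what the extension emits.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(script_url, page_count):
    """Return sitemap index XML with one <sitemap> entry per page number."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for page in range(page_count):
        entry = ET.SubElement(root, "sitemap")
        # Each child sitemap is the same script with a ?page=N parameter.
        ET.SubElement(entry, "loc").text = f"{script_url}?page={page}"
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical script location, for illustration only.
    print(build_sitemap_index("http://www.example.com/wiki/googleSiteMap.php", 3))
```

A browser shows such a document as a raw XML tree with the "no style information" notice, which is expected for XML served without a stylesheet.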
Arthur
arthur@astarsolutions.co.uk a star solutions www.astarsolutions.co.uk
Audio & Video Leads Adapters and Accessories
-----Original Message-----
From: mediawiki-l-bounces@Wikimedia.org [mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of Bass, Joshua L
Sent: 20 September 2005 23:34
To: mediawiki-l@Wikimedia.org
Subject: [Mediawiki-l] Re: Google Sitemaps
I have tried using your Google Sitemaps extension, with no luck. I get the following error:
"This XML file does not appear to have any style information associated with it. The document tree is shown below."
Can you explain what I have done wrong?
http://www.350z-tech.com/wiki/extensions/googleSiteMap.php
_______________________________________________ MediaWiki-l mailing list MediaWiki-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/mediawiki-l
'a star solutions' disclaimer The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. If you are not the intended recipient of this message you are hereby notified that any use, review, retransmission, dissemination, distribution, reproduction or any action taken in reliance upon this message is prohibited. If you received this in error, please contact the sender and delete the material from any computer. Any views expressed in this message are those of the individual sender and may not necessarily reflect the views of the company. We believe that this communication is free from viruses and other potentially dangerous programmes, but the recipient opens this communication at their own risk. We assume no responsibility for any loss or damage arising from the receipt or use of this communication
You just need the first page; Google can then find the rest on its own.
The one problem you do have is that the sitemap needs to be in the root of the site, i.e. http://www.350z-tech.com/wiki/googleSiteMap.php. The sitemap specification says that a sitemap can only list pages in the same directory as the sitemap file or below it; if you leave it where it is, it will only cover pages starting with http://www.350z-tech.com/wiki/extensions/
Regards, Arthur Guy
arthur@astarsolutions.co.uk a star solutions www.astarsolutions.co.uk
Where does the sitemap specification say that Google will only index pages at or below the level of the sitemap? Even so, I have had no trouble with Google indexing the Case Wiki ( http://wiki.case.edu/stats/ ) with the sitemap file located in a subdirectory. Also, I updated the googleSiteMap script last night by adding a lot of documentation. The source code is available at http://wiki.case.edu/misc/googleSiteMap.phps .
Gregory Szorc gregory.szorc@case.edu
Google will index all pages on your site that it can find. The only purpose of the sitemap is to help Google find and index them quickly and efficiently. It also helps Google to know how often the pages change, so it can schedule a re-crawl appropriately.
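To make the point about change frequency concrete, here is a minimal Python sketch of a sitemap whose entries carry the last-modified date and change-frequency hints. The function name, the sample URL and date, and the sitemaps.org 0.9 namespace are assumptions for illustration, not output of the googleSiteMap script.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_urlset(pages):
    """pages: iterable of (url, lastmod, changefreq) tuples; returns sitemap XML."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod, changefreq in pages:
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod        # date such as 2005-09-21
        ET.SubElement(entry, "changefreq").text = changefreq  # hint such as daily or weekly
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical page, for illustration only.
    print(build_urlset([("http://www.example.com/wiki/Main_Page", "2005-09-21", "daily")]))
```

The crawler treats these values as hints for scheduling; they do not guarantee that a page is crawled or indexed.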
Brent Meshier
I am afraid you are wrong.
An extract from Google's site: "You must place the completed Sitemap in the highest directory that you would like Google to crawl. Generally, this is the root directory. Google can only accept URLs contained in your Sitemap that are the same directory level or lower. For instance, if you place your Sitemap in www.example.com/foo/, Google will not be able to accept www.example.com/ as a URL in your Sitemap."
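The rule in this extract can be sketched as a simple prefix check. The following Python is an illustration of the rule as quoted, not Google's actual validation logic; the function names and the sample page URLs are assumptions.

```python
import posixpath
from urllib.parse import urlsplit

def sitemap_scope(sitemap_url):
    """Return (scheme, host, directory) that a sitemap at sitemap_url may cover."""
    parts = urlsplit(sitemap_url)
    directory = posixpath.dirname(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return parts.scheme, parts.netloc, directory

def in_scope(sitemap_url, page_url):
    """True if page_url is at the sitemap's directory level or lower."""
    scheme, host, directory = sitemap_scope(sitemap_url)
    page = urlsplit(page_url)
    page_path = page.path or "/"  # treat a bare host as the root path
    return (page.scheme, page.netloc) == (scheme, host) and page_path.startswith(directory)

if __name__ == "__main__":
    # Google's own example: a sitemap in /foo/ cannot list the site root.
    print(in_scope("http://www.example.com/foo/sitemap.xml", "http://www.example.com/"))
    print(in_scope("http://www.example.com/foo/sitemap.xml", "http://www.example.com/foo/page.html"))
```

Under this reading, a script at /wiki/extensions/googleSiteMap.php would only cover URLs beginning with /wiki/extensions/, while the same script at /wiki/googleSiteMap.php would cover everything under /wiki/.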
Regards, Arthur Guy
arthur@astarsolutions.co.uk a star solutions www.astarsolutions.co.uk
Wrong about what? The extract you quote from Google simply states how the URLs in your sitemap must relate to the directory level at which the sitemap is stored. Google will index all pages on your site whether they are in the sitemap or not. Google's FAQ summarizes what I stated in my previous email:
http://www.google.com/webmasters/sitemaps/docs/en/faq.html#s3
3. Will Google crawl and index all of the URLs in my Sitemap? "We don't guarantee that we'll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."
Perhaps you need to reread my previous mail.
Brent Meshier Director IT Operations Global Transport Logistics, Inc http://www.gtlogistics.com/ "Innovative Fulfillment Solutions"
I think you have still misunderstood sitemaps. A sitemap is a file that helps Google index your site; it provides information on how often Google should crawl the site and a rough guide to the relative importance of pages.
This will only work if it is in the root of the site; if you keep the sitemap in an extensions folder then Google won't use it, as all of the links it contains are above that folder.
Google will index your site, but it simply won't use your sitemap, which is the whole point of this discussion.
Arthur Guy
arthur@astarsolutions.co.uk a star solutions www.astarsolutions.co.uk
SOLUTION:
1) Copy the contents of http://wiki.case.edu/misc/googleSiteMap.phps to a PHP file in the base MediaWiki directory.
2) Confirm it works.
3) Submit a link to this PHP script to the Google Sitemaps service.
4) Wait.
5) Don't get mad at me if your robots.txt file doesn't stop bots from spending days indexing your recent changes page and old revisions of articles (a simple 'Disallow: /path/to/MediaWiki/index.php' will do; this assumes you have a rewrite in place so that index.php is not in the page URL).
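One hedged way to carry out step 2 is to check that the script's output is well-formed XML and see which URLs it lists. The sketch below is in Python rather than PHP and works on XML text you have already downloaded; the function name and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def list_locations(xml_text):
    """Parse sitemap (or sitemap index) XML and return every <loc> value."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if the XML is not well-formed
    # Match <loc> in any namespace by comparing only the local part of the tag.
    return [el.text for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "loc"]

if __name__ == "__main__":
    # Hypothetical sample standing in for the script's real output.
    sample = (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>http://www.example.com/wiki/googleSiteMap.php?page=0</loc></sitemap>"
        "</sitemapindex>"
    )
    print(list_locations(sample))
```

To check a live install, the text could be fetched first, for example with urllib.request.urlopen(url).read(), and then passed to list_locations.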
As an aside, a sitemap is especially important if you have a robots.txt rule disallowing access to index.php (which is highly advised, because bots get caught up in traversing old revisions). Without a sitemap, the Google bot will be unable to find articles that have no links pointing to them, because it won't be able to read Special:Allpages: namespace selection and page browsing there rely on parameters passed to index.php. So by disallowing access to index.php you effectively isolate some pages from spiders, except of course for Google, because of the sitemap. The upside is that your install won't get hammered by needless requests for old revisions. It is a trade-off you will have to consider.
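The effect of such a robots.txt rule can be checked with Python's standard urllib.robotparser module. This is only a sketch: the /w/index.php script path, the /wiki/ article path, and the example.com host are assumed values standing in for your own rewrite setup.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: index.php lives under /w/, rewritten article URLs under /wiki/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/index.php
"""

def is_allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    # Old revisions and history views go through index.php and are blocked.
    print(is_allowed(ROBOTS_TXT, "http://www.example.com/w/index.php?title=Foo&action=history"))
    # Rewritten article URLs do not contain index.php and stay crawlable.
    print(is_allowed(ROBOTS_TXT, "http://www.example.com/wiki/Foo"))
```

Note that Special:Allpages browsing with parameters would also be blocked under this rule, which is the trade-off described above.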
Gregory Szorc gregory.szorc@case.edu
mediawiki-l@lists.wikimedia.org