This isn't all my own work, and I'm not really a developer, but perhaps the following can be fixed in the source code?
It's been bugging me that the sitemap index file generated by the generateSitemap.php maintenance script doesn't use absolute URLs for the sub-sitemaps, which makes it useless: the sitemap protocol expects a full URL in each <loc>. If I submitted just the sitemap for the main namespace instead, Google complained that all my pages had the same priority.
I found a partial fix on the web: http://lists.alioth.debian.org/pipermail/pkg-mediawiki-devel/2008-January/00...
But that doesn't go all the way.
I'm not sure I'm going to do this very often, so perhaps it isn't necessary for me to sacrifice a goat (or whatever other initiation is needed to become a real MW dev), but here is my diff against the 1.13RC2 version. Perhaps someone can include it.
I've had to kludge one thing: I ask the user to supply both the absolute filesystem path that the sitemap files will be saved to (--fspath) and the matching path relative to the web server root (--webpath). I couldn't think of a reliable way to derive one from the other, but my PHP skills are very limited. I run this in a nightly cron job now.
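One way the second option might be avoided, assuming the web server's document root is known when the script runs, is to strip that root from the filesystem path. This is only a sketch: deriveWebPath() and both of its parameters are hypothetical names and not part of MediaWiki, and it would not cope with aliases or rewrite rules.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): derive the web path from the
 * filesystem path by stripping a known document root from the front of it.
 * Returns false when the sitemap directory is not inside the document root.
 */
function deriveWebPath( $fspath, $docRoot ) {
	// Ignore trailing slashes so "/a/b/" and "/a/b" compare equal.
	$fspath = rtrim( $fspath, '/' );
	$docRoot = rtrim( $docRoot, '/' );

	// The directory must be the document root itself or live below it.
	if ( $fspath !== $docRoot && strpos( $fspath, $docRoot . '/' ) !== 0 ) {
		return false;
	}

	// Whatever follows the document root is the web path ("" for the root).
	return (string)substr( $fspath, strlen( $docRoot ) );
}

// Example using the paths from the help text in the diff.
var_dump( deriveWebPath( '/var/www/httpdocs/mediawiki_sitemaps/', '/var/www/httpdocs' ) );
```

The document root still has to come from somewhere (a setting or another option), because a command line script has no web request to read it from.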
svn diff generateSitemap.php
Index: generateSitemap.php
===================================================================
--- generateSitemap.php	(revision 38559)
+++ generateSitemap.php	(working copy)
@@ -367,9 +367,11 @@
 	 * @return string
 	 */
 	function indexEntry( $filename ) {
+		global $wgServer;
+		global $wgWebpath;
 		return
 			"\t<sitemap>\n" .
-			"\t\t<loc>$filename</loc>\n" .
+			"\t\t<loc>$wgServer$wgWebpath/$filename</loc>\n" .
 			"\t\t<lastmod>{$this->timestamp}</lastmod>\n" .
 			"\t</sitemap>\n";
 	}
@@ -457,18 +459,30 @@
 			server name detection may fail in command line scripts.
 
 	--compress=[yes|no]	compress the sitemap files, default yes
+
+	--webpath=<dir>	If you are placing the sitemap files in a sub folder
+			i.e. using the --fspath option and specify somewhere other than root
+			you need to place here the directory name e.g:
+
+			if --fspath = /var/www/httpdocs/mediawiki_sitemaps/ for example
+			then --webpath = /mediawiki_sitemaps  *Note, no trailing / needed
+
 
 EOT;
 	die( -1 );
 }
 
-$optionsWithArgs = array( 'fspath', 'server', 'compress' );
+$optionsWithArgs = array( 'fspath', 'server', 'compress', 'webpath' );
 require_once( dirname( __FILE__ ) . '/commandLine.inc' );
 
 if ( isset( $options['server'] ) ) {
 	$wgServer = $options['server'];
 }
 
+if ( isset( $options['webpath'] ) ) {
+	$wgWebpath = $options['webpath'];
+}
+
 $gs = new GenerateSitemap( @$options['fspath'], @$options['compress'] !== 'no' );
 $gs->main();
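The <loc> value in the patch is built by plain string concatenation, so a trailing slash on --server or --webpath would end up as a double slash in the URL. A slightly more forgiving join could look like the sketch below. sitemapUrl() is a hypothetical helper and not existing MediaWiki code, and the host and file name in the example are made up.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): join the server, the web path
 * and a sitemap file name into one absolute URL for a <loc> element.
 */
function sitemapUrl( $server, $webpath, $filename ) {
	// Drop stray slashes so the pieces can be joined with exactly one "/".
	$server = rtrim( $server, '/' );
	$webpath = trim( $webpath, '/' );

	// An empty web path means the files sit directly in the web root.
	$prefix = ( $webpath === '' ) ? $server : "$server/$webpath";

	// Escape characters such as "&" so the result is safe inside XML.
	return htmlspecialchars( "$prefix/$filename" );
}

// Example with a made-up host and file name.
echo sitemapUrl( 'http://example.org', '/mediawiki_sitemaps/', 'sitemap-index-wiki.xml' ), "\n";
```

With something like this, indexEntry() would call the helper instead of interpolating $wgServer and $wgWebpath directly, and leaving out --webpath would still give a valid URL.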