[Mediawiki-l] Problem and fix for generateSitemap.php maintenance script

The Gadget Doctor mediawiki at thegadgetdoctor.com
Thu Aug 14 14:57:35 UTC 2008


This isn't all my own work, and I'm not really a developer, but perhaps
the following can be fixed in the source code?

It's been bugging me that the sitemap index file generated by the
generateSitemap.php maintenance script doesn't have absolute paths for
the sub-sitemaps, which makes it useless. If I just submitted the
sitemap for the main namespace instead, Google complained that all my
pages had the same priority.

I found a partial fix on the web:
http://lists.alioth.debian.org/pipermail/pkg-mediawiki-devel/2008-January/001194.html

But that doesn't go all the way.

I'm not sure I'm going to do this very often, so perhaps it isn't
necessary for me to sacrifice a goat (or whatever other initiation is
needed to be a real MW dev), but here is my diff against the 1.13RC2
version. Perhaps someone can include this.

I've had to kludge one thing: I ask the user to supply both the
absolute filesystem path that the sitemap files will be saved to, and
the matching path relative to the web server root. I couldn't think of
a reliable way to work one out from the other, but my PHP skills are
very limited. I run this in a nightly cron job now.
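
One way the second path might be worked out rather than asked for is sketched below. This is only a sketch under stated assumptions: deriveWebPath() is a hypothetical helper, not part of MediaWiki, and the document root has to be passed in by hand, because a command-line script typically cannot learn it from the web server. The example values are the ones used in the help text of the patch.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): derive the web-visible
 * path of a directory from its filesystem path and the web server's
 * document root. Returns null if $fsPath is not inside $docRoot.
 */
function deriveWebPath( $fsPath, $docRoot ) {
	// Drop trailing slashes so both paths compare consistently.
	$fsPath = rtrim( $fsPath, '/' );
	$docRoot = rtrim( $docRoot, '/' );

	// Files saved directly in the document root need no extra path.
	if ( $fsPath === $docRoot ) {
		return '';
	}

	// Require a directory boundary so /var/www2 does not match /var/www.
	$prefix = $docRoot . '/';
	if ( strpos( $fsPath, $prefix ) !== 0 ) {
		return null;
	}

	return '/' . substr( $fsPath, strlen( $prefix ) );
}

// Assumed example values; prints /mediawiki_sitemaps
echo deriveWebPath( '/var/www/httpdocs/mediawiki_sitemaps/', '/var/www/httpdocs' ), "\n";
```

The null return marks the case where the sitemap directory is not served from the document root at all (an alias or a separate virtual host, say), which is where asking the user for both paths is still the safer option.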

svn diff generateSitemap.php
Index: generateSitemap.php
===================================================================
--- generateSitemap.php (revision 38559)
+++ generateSitemap.php (working copy)
@@ -367,9 +367,11 @@
         * @return string
         */
        function indexEntry( $filename ) {
+               global $wgServer;
+               global $wgWebpath;
                return
                        "\t<sitemap>\n" .
-                       "\t\t<loc>$filename</loc>\n" .
+                       "\t\t<loc>$wgServer$wgWebpath/$filename</loc>\n" .
                        "\t\t<lastmod>{$this->timestamp}</lastmod>\n" .
                        "\t</sitemap>\n";
        }
@@ -457,18 +459,30 @@
                server name detection may fail in command line scripts.

        --compress=[yes|no]     compress the sitemap files, default yes
+
+       --webpath=<dir>         If you are placing the sitemap files in a sub folder,
+               i.e. using the --fspath option to specify somewhere other than the
+               web root, give the directory name relative to the web root here, e.g.:
+
+                 if --fspath = /var/www/httpdocs/mediawiki_sitemaps/ for example
+                 then --webpath = /mediawiki_sitemaps   *Note, no trailing / needed
+

 EOT;
        die( -1 );
 }

-$optionsWithArgs = array( 'fspath', 'server', 'compress' );
+$optionsWithArgs = array( 'fspath', 'server', 'compress', 'webpath' );
 require_once( dirname( __FILE__ ) . '/commandLine.inc' );

 if ( isset( $options['server'] ) ) {
        $wgServer = $options['server'];
 }

+if ( isset( $options['webpath'] ) ) {
+       $wgWebpath = $options['webpath'];
+}
+
 $gs = new GenerateSitemap( @$options['fspath'], @$options['compress'] !== 'no' );
 $gs->main();
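
The patched indexEntry() joins the server, the web path and the file name with a fixed "/", so a trailing slash on --webpath produces a double slash in the URL. A minimal sketch of a more forgiving join follows. sitemapUrl() is a hypothetical helper and not part of the patch or of MediaWiki, and the server name and file name are placeholders.

```php
<?php
/**
 * Hypothetical helper: build the absolute <loc> URL for a sub-sitemap,
 * tolerating missing or extra slashes around the web path.
 */
function sitemapUrl( $server, $webPath, $filename ) {
	// Strip slashes at the joins so exactly one is added back.
	$server = rtrim( $server, '/' );
	$webPath = trim( $webPath, '/' );

	// An empty web path means the files sit in the web root.
	$dir = ( $webPath === '' ) ? '' : '/' . $webPath;

	return $server . $dir . '/' . $filename;
}

// Placeholder values; prints http://example.org/mediawiki_sitemaps/sitemap-example.xml.gz
echo sitemapUrl( 'http://example.org', '/mediawiki_sitemaps/', 'sitemap-example.xml.gz' ), "\n";
```

With a join like this the help text would not need to warn about the trailing slash, and leaving --webpath unset would still give a valid URL in the web root.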


