[Mediawiki-l] Help with crawling Special:AllPages for small proprietary wiki

Tim Starling tstarling at wikimedia.org
Fri Jun 20 22:59:22 UTC 2008


Christopher Desmarais (Contractor) wrote:
> We have a small proprietary wiki, and we would like to be able to search
> the entire wiki content daily with SharePoint.
>  
> It looks like the easiest way to do that would be to start a crawl at
> the Special:AllPages page. However, SharePoint immediately stops any such
> crawl because the site has:
>  
> <meta name="robots" content="noindex,nofollow" />
>  
> We looked but can't find any configuration option that we might have
> set to include those tags. There is no robots.txt file in the root
> directory, and we haven't set anything in LocalSettings.php or
> DefaultSettings.php to prevent robots from following the page (e.g.
> DefaultSettings.php has $wgNamespaceRobotPolicies = array(); and
> LocalSettings.php has no robot directives at all).
>  
> 1) Is this a default setting for the special pages? 

It's hard-coded for all special pages.
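
Roughly speaking, the special page framework does something like this
when it sets up the page headers (an illustrative sketch from memory,
not a verbatim copy of the 1.13 source, so method names may differ
slightly between versions):

	# In includes/SpecialPage.php, run for every special page
	function setHeaders() {
		global $wgOut;
		$wgOut->setArticleRelated( false );
		# This is where the <meta name="robots"> tag you are seeing comes from
		$wgOut->setRobotPolicy( 'noindex,nofollow' );
		$wgOut->setPageTitle( $this->getDescription() );
	}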

> 2) If it isn't where can we look for things we might have set that we
> can turn off?
> 3) If it is, is there anything we can turn on to stop that tag from
> being put in the page?

Index: includes/specials/Allpages.php
===================================================================
--- includes/specials/Allpages.php	(revision 36353)
+++ includes/specials/Allpages.php	(working copy)
@@ -12,6 +12,8 @@
  function wfSpecialAllpages( $par=NULL, $specialPage ) {
  	global $wgRequest, $wgOut, $wgContLang;

+	$wgOut->setRobotPolicy( '' );
+
  	# GET values
  	$from = $wgRequest->getVal( 'from' );
  	$namespace = $wgRequest->getInt( 'namespace' );
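
If you'd rather not patch the core file, something along these lines in
LocalSettings.php should have the same effect (an untested sketch; it
assumes your version has OutputPage::setRobotPolicy() and
Title::isSpecial(), and the function name here is just an example):

	$wgHooks['BeforePageDisplay'][] = 'efAllowRobotsOnAllpages';

	function efAllowRobotsOnAllpages( &$out ) {
		global $wgTitle;
		# BeforePageDisplay runs after the special page has set its headers,
		# so this overrides the hard-coded noindex,nofollow for that page only.
		if ( $wgTitle && $wgTitle->isSpecial( 'Allpages' ) ) {
			$out->setRobotPolicy( 'index,follow' );
		}
		return true; # let other handlers run
	}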



