[Mediawiki-l] Help with crawling Special:AllPages for small proprietary wiki

Christopher Desmarais (Contractor) christopher.desmarais at sjrb.ca
Fri Jun 20 22:42:59 UTC 2008


We have a small proprietary wiki, and we would like to be able to search
the entire wiki's content daily with SharePoint.
 
It looks like the easiest way to do that would be to start a crawl at
the Special:AllPages page. However, SharePoint immediately stops any such
crawl because the site has:
 
<meta name="robots" content="noindex,nofollow" />
 
We have looked, but we can't find any configuration option we might have
set that would insert that tag. There is no robots.txt file in the root
directory, and we haven't set anything in LocalSettings.php or
DefaultSettings.php that would prevent robots from following the page
(e.g. DefaultSettings.php has $wgNamespaceRobotPolicies = array(); and
LocalSettings.php has no robot directives at all).
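
For what it's worth, the following is the kind of LocalSettings.php
override we were hoping to find documented somewhere. It is only a guess
on our part: $wgNamespaceRobotPolicies and NS_SPECIAL are real names from
the MediaWiki source, but we have no idea whether this setting is even
consulted when a special page is rendered.

# Hypothetical addition to LocalSettings.php -- a sketch, not something
# we have actually set. The intent would be to relax the robot policy
# for the Special: namespace so that Special:AllPages stops emitting
# the noindex,nofollow meta tag.
$wgNamespaceRobotPolicies = array(
    NS_SPECIAL => 'index,follow',
);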
 
1) Is this a default setting for the special pages?
2) If it isn't, where can we look for things we might have set that we
can turn off?
3) If it is, is there anything we can set to stop that tag from being
put in the page?
 
If we can't prevent those tags from being inserted, has anyone managed
to use the Special:Export feature with SharePoint? Are there any articles
that might help us solve this problem?
 
In theory I could write a .NET application to read the anchor tags out
of the page and then generate an .aspx page without the noindex,nofollow
meta tag, so SharePoint could crawl the pages listed on Special:AllPages.
But surely there's an easier way.
 
Thanks,
 
Chris

