Hi, in my Google Webmaster Tools account there are a lot of crawl 404 errors for non-existent pages. It appears that Googlebot requests a page that does not exist (with &redlink=1), receives a 200 status and the "create" page, as expected, and then somehow derives a link to the same URL without the &redlink or &edit parameters (probably from the menu links on that page), which it then tries to crawl and gets a 404 for.
Does anyone know how to deal with this so that the Google crawler doesn't do this? It looks like Google first discovers the redlinks mostly from previous spam pages that were subsequently deleted, but once Google has seen a URL it remembers it for quite some time.
Thanks,
Al
If you are using the Wikipedia URL scheme (/wiki/Page_title and /w/index.php?title=Page_title&foo=bar), you can just ban bots from /w/ in robots.txt.
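For example, a minimal robots.txt along these lines (assuming the default Wikipedia-style layout, with article pages under /wiki/ and index.php under /w/) should keep compliant crawlers away from the edit/redlink URLs while leaving the article pages crawlable:

    # robots.txt at the web root
    # Assumes articles are served from /wiki/ and index.php lives under /w/
    User-agent: *
    Disallow: /w/

Keep in mind that Google can take a while to drop URLs it has already discovered, so the existing 404 reports may linger for some time even after the crawler stops following those links.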