A robots.txt could easily be set up to disallow /wiki/special%3ARecentChanges (and various case variations). That only stops _nice_ spiders, of course.
History links would need to be changed to be sufficiently distinguishable, for instance using /wiki.phtml?title=Foo&action=history etc; then ban /wiki.phtml.
I think we should do that ASAP. Let's close the whole special: namespace, &action=edit, &action=history, &diff=yes and &oldID stuff to spiders. None of this is of any value to the spiders anyway.
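For illustration, the robots.txt could look something like this (just a sketch; the exact paths are assumptions and would need to match whatever URL scheme we end up using, and case variants may need their own entries since robots.txt matching is case-sensitive):

User-agent: *
Disallow: /wiki/Special%3ARecentChanges
Disallow: /wiki/special%3ARecentChanges
Disallow: /wiki.phtml

Since Disallow entries are prefix matches, the /wiki.phtml line would cover the edit, history, diff and oldID URLs in one go once all of those are routed through /wiki.phtml.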
Axel
On 5/17/02 5:37 PM, "Axel Boldt" axel@uni-paderborn.de wrote:
> A robots.txt could easily be set up to disallow /wiki/special%3ARecentChanges (and various case variations). That only stops _nice_ spiders, of course.
> History links would need to be changed to be sufficiently distinguishable, for instance using /wiki.phtml?title=Foo&action=history etc; then ban /wiki.phtml.
> I think we should do that ASAP. Let's close the whole special: namespace, &action=edit, &action=history, &diff=yes and &oldID stuff to spiders. None of this is of any value to the spiders anyway.
I think we should not do that any time soon. For one, this is a wikipedia-l level discussion. Until there is direct evidence that spiders are causing any serious problem for Wikipedia (and no one has presented any), we shouldn't even be discussing this.
Just because you can't see why it would be of value doesn't mean that it isn't.
Again, if we surmise that spiders are causing slowdowns, we should be able to find evidence for that BEFORE we block parts of the site from them. And even then we should see if the fault lies in the site's code.
Spiders simulate high traffic well, and that's something that Wikipedia should be able to handle.
tc
The Cunctator wrote:
> Again, if we surmise that spiders are causing slowdowns, we should be able to find evidence for that BEFORE we block parts of the site from them. And even then we should see if the fault lies in the site's code.
I think this is right, although blocking them from 'edit' doesn't seem harmful. Certainly, it's good for spiders to hit 'Recent Changes', and often.
> Spiders simulate high traffic well, and that's something that Wikipedia should be able to handle.
Right.
I'll do some research to determine if spiders are causing any problems, but in my experienced judgment based on running high traffic sites, I think it is pretty unlikely.
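For example, a rough first pass over the access log might look something like the sketch below (the log path, log format, and user-agent substrings are assumptions for illustration, not what the server actually uses):

import re

LOG_FILE = "/var/log/apache/access_log"   # assumed location, adjust as needed
BOT_HINTS = ("googlebot", "slurp", "ia_archiver", "crawler", "spider")  # illustrative

# Apache "combined" log: ... "GET /path HTTP/1.x" status bytes "referer" "user-agent"
request_re = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

hits_by_agent = {}        # requests per bot user-agent string
expensive_bot_hits = 0    # bot requests to edit/history/diff/RecentChanges URLs

with open(LOG_FILE) as log:
    for line in log:
        m = request_re.search(line)
        if not m:
            continue
        path, agent = m.group(1), m.group(2)
        if any(hint in agent.lower() for hint in BOT_HINTS):
            hits_by_agent[agent] = hits_by_agent.get(agent, 0) + 1
            if "action=" in path or "recentchanges" in path.lower():
                expensive_bot_hits += 1

for agent in sorted(hits_by_agent, key=hits_by_agent.get, reverse=True)[:10]:
    print(hits_by_agent[agent], agent)
print("bot requests to edit/history/diff/RecentChanges URLs:", expensive_bot_hits)

That would at least tell us how much of the traffic is spiders and how much of it lands on the expensive pages before we decide anything.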
--Jimbo
On 5/18/02 3:08 PM, "Jimmy Wales" jwales@bomis.com wrote:
> The Cunctator wrote:
>> Again, if we surmise that spiders are causing slowdowns, we should be able to find evidence for that BEFORE we block parts of the site from them. And even then we should see if the fault lies in the site's code.
> I think this is right, although blocking them from 'edit' doesn't seem harmful. Certainly, it's good for spiders to hit 'Recent Changes', and often.
Both points make perfect sense to me.