Do wikipedia.py and urllib2 obey robots.txt? I didn't find anything in the former.
No, they do not, and in my opinion it would not be desirable either. Among the restrictions in robots.txt is one forbidding the retrieval of edit pages; without those the bot could not run at all. Under normal circumstances the bot does keep to the robots.txt rule that there must be at least one second between two requests.
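To make the point concrete, here is a rough Python 2 sketch using the standard robotparser module; the URL and the user agent string are only examples of mine, not anything taken from the framework itself:

import robotparser
import time

# Sketch only: check an edit page the way a strictly obedient client would.
# The URL and user agent are examples, not the bot's real configuration.
rp = robotparser.RobotFileParser()
rp.set_url("http://en.wikipedia.org/robots.txt")
rp.read()

edit_url = "http://en.wikipedia.org/w/index.php?title=Sandbox&action=edit"
print rp.can_fetch("ExampleBot", edit_url)   # False wherever edit pages are disallowed

# The rule the bot does keep: at least one second between two requests.
time.sleep(1.0)

If robots.txt disallows the edit path, can_fetch() returns False, and a strictly obedient client could never save an edit.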
Andre Engels
On 7/11/05, Scot Wilcoxon scot@wilcoxon.org wrote:
And I have now found robotparser, which will handle the non-wiki needs, but it does not recognize exceptions such as this one. For bot edits the robots.txt restrictions are not needed, because a username/password (or the site configuration) implies permission to override them.
My larger concern was standardize_notes.py fetching citation descriptions from other sites, so I'll add robotparser to it.
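Roughly what I have in mind is the following; the helper names, the user agent and the skip list are only my sketch, not existing code in the framework:

import robotparser
import urllib2
import urlparse

_robot_cache = {}

def allowed(url, useragent="ExampleCitationBot", skip_hosts=()):
    # Skip the robots.txt check for a wiki we edit with a bot account
    # (the implied override mentioned above); check every other host.
    host = urlparse.urlsplit(url)[1]
    if host in skip_hosts:
        return True
    rp = _robot_cache.get(host)
    if rp is None:
        rp = robotparser.RobotFileParser()
        rp.set_url("http://%s/robots.txt" % host)
        rp.read()
        _robot_cache[host] = rp
    return rp.can_fetch(useragent, url)

def fetch_citation_page(url):
    # Fetch a citation description from another site only if robots.txt allows it.
    if not allowed(url, skip_hosts=("en.wikipedia.org",)):
        return None
    return urllib2.urlopen(url).read()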