First of all, thanks a lot for your time and your mails!
It retries the download when loading fails AND applies the correct Unicode decoding to the HTML page contents. Neither is done by urlopen, as far as I know... (?)
Right. Then abstract that stuff out of the Site urlopen into a separate module, and use that.
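A minimal sketch of what such a separate module could look like. It uses only the Python 3 standard library and is not pywikipedia code: `fetch_text` and its `retries`, `delay` and `opener` parameters are hypothetical names. It retries on `urllib.error.URLError` (which also covers HTTP errors such as a 503) and decodes the body with the charset declared in the `Content-Type` header, falling back to UTF-8 when none is declared or the name is unknown.

```python
import time
import urllib.error
import urllib.request


def fetch_text(url, retries=3, delay=1.0, timeout=30, opener=urllib.request.urlopen):
    """Download ``url`` and return its body as a str (hypothetical helper).

    Network and HTTP errors (both are ``URLError``) are retried for up to
    ``retries`` attempts in total, waiting ``delay`` seconds in between.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as response:
                raw = response.read()
                # Charset declared by the server, e.g. "text/html; charset=iso-8859-1".
                charset = response.headers.get_content_charset() or "utf-8"
        except urllib.error.URLError as exc:
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay)
            continue
        try:
            return raw.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            return raw.decode("utf-8", errors="replace")
    raise last_error
```

The `opener` parameter only exists so the helper can be exercised without network access; in normal use the default `urllib.request.urlopen` is called. Retrying on every `URLError`, including a permanent 404, is a simplification of this sketch.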
The issue is not the bot logging out; the issue is that you're using a function for something it was never meant to do. When you do that, you shouldn't be surprised when things break.
This cannot be the way to go. You are right that I am "abusing" the function, since its documentation says:
"Low-level routine to get a URL from the wiki."
and I am using it for arbitrary (non-wiki) URLs. But it also says nothing at all about any login attempt to the current (or any other) wiki... The pywikipedia team did a really good job writing this function, and it seems strange to me to copy the whole function as it is, just dropping one or two lines of code to get what I need (and then I would also have to maintain that copy in parallel). As far as I can see at the moment, the problem is the call to 'self._getUserDataOld' at the end. I am not an expert in this, but I investigated it as well as I could. That is also why I asked the following (maybe stupid) questions:
So I do not understand how, and at what place in the code, the initial login (via cookies) is done. Then I do not understand why the later (re)login is done in a different way. And finally, I do not understand why 'LoginManager' asks for a password it does not need when cookies are present (this request for user input then seems to break my bot...).
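Regarding the third question, the behaviour an unattended bot needs could look roughly like the hypothetical helper below. It is a sketch using the Python 3 standard library, not pywikipedia's actual `LoginManager`; `obtain_credentials`, its `interactive` flag and the use of an LWP-format cookie file are assumptions for illustration. Saved, unexpired cookies are reused without asking for anything, and a password is only requested when no cookies are found and a terminal is attached, so a bot without a terminal fails loudly instead of blocking on input.

```python
import getpass
import http.cookiejar
import os
import sys


def obtain_credentials(cookie_file, username, interactive=None):
    """Hypothetical helper: return (cookie_jar, password_or_None).

    Saved cookies are preferred; the password prompt is only a fallback.
    """
    jar = http.cookiejar.LWPCookieJar(cookie_file)
    if os.path.exists(cookie_file):
        # load() skips expired cookies; ignore_discard keeps session cookies.
        jar.load(ignore_discard=True)
        if len(jar) > 0:
            return jar, None  # cookies present: no password needed
    if interactive is None:
        interactive = sys.stdin is not None and sys.stdin.isatty()
    if not interactive:
        # An unattended bot should fail loudly rather than wait for input.
        raise RuntimeError(
            "no login cookies for %s and no terminal to ask for a password" % username)
    return jar, getpass.getpass("Password for %s: " % username)
```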
I was able to answer the first question: 'site._load' is responsible for the very first login AND is also able to re-login for me. 'getUrl' is NOT able to re-login, EVEN when accessing a page from dewiki... and THIS SHOULD WORK as far as I can see (so we have a bug here). I was not able to answer the other two questions myself...
At the moment it looks to me as if adding a keyword argument 'noLogin' to 'getUrl' (similar to the one in 'getSite') that prevents 'getUrl' from calling '_getUserDataOld' at the end should solve my problem. And this would not contradict the documented purpose of 'getUrl' in any way.
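A minimal sketch of the proposed switch, using a toy class rather than pywikipedia's real `Site`. The names `get_url`, `no_login` and `_refresh_user_data` are hypothetical stand-ins for 'getUrl', 'noLogin' and '_getUserDataOld'. The default keeps the current behaviour, so existing callers would be unaffected; only callers that pass the flag skip the post-fetch login step.

```python
class Site:
    """Toy stand-in for a wiki Site object; NOT the pywikipedia class."""

    def __init__(self, fetch):
        self._fetch = fetch            # callable(url) -> str, e.g. a plain HTTP getter
        self.user_data_refreshes = 0   # counts how often the login step ran

    def _refresh_user_data(self, text):
        # Placeholder for the step '_getUserDataOld' performs in the real code:
        # reading the login state from a wiki page and re-logging in if needed.
        self.user_data_refreshes += 1

    def get_url(self, url, no_login=False):
        text = self._fetch(url)
        if not no_login:
            # Default path: unchanged behaviour for existing callers.
            self._refresh_user_data(text)
        return text
```

Usage: `Site(fetch).get_url("http://example.org/", no_login=True)` returns the page text without touching the login state, which is what a bot fetching arbitrary non-wiki URLs needs.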
Greetings and have a nice day,
DrTrigon