Hello all!
I have a quite confusing situation happening to my bot when trying to access any URL that points to a foreign (but MediaWiki-based) wiki, like this:
pywikibot.getSite().getUrl(foreign_wiki_url, no_hostname = True)
This should be more or less similar to a simple:
urllib.urlopen(foreign_wiki_url).read()
I mean I am not trying to edit or access that other wiki in the usual way (I have no login there, no family file, ...); I just want to fetch the HTML text from that page - which actually works. But while doing this the bot seems to log out, since afterwards it is not able to edit any page anymore. The traceback from the logs is:
Password for user DrTrigonBot on wikipedia:de:
/opt/ts/python/2.7/lib/python2.7/getpass.py:83: GetPassWarning: Can not control echo on the terminal.
  passwd = fallback_getpass(prompt, stream)
Warning: Password input may be echoed.

Unhandled exception in thread started by

Traceback (most recent call last):
  File "/home/drtrigon/pywikipedia/bot_control.py", line 317, in write
    string = self._REGEX_boc.sub('', string) # make more readable
  File "/home/drtrigon/pywikipedia/subster_irc.py", line 156, in main_subster
    bot.run()
  File "/home/drtrigon/pywikipedia/subster.py", line 220, in run
    self.save(page, substed_content, head + mod % (", ".join(substed_tags)), **flags)
  File "/home/drtrigon/pywikipedia/dtbext/dtbext_basic.py", line 240, in save
    page.put(text, comment=comment, minorEdit=minorEdit, botflag=botflag)
  File "/home/drtrigon/pywikipedia/wikipedia.py", line 1708, in put
    sysop = self._getActionUser(action = 'edit', restriction = self.editRestriction, sysop = sysop)
  File "/home/drtrigon/pywikipedia/wikipedia.py", line 1581, in _getActionUser
    self.site().forceLogin(sysop = sysop)
  File "/home/drtrigon/pywikipedia/wikipedia.py", line 5008, in forceLogin
    if loginMan.login(retry = True):
  File "/home/drtrigon/pywikipedia/login.py", line 307, in login
    password = True)
  File "/home/drtrigon/pywikipedia/wikipedia.py", line 8018, in input
    data = ui.input(question, password)
  File "/home/drtrigon/pywikipedia/userinterfaces/terminal_interface.py", line 238, in input
    text = getpass.getpass('')
  File "/opt/ts/python/2.7/lib/python2.7/getpass.py", line 83, in unix_getpass
    passwd = fallback_getpass(prompt, stream)
  File "/opt/ts/python/2.7/lib/python2.7/getpass.py", line 118, in fallback_getpass
    return _raw_input(prompt, stream)
  File "/opt/ts/python/2.7/lib/python2.7/getpass.py", line 135, in _raw_input
    raise EOFError
EOFError
...any idea what's happening here??
Thanks a lot! Greetings DrTrigon
...forgot to add: when using an API URL link for the same wikis the issue does NOT appear.
I tried several things here in order to investigate the origin of this issue. E.g.:
'Site.forceLogin()' does not help - essentially the same happens.
Then I recognized that there is the login, there are cookies, and last but not least there is the 'LoginManager'. The strange thing now is: when starting the bot it gets logged in to dewiki without any message or anything else (it just works fine); then when accessing another wiki it obviously tries to log in there and forgets the dewiki login. But finally, when coming back to dewiki the second time, the 'LoginManager' gets used (this is not the case for the initial login) and asks for a password - but I can just press <enter> there (skip it) and the 'LoginManager' then uses the cookies and logs in even though no valid password was given...
So I do not understand how the initial login (by cookies) is done and at what place in the code. Then I do not understand why the later (re)login is done in a different way. And last, I do not understand why 'LoginManager' asks for a password but does not need it, if there are cookies present? (This requested user input seems to break my bot then...)
Any help is greatly appreciated... Thanks in advance!
Greetings
On 27 January 2012 17:31, Dr. Trigon dr.trigon@surfeu.ch wrote:
I have a quite confusing situation happening to my bot when trying to access any URL that points to a foreign (but MediaWiki-based) wiki, like this:
pywikibot.getSite().getUrl(foreign_wiki_url, no_hostname = True)
But while doing this the bot seems to log out, since afterwards it is not able to edit any page anymore. The traceback from the logs is:
Random guess: the bot sends the old site's cookies to the foreign wiki, gets new cookies back and writes those to the user-data file. Then in the next request it tries to use those cookies, which fails.
Check your cookie data file in user-data to confirm.
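For example, something like this (a rough sketch only; the file name under login-data is a guess - use whatever file is actually there for your account, and pywikibot / foreign_wiki_url are as in your original snippet):

    # Rough sketch: snapshot the cookie file, do the foreign-wiki request,
    # and check whether getUrl() rewrote the file. The file name below is
    # an assumption -- adjust it to the file in your login-data directory.
    import filecmp
    import shutil

    cookie_file = 'login-data/wikipedia-de-DrTrigonBot-login.data'  # assumed name
    shutil.copy(cookie_file, cookie_file + '.before')

    site = pywikibot.getSite()
    site.getUrl(foreign_wiki_url, no_hostname=True)

    if filecmp.cmp(cookie_file + '.before', cookie_file, shallow=False):
        print 'cookie file unchanged'
    else:
        print 'cookie file was rewritten by getUrl()'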
In any case: why are you trying to use a function that is clearly not made for this purpose, instead of using, say, urlopen directly, or creating a family file?
Merlijn
First; thanks a lot for your reply!!
Random guess: the bot sends the old site's cookies to the foreign wiki, gets new cookies back and writes those to the user-data file. Then in the next request it tries to use those cookies, which fails.
Check your cookie data file in user-data to confirm.
What do you mean by 'user-data'? I looked at 'login-data', since there are some files stored there... I am not sure if they are changed, but as mentioned in the last mail the bot is still able to log in under some circumstances. Also 'python login.py -test' claims to be logged in... But what would be the best thing to do in your opinion? Wipe out all those files, re-login once and then store a copy of the files to compare?
In any case: why are you trying to use a function that is clearly not made for this purpose, instead of using, say, urlopen, directly, or creating a family file?
You are right, that is true. But the function works very well except under the rare occasions mentioned here. The main reason why I use this function is that it does re-loading attempts AND it applies correct unicode encoding to the HTML page contents. Neither is done by urlopen as far as I know...(?) Also, creating a family file is not what I want (sorry ;) since I would like to handle this URL like any arbitrary URL from the web and not as a wiki. As far as I can see the point where things are going wrong is at the very end of 'getUrl':
    # If a wiki page, get user data
    self._getUserDataOld(text, sysop = sysop)
everything else seems to be fine.
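Just to illustrate what I mean by the re-loading and the unicode handling - a rough, untested sketch of what a plain-urllib2 replacement would have to do (the retry count and fallback encoding are picked arbitrarily, not the pywikipedia values):

    # Rough sketch only -- not the pywikipedia implementation.
    import time
    import urllib2

    def fetch_page(url, retries=3, delay=5, fallback_encoding='utf-8'):
        """Fetch a URL with a few retries and decode it to unicode."""
        last_error = None
        for attempt in range(retries):
            try:
                response = urllib2.urlopen(url)
                data = response.read()
                # use the charset from the HTTP headers, fall back to UTF-8
                charset = response.headers.getparam('charset') or fallback_encoding
                return data.decode(charset, 'replace')
            except (urllib2.URLError, IOError) as error:
                last_error = error
                time.sleep(delay)
        raise last_error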
Greetings DrTrigon
What do you mean by 'user-data'? I looked at 'login-data'
That one, yes. It contains cookies.
(...) the bot is still able to login under some circumstances.
Then find out /which/ circumstances these are. I can imagine nothing bad happens as long as the secondary server does not send mediawiki cookies (xxwiki_userid etc), so that these are not overwritten.
Also 'python login.py -test' claims to be logged in...
I am not sure what this tests, but I can imagine this only tests if the cookie file exists.
But what would be the best to do in your opinion? Wipe out all those files and re-login once and then store a copy of the files to compare?
Compare the specific one, of course. Log in to xxwiki, copy wikipedia-xx-username.data, do your getUrl stuff and compare the files.
Or do what I just did: read the getUrl function and conclude there is indeed cookie-updating stuff there.
it does re-loading attempts AND it applies correct unicode encoding to the html page contents. Both is not done by urlopen as far as I know...(?)
Right. Then abstract that stuff out of the Site urlopen into a separate module, and use that.
Best, Merlijn
...did you read ALL mail I wrote...? :)
Greetings
On 28 January 2012 23:48, Dr. Trigon dr.trigon@surfeu.ch wrote:
...did you read ALL mail I wrote...? :)
No. Now I have. My response wouldn't have been different.
The issue is not the bot logging out, the issue is you're using a function for something it was never supposed to do. When you do that, you shouldn't be surprised things break.
For details on /what/ breaks and what you could do to improve your situation, see my previous mail.
Best, Merlijn
So first thanks a lot for your time and mails!
it does re-loading attempts AND it applies correct unicode encoding to the HTML page contents. Neither is done by urlopen as far as I know...(?)
Right. Then abstract that stuff out of the Site urlopen into a separate module, and use that.
The issue is not the bot logging out, the issue is you're using a function for something it was never supposed to do. When you do that, you shouldn't be surprised things break.
This cannot be the way to go. You are right, I am "abusing" the function, since in the docs/description it is written:
"Low-level routine to get a URL from the wiki."
and I am using it for arbitrary (non-wiki) URLs. But there is also nothing mentioned about any login attempt to the current (or any other) wiki at all... The pywikipedia team did a really good job writing this function, and it seems strange to me to copy the whole function as it is, just dropping 1 or 2 lines of code to achieve what I need (and then I would also have to maintain that code in parallel). As far as I can see at the moment, the problem is the call to 'self._getUserDataOld' at the end. I am not an expert in this, but I tried to investigate it as well as possible. That is also the reason why I asked the following (maybe stupid) questions:
So I do not understand how the initial login (by cookies) is done and at what place in the code. Then I do not understand why the later (re)login is done in a different way. And last, I do not understand why 'LoginManager' asks for a password but does not need it, if there are cookies present? (This requested user input seems to break my bot then...)
I was able to answer the first question: 'site._load' is responsible for the very first login AND is also able to re-login for me. 'getUrl' is NOT able to re-login EVEN when accessing a page from dewiki... AND THIS SHOULD WORK as far as I can see (so we have a bug here). The other two questions I was not able to answer myself...
At the moment it looks to me like adding a keyword argument to 'getUrl' called 'noLogin' (similar to 'getSite'), preventing 'getUrl' from calling '_getUserDataOld' at the end, should solve my problem. And this should not be in any contradiction to 'getUrl' as it is described.
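Roughly, what I have in mind (only a sketch of the idea, not a tested patch; the 'noLogin' keyword is hypothetical, and '_fetch' / the stub '_getUserDataOld' below only stand in for the real trunk code):

    # Hypothetical sketch of the proposed 'noLogin' keyword -- NOT real trunk code.
    class Site(object):
        def _fetch(self, path, no_hostname=False):
            # stand-in for the existing request code inside getUrl
            import urllib2
            return urllib2.urlopen(path).read()

        def _getUserDataOld(self, text, sysop=False):
            # stand-in for the existing user-data / cookie bookkeeping
            pass

        def getUrl(self, path, no_hostname=False, noLogin=False):
            text = self._fetch(path, no_hostname=no_hostname)
            if not noLogin:
                # only refresh login state when the caller did not opt out
                self._getUserDataOld(text, sysop=False)
            return text

    # usage: fetch a foreign page without touching the login state
    # text = site.getUrl(foreign_wiki_url, no_hostname=True, noLogin=True)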
Greetings and have a nice day DrTrigon
Sorry, I haven't read the whole thread...
as I know...(?) Also creating a family file is not what I want (sorry ;) since I would like to handle this url like any arbitrary url from the web and not as a wiki. As far as I can see the point where things are
...but the idea is you could read any web page via wikipedia.site.getUrl(path, no_hostname = True)
no_hostname must be assigned to True for doing this.
Regards xqt
On 29 January 2012 21:09, info@gno.de wrote:
...but the idea is you could read any web page via wikipedia.site.getUrl(path, no_hostname = True)
no_hostname must be assigned to True for doing this.
Hm, it seems I have misread the code. Indeed, cookies are not written if no_hostname=True.
However, my point remains: this is not something Site.getUrl /should/ do, as it is not related to the Site object /at all/. As far as I am concerned, this parameter should be *removed* rather than *encouraged* - general http functions should be in a *general* module, /not/ in the site object.
Of course, this refactoring has already been done correctly in the rewrite...
Best, Merlijn
What about splitting it into a general 'getUrl' for any kind of URL and then specializing it into the 'getUrl' of the site object??? That would satisfy both of our wishes...?! ;))) (a good compromise - what do you think?)
Greetings
Hello again!
Sorry for bothering but this has to be solved at some point... ;))
Of course, this refactoring has already been done correctly in the rewrite...
May be I should have mentioned first that I am talking about 'trunk' and not 'rewrite'... ;)
While reading the 'rewrite' realisation of 'getUrl' and also 'pywikibot.comms.http.request' I came across the comment "Queue a request to be submitted to Site." IN 'pywikibot.comms.http.request'... So to me it seems this is the site's 'getUrl' method instead of a generic one as it should be... ;) and as it needs info from the site object it will be hard to separate the two.
But in fact I just want to consider the 'trunk' part. I would like to propose to split the site's 'getUrl' into a generic one (e.g. in 'pywikibot.comms.http.request', analogous to 'rewrite') and the usual one in the site object (which then uses the generic one). That way we have both; the generic one (I desperately need) and the site's one, that - as Merlijn mentioned
However, my point remains: this is not something Site.getUrl /should/ do, as it is not related to the Site object /at all/. As far as I am concerned, this parameter should be *removed* rather than *encouraged* - general http functions should be in a *general* module, /not/ in the site object.
In that way both our needs would be satisfied and thus this could be a possible way to go? Any other opinions?
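To make the idea concrete, a very rough outline of the split I mean (all names and signatures here are placeholders, not a worked-out API; 'hostname()', 'cookies()' and the cookie handling are only assumed helpers):

    # Hypothetical outline of the proposed split -- placeholders only.

    # pywikibot/comms/http.py: generic, knows nothing about Site objects
    import urllib2

    def request(uri, headers=None):
        req = urllib2.Request(uri, headers=headers or {})
        return urllib2.urlopen(req).read()

    # wikipedia.py: Site.getUrl becomes a thin wrapper that adds host name,
    # cookies and the user-data handling on top of the generic call
    class Site(object):
        def getUrl(self, path, no_hostname=False):
            uri = path if no_hostname else 'http://%s%s' % (self.hostname(), path)
            headers = {} if no_hostname else {'Cookie': self.cookies()}  # assumed helpers
            text = request(uri, headers=headers)
            if not no_hostname:
                self._getUserDataOld(text, sysop=False)
            return text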
I would also offer to do the adaptations and write the needed code parts - even though it might take some time... ;)
Greetings DrTrigon
...so no answer at all - may I thus assume that everybody agrees with this proposal?
I will start to work on this as soon as possible. (I am planning to add some of DrTrigonBot framework code to pywikipedia too then, like the clean_user_sandbox script and the 'getSections' method... but this is a different story ;)
Greetings Dr. Trigon
On 31 January 2012 12:34, Dr. Trigon dr.trigon@surfeu.ch wrote:
While reading the 'rewrite' realisation of 'getUrl' and also 'pywikibot.comms.http.request' I came along the comment "Queue a request to be submitted to Site." IN 'pywikibot.comms.http.request'... So to me it seems this is the site's 'getUrl' method instead of a generic one as it should be... ;) and as it needs info from the site object it will be hard to separate the two.
Heh, you're right. However, that .request method is probably easier to split (as it's smaller and simpler): lines 100-119 should just be split off into a separate function.
But in fact I just want to consider the 'trunk' part. I would like to propose to split the site's 'getUrl' into a generic one (e.g. in 'pywikibot.comms.http.request', analogous to 'rewrite') and the usual one in the site object (which then uses the generic one). That way we have both; the generic one (I desperately need) and the site's one, that - as Merlijn mentioned
That would be great. I'd suggest to use the structure - and if possible the code - from the rewrite, making the API for both branches as similar as possible.
Best, Merlijn
On 02.02.2012 23:14, Merlijn van Deen wrote:
Heh, you're right. However, that .request method is probably easier to split (as it's smaller and simpler): lines 100-119 should just be split off into a seperate function.
http://svn.wikimedia.org/viewvc/pywikipedia/branches/rewrite/pywikibot/comms...
Looks very nice and well coded, thus (as you mentioned) I used this structure.
That would be great. I'd suggest to use the structure - and if possible the code - from the rewrite, making the API for both branches as similar as possible.
http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/pywikibot/comm...
I tried to make them as similar as possible in r9901 [1], but they differ fundamentally, which makes full adoption somewhat hard. I would suggest using the current one, checking if the code move worked (and did not introduce errors), and afterwards (some time later) adapting it further to be more "rewrite-ish"... (maybe that step should be done by someone familiar with trunk AND rewrite - as I am using trunk only... ;)
[1] http://www.mediawiki.org/wiki/Special:Code/pywikipedia/9901
Hope that suits you fine?
Greetings DrTrigon
On 18 February 2012 12:32, Dr. Trigon dr.trigon@surfeu.ch wrote:
I tried to make them as similar as possible in r9901 [1], but they differ fundamentally, which makes full adoption somewhat hard. I would suggest using the current one, checking if the code move worked (and did not introduce errors), and afterwards (some time later) adapting it further to be more "rewrite-ish"... (maybe that step should be done by someone familiar with trunk AND rewrite - as I am using trunk only... ;)
[1] http://www.mediawiki.org/wiki/Special:Code/pywikipedia/9901
Hope that suits you fine?
Sounds like a good plan. Thanks for your efforts!
Best, Merlijn
On 29.01.2012 21:09, info@gno.de wrote:
Sorry, I haven't read the whole thread...
as I know...(?) Also creating a family file is not what I want (sorry ;) since I would like to handle this url like any arbitrary url from the web and not as a wiki. As far as I can see the point where things are
...but the idea is you could read any web page via wikipedia.site.getUrl(path, no_hostname = True)
This is exactly what I am doing... and then 'getUrl' logs me out. But when accessing a page from the original wiki (de) again by 'getUrl' it is not able to (re-)login... :(
no_hostname must be assigned to True for doing this.
Regards xqt
At the moment it looks like using 'site._load(...)' after 'site.getUrl' helps to solve the issue, e.g.:
    if self.site.loggedInAs() is None:
        self.site._load(force=True)
but I feel a little bit uncomfortable having the bot log out and then re-login again - it would be nice if it could STAY logged in... *hope*
Thanks for your ideas or suggestions in advance!
Greetings