Rubin16 created this task. Rubin16 added subscribers: pywikipedia-bugs, Rubin16. Rubin16 added a project: pywikibot-core. Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION It is quite a strange problem that I observe for a long time and I am not sure how to reproduce it...
I am running the latest reflinks.py from core on Labs: https://ru.wikipedia.org/w/index.php?limit=50&tagfilter=&title=%D0%A...
As you can see from the bot's contributions, the bot starts working well but then stop editing while logs on the labs show that the script is still running and the bot even saves pages... For example, I see in the recent output the following:
``` @@ -57 +57 @@ - Na sed'moj den' s"лmok v ofise odnoj iz kinokompanij, osushchestvlyayushchikh proizvodstvo fil'ma, prorvalo trubu, i ofis byl zatoplen. Stoit otmetit', chto syuzhet i sobytiya fil'ma tesno svyazany s vodnoj sredoj. Odnako na иtom strannye sluchai ne konchilis': vo vremya s"лmok rekvizitnyj furgon «okkupiroval» roj pchлl, v svyazi s иtim ves' rekvizit prishlos' spasat'. Pchлly ischezli tak zhe vnezapno, kak i poyavilis'. Takzhe v uzhe upomyanutom ofise opyat' proizoshlo zatoplenie - na иtot raz vinovnikom иtogo stal razlivshijsya bol'shoj kontejner s vodoj. Strannyj sluchaj proizoshлl i s kostyumershej fil'ma - vyjdya s parkovki, nakhodyashchejsya na territorii studii ''[[Universal Pictures]]'', ona uvidela begushchego k nej po asfal'tu olenya. V иtom sluchae takzhe proslezhivaetsya svyaz' s sobytiyami fil'ma, gde mozhno videt' pokhozhuyu scenu.<ref name="amr">[http://www.kinopoisk.ru/level/1/film/61305/view_info/ok/ Informaciya o fil'me na kinopoisk.ru]</ref> Suevernyj Khidиo Nakata posle podobnykh proisshestvij pribegal k ritualu izgnaniya zlykh dukhov. *** + Na sed'moj den' s"лmok v ofise odnoj iz kinokompanij, osushchestvlyayushchikh proizvodstvo fil'ma, prorvalo trubu, i ofis byl zatoplen. Stoit otmetit', chto syuzhet i sobytiya fil'ma tesno svyazany s vodnoj sredoj. Odnako na иtom strannye sluchai ne konchilis': vo vremya s"лmok rekvizitnyj furgon «okkupiroval» roj pchлl, v svyazi s иtim ves' rekvizit prishlos' spasat'. Pchлly ischezli tak zhe vnezapno, kak i poyavilis'. Takzhe v uzhe upomyanutom ofise opyat' proizoshlo zatoplenie - na иtot raz vinovnikom иtogo stal razlivshijsya bol'shoj kontejner s vodoj. Strannyj sluchaj proizoshлl i s kostyumershej fil'ma - vyjdya s parkovki, nakhodyashchejsya na territorii studii ''[[Universal Pictures]]'', ona uvidela begushchego k nej po asfal'tu olenya. V иtom sluchae takzhe proslezhivaetsya svyaz' s sobytiyami fil'ma, gde mozhno videt' pokhozhuyu scenu.<ref name=autogenerated1>[http://www.kinopoisk.ru/level/1/film/61305/view_info/ok/ Informaciya o fil'me na kinopoisk.ru]</ref> Suevernyj Khidиo Nakata posle podobnykh proisshestvij pribegal k ritualu izgnaniya zlykh dukhov. ***
Comment: bot: dobavlenie zagolovkov v snoski; ispravlenie dvojnykh snosok, sm. [[mw:Manual:Pywikibot/refLinks|ChaVo]] *** ```
That's the edit in the following article: https://ru.wikipedia.org/w/index.php?title=%D0%97%D0%B2%D0%BE%D0%BD%D0%BE%D0...
Though the bot says that the page was edited, there was no actual changes in the history.
I am gonna clean up the logs and start the bot again but if you have advice how to catch the problems and the issue, I will be glad to use your instructions.
For information: I run the bot by the following command:
``` jsub -N reflinks -once -continuous -mem 2048M sh refs.sh ```
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: Aklapper, Rubin16, jayvdb, pywikipedia-bugs
Rubin16 added a comment.
The same issue happened today: I ran the bot with extended memory limit up to 4096M as I thought it could be issue.
The bot saved changes till the end of April 1: https://ru.wikipedia.org/wiki/%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%...
Now I see in logs that bot said that it had saved changes to a number of articles but there is nothing similar in reality:
https://ru.wikipedia.org/w/index.php?title=%D0%97%D1%83%D1%81%D0%BC%D0%B0%D0...
@@ -12 +12 @@ - Naibol'shij nauchnyj rezonans<ref>[https://scholar.google.com/scholar?cites=3475199132505430838&as_sdt=5,33... Spisok nauchnykh rabot, citiruyushchikh monografiyu «Directed Evolution»]</ref> poluchila razrabotannaya Borisom Zlotinym i Alloj Zusman «sistema napravlennoj иvolyucii», izlozhennaya imi v knige «Directed Evolution: Philosophy, Theory and Practice» (2001, 2004). Sredi drugikh razrabotok - «diversionnyj analiz» (metod vyyavleniya i prognozirovaniya avarijnykh situacij i nezhelatel'nykh yavlenij)<ref>[http://vikent.ru/enc/852/ Diversionnyj analiz]</ref><ref>[http://www.rusnor.org/pubs/articles/11541.htm Oleg Figovskij «Tekhnologiya innovacij»]</ref>. *** + Naibol'shij nauchnyj rezonans<ref name=autogenerated1>[https://scholar.google.com/scholar?cites=3475199132505430838&as_sdt=5,33... Spisok nauchnykh rabot, citiruyushchikh monografiyu «Directed Evolution»]</ref> poluchila razrabotannaya Borisom Zlotinym i Alloj Zusman «sistema napravlennoj иvolyucii», izlozhennaya imi v knige «Directed Evolution: Philosophy, Theory and Practice» (2001, 2004). Sredi drugikh razrabotok - «diversionnyj analiz» (metod vyyavleniya i prognozirovaniya avarijnykh situacij i nezhelatel'nykh yavlenij)<ref>[http://vikent.ru/enc/852/ Diversionnyj analiz]</ref><ref>[http://www.rusnor.org/pubs/articles/11541.htm Oleg Figovskij «Tekhnologiya innovacij»]</ref>. ***
@@ -39 +39 @@ - * Alla Zusman, Boris Zlotin. Directed Evolution: Philosophy, Theory and Practice. Ideation International Inc., 2001, 2004.<ref>[https://scholar.google.com/scholar?cites=3475199132505430838&as_sdt=5,33... Spisok nauchnykh rabot, citiruyushchikh monografiyu «Directed Evolution»]</ref> *** + * Alla Zusman, Boris Zlotin. Directed Evolution: Philosophy, Theory and Practice. Ideation International Inc., 2001, 2004.<ref name=autogenerated1 /> *** *** Comment: bot: dobavlenie zagolovkov v snoski; ispravlenie dvojnykh snosok, sm. [[mw:Manual:Pywikibot/refLinks|ChaVo]] ***
I don't know how to cope with this problem...
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: Aklapper, Rubin16, jayvdb, pywikipedia-bugs
Rubin16 raised the priority of this task from "Normal" to "High". Rubin16 set Security to None.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: Aklapper, Rubin16, jayvdb, pywikipedia-bugs
Rubin16 added a comment.
I started the bot with -debug option, the same output is shown but one warning message is added now:
WARNING: /data/project/rubinbot2/core/pywikibot/tools/ip.py:53: DeprecationWarning: ipaddress backport not available.
Couldn't it be the reason?
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: Aklapper, Rubin16, jayvdb, pywikipedia-bugs
XZise added a subscriber: XZise. XZise added a comment.
But this warning is shown at all times isn't it (or on startup)? So that wouldn't really explain why it's suddenly stops working and I don't know how this could interfere without showing more errors.
But the missing errors are what I'm wondering. I mean if it would fail about an hour after it did the first edit then it could be related to T61678: Implement badtoken detection and recovery https://phabricator.wikimedia.org/T61678 but that would cause an error. But otherwise I don't see why it would __silently__ not do edits. When you log it (afaik does -debug also activates logging it) it should show you the requests it does to the API and also the responses. Do you see requests happening after the bot starts working?
Another problem (related to the logs) I have is that I can only see edits which work :/
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: XZise, Aklapper, Rubin16, jayvdb, pywikipedia-bugs
Rubin16 added a comment.
Yes, the error about ipadress backport is shown every edit...
I started a version from compat now and it works correctly.
There are no other errors/warnings shown while using -debug feature, I can start it once more and upload full traceback somewhere if it would be helpful.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: XZise, Aklapper, Rubin16, jayvdb, pywikipedia-bugs
jayvdb added a subscriber: jayvdb.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: jayvdb Cc: jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
XZise added a comment.
When you actually mean traceback https://en.wikipedia.org/wiki/Stack_trace then yeah please post it because that is the most helpful information for bug reports. In fact you always should include it in any bug report (not only with pywikibot but for all software which shows you a traceback/stack trace).
If you just mean the log, well it could be also helpful as I have no idea what could cause that and maybe that would clear things up. It's very hard to fix something when the description of the bug is very vague and it's not a common bug. Just check through the log that it doesn't accidentally contain sensitive information (like passwords).
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
Rubin16 added a comment.
In https://phabricator.wikimedia.org/T94688#1179849, @XZise wrote:
When you actually mean traceback https://en.wikipedia.org/wiki/Stack_trace then yeah please post it because that is the most helpful information for bug reports. In fact you always should include it in any bug report (not only with pywikibot but for all software which shows you a traceback/stack trace).
If you just mean the log, well it could be also helpful as I have no idea what could cause that and maybe that would clear things up. It's very hard to fix something when the description of the bug is very vague and it's not a common bug. Just check through the log that it doesn't accidentally contain sensitive information (like passwords).
I am gonna run the bot with the -debug parameter and then post the whole output... Is it possible to gather more information with other parameters?
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
XZise added a comment.
Not sure if `-v -v` helps. It certainly produces a more verbose output ;)
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
Rubin16 added a comment.
The output is quite huge, so I am posting parts of it:
1. from the beginning to the first article saved (saved per output and per actual edits, too):
2. the second edit that was actually saved
After those 2 edits no pages were saved - I looked for "@@" in output and pasted such saves that weren't saved, the bot has some API problems as I see:
1. http://pastebin.com/eRRkvDwP 2. http://pastebin.com/Re5Jxxit
All other edits show the same error.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
XZise added a comment.
Okay interesting. It seems to be T61678: Implement badtoken detection and recovery https://phabricator.wikimedia.org/T61678. The last two log entries were very helpful even though a bit confusing, but the actual error is:
Error saving page [[Al'-Dzhazira]] (badtoken: Invalid token) ***
Traceback (most recent call last): File "/data/project/rubinbot2/core/pywikibot/page.py", line 1047, in _save watch=watchval, bot=botflag, **kwargs) File "/data/project/rubinbot2/core/pywikibot/site.py", line 1057, in callee return fn(self, *args, **kwargs) File "/data/project/rubinbot2/core/pywikibot/site.py", line 4140, in editpage result = req.submit() File "/data/project/rubinbot2/core/pywikibot/data/api.py", line 1698, in submit raise APIError(code, info, **result["error"]) APIError: badtoken: Invalid token
If you could open `scripts/reflinks.py` and add the 3rd line of the following snippet (line 520):
editedpages = 0 for page in self.generator: page.site.tokens._tokens.clear() try: # Load the page's text from the wiki
That is the work around I was talking about and clears the token cache. In that case it shouldn't suddenly stop working and if it works then for you I think there should be really a fix for https://phabricator.wikimedia.org/T61678 happening soon.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
Rubin16 added a blocking task: T61678: Implement badtoken detection and recovery.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
XZise added a comment.
Okay the logs have been a bit confusing at first because the responses didn't contain any information about the error itself so I tried to determine why that is and the result is in https://gerrit.wikimedia.org/r/#/c/201954/ . It won't fix your problem but in the logs it should now show the complete error message looking something like that:
{u'servedby': u'mw1233', u'error': {'help': u'See https://ru.wikipedia.org/w/api.php for API usage', 'info': 'Invalid token', 'code': 'badtoken'}}
And that would tell it immediately. But I'm still confused why you had to activate the log to show the error. It should've shown at least something.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
Ricordisamoa added a subscriber: Ricordisamoa.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Ricordisamoa Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
Rubin16 added a comment.
I updated the bot to the latest revision (but not changed anything in reflinks.py in your post about badtoken recovery), the crash reports are the following.
Not it seems that it doesn't save not only because of API problems, but something more happens.
No changes were needed on [[Ziggy Stardust]] No changes were needed on [[Zhuzhou Electric Locomotive]] No changes were needed on [[Zimmer (avtomobil'naya marka)]] *** Traceback (most recent call last): File "core/pwb.py", line 227, in <module> run_python_file(filename, argv, argvu, file_package) File "core/pwb.py", line 85, in run_python_file main_mod.__dict__) File "core/scripts/reflinks.py", line 846, in <module> main() File "core/scripts/reflinks.py", line 843, in main bot.run() File "core/scripts/reflinks.py", line 647, in run linkedpagetext = self.NON_HTML.sub('', linkedpagetext) UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 17717: ordinal not in range(128) Found 1 wikipedia:mk processes running, including this one. Dropped throttle(s). CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort All threads finished. === Pywikibot framework v2.0 -- Logging header === COMMAND: ['reflinks.py', '-start:Z', '-always', '-ignorepdf', '-debug', '-log:refs.txt', '-v', '-v'] DATE: 2015-04-18 15:53:55.071461 UTC VERSION: [https] r-pywikibot-core.git (c6ad0b3, g5374, 2015/04/17, 22:16:31, n/a) SYSTEM: ('Linux', 'tools-exec-10', '3.2.0-59-virtual', '#90-Ubuntu SMP Tue Jan 7 23:02:33 UTC 2014', 'x86_64') CONFIG FILE DIR: /data/project/rubinbot2/core PACKAGES: _elementtree (/usr/lib/python2.7/lib-dynload/_elementtree.so) = 1.0.6 distutils (/usr/lib/python2.7/distutils/) = 2.7.3 email (/usr/lib/python2.7/email/) = 4.0.3 json (/usr/lib/python2.7/json/) = 2.0.9 logging (/usr/lib/python2.7/logging/) = 0.5.1.2 mwparserfromhell (/usr/lib/python2.7/dist-packages/mwparserfromhell/) = 0.3.3 pickle (/usr/lib/python2.7/pickle.pyc) = $Revision: 72223 $ pyexpat (/usr/lib/python2.7/lib-dynload/pyexpat.so) = 2.7.3 pywikibot ([path unknown]) = ?? re (/usr/lib/python2.7/re.pyc) = 2.2.1 urllib (/usr/lib/python2.7/urllib.pyc) = 1.17 urllib2 (/usr/lib/python2.7/urllib2.pyc) = 2.7 MODULES: /data/project/rubinbot2/core/pywikibot/comms/http.py 54549f1 2015-04-18 15:48:23.664873 /data/project/rubinbot2/core/pywikibot/data/api.py 41d6c82 2015-04-18 15:48:23.744875 /data/project/rubinbot2/core/pywikibot/userinterfaces/__init__.py 43eceeb 2015-04-18 15:48:24.104884 /data/project/rubinbot2/core/pywikibot/textlib.py 410c100 2015-04-18 15:48:24.068883 /data/project/rubinbot2/core/pywikibot/i18n.py 2fff519 2015-04-18 15:48:23.964880 /data/project/rubinbot2/core/pywikibot/comms/threadedhttp.py a388c0f 2015-04-18 15:48:23.684874 /data/project/rubinbot2/core/pywikibot/date.py 976edef 2015-04-18 15:48:23.768876 /data/project/rubinbot2/core/pywikibot/data/__init__.py 44183c7 2015-04-18 15:48:23.732875 /data/project/rubinbot2/core/pywikibot/exceptions.py 117165a 2015-04-18 15:48:23.788876 /data/project/rubinbot2/core/pywikibot/site.py 5848169 2015-04-18 15:48:24.056883 /data/project/rubinbot2/core/pywikibot/bot.py 446e8e1 2015-04-18 15:48:23.648873 /data/project/rubinbot2/core/pywikibot/__init__.py db96621 2015-04-18 15:48:23.628872 /data/project/rubinbot2/core/pywikibot/throttle.py 4157254 2015-04-18 15:48:24.080883 /data/project/rubinbot2/core/pywikibot/page.py a2eb04c 2015-04-18 15:48:24.016882 /data/project/rubinbot2/core/pywikibot/family.py f0d0d8f 2015-04-18 15:48:23.948880 /data/project/rubinbot2/core/pywikibot/xmlreader.py 2b77234 2015-04-18 15:48:24.192886 /data/project/rubinbot2/core/pywikibot/plural.py c9edb6b 2015-04-18 15:48:24.032882 /data/project/rubinbot2/core/scripts/noreferences.py 3fa562a 2015-04-18 15:48:24.648897 /data/project/rubinbot2/core/pywikibot/version.py 85f37a9 2015-04-18 15:48:24.184886 /data/project/rubinbot2/core/pywikibot/userinterfaces/terminal_interface.py 9a5fbf1 2015-04-18 15:48:24.124884 /data/project/rubinbot2/core/pywikibot/config2.py 971b19d 2015-04-18 15:48:23.728874 /data/project/rubinbot2/core/pywikibot/tools/ip.py 808c0cc 2015-04-18 15:48:24.100884 /data/project/rubinbot2/core/pywikibot/userinterfaces/terminal_interface_base.py 968a14b 2015-04-18 15:48:24.136885 /data/project/rubinbot2/core/pywikibot/pagegenerators.py e07bf44 2015-04-18 15:48:24.028882 /data/project/rubinbot2/core/pywikibot/userinterfaces/terminal_interface_unix.py 60d8cb2 2015-04-18 15:48:24.140885 /data/project/rubinbot2/core/pywikibot/tools/__init__.py 22d6c4d 2015-04-18 15:48:24.096884 /data/project/rubinbot2/core/pywikibot/diff.py 6e0e38b 2015-04-18 15:48:23.772876 /data/project/rubinbot2/core/pywikibot/login.py c0cc049 2015-04-18 15:48:23.992881 /data/project/rubinbot2/core/pywikibot/comms/__init__.py 747d0a7 2015-04-18 15:48:23.660873 /data/project/rubinbot2/core/pywikibot/userinterfaces/transliteration.py efd4103 2015-04-18 15:48:24.160885 === === === === === === === === === === === === === === Pywikibot rdb9662181f969ff16548641a4fb937164b4647bb Python 2.7.3 (default, Feb 27 2014, 19:58:35) [GCC 4.6.3] LOADING SITE wikipedia:ru VERSION: 1.26wmf1 LOADING SITE wikipedia:ru VERSION: 1.26wmf1 Found 1 wikipedia:ru processes running, including this one.
After that the bot restarts, and crashes again and again with the same message.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
Rubin16 added a comment.
Some more output about the unicode crash:
The base directory is /data/project/rubinbot2/core HTTP error (401) for http://books.google.ru/books?id=dLsthWsAHk0C&pg=PT1163&dq=ZKRD&h... on [[ZKRD]] HTTP error (401) for http://books.google.ru/books?id=j2tmrfsX4tAC&pg=RA18-PT1192&dq=ZKRD&... on [[ZKRD]] HTTP error (401) for http://books.google.ru/books?id=7DiB3z2fBpAC&pg=PT464&dq=%22zero-pla... on [[Zero Player Game]] <type 'exceptions.UnicodeDecodeError'>
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
XZise added a comment.
Okay as far as I can see, it is querying the server and gets bytes back but then it tries to apply a regex and to do that it apparently needs to decode the bytes and it's using the ASCII codec for that. I'm not 100 % sure why it tries to decode it here and why it had worked before (I guess T89589: Usage of unicode_literals from __future__ package https://phabricator.wikimedia.org/T89589 plays a role). I need to take a closer look on how to fix that properly.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
Rubin16 added a blocking task: T89589: Usage of unicode_literals from __future__ package.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
XZise added a comment.
Okay I'm confused. It works fine for me:
Python 2.7.8 (default, Aug 28 2014, 16:47:22) [GCC 4.8.3 20140624 (Red Hat 4.8.3-1)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
import re re.match('A', u'Ö'.encode('latin1')) re.match(u'A', u'Ö'.encode('latin1')) re.sub(u'A', '', u'Ö'.encode('latin1'))
'\xd6'
re.sub(u'(?is)A', '', u'Ö'.encode('latin1'))
'\xd6'
xzise@localhost:~/Programms/pywikibot/core$ python Python 2.7.8 (default, Aug 28 2014, 16:47:22) [GCC 4.8.3 20140624 (Red Hat 4.8.3-1)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
from __future__ import unicode_literals import re re.sub(u'(?is)A', '', u'Ö'.encode('latin1'))
'\xd6'
re.sub('(?is)A', '', u'Ö'.encode('latin1'))
'\xd6'
R = re.compile('(?is)A') R.sub('', u'Ö'.encode('latin1'))
'\xd6'
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
Rubin16 added a comment.
@XZise if you want, I can temporary give you autopatrolled or bot rights in ru.wiki and you can try to run the reflinks.py there... I tried to run the bot from another set of articles - the same decoding error appears in other articles, too.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Rubin16 Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
XZise added a comment.
Ah I think no need for that, because I think I know what is happening:
from __future__ import unicode_literals import re re.sub('(?is)A', '', 'Ö'.encode('latin1'))
'\xd6'
re.sub('(?is)A', '', 'ÖA'.encode('latin1'))
Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/xzise/.pyenv/versions/2.7.8/lib/python2.7/re.py", line 151, in sub return _compile(pattern, flags).sub(repl, string, count) UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
The error only appears when it actually replaces anything. In my previous examples it didn't replaced anything and it worked. But when it replaces something it tries to put the unicode into the bytes which doesn't work. You could test and verify that when you edit the line where the error happens (from your previous errors that is "core/scripts/reflinks.py" in line 647). Currently it looks like this:
linkedpagetext = self.NON_HTML.sub('', linkedpagetext)
But it should work when it looks like this:
linkedpagetext = self.NON_HTML.sub(str(''), linkedpagetext)
I need to figure out if `linkedpagetext` is also `bytes` in Python 3 but that fix will work at least in Python 2.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
gerritbot added a subscriber: gerritbot. gerritbot added a comment.
Change 205095 had a related patch set uploaded (by XZise): [FIX] reflinks: Expect that urlopen return bytes
https://gerrit.wikimedia.org/r/205095
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: gerritbot Cc: gerritbot, Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
gerritbot added a project: Patch-For-Review.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: gerritbot Cc: gerritbot, Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
XZise added a comment.
Okay if you want you could try that patch. It seemed that this script wasn't able to run at all with Python 3.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: XZise Cc: gerritbot, Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
gerritbot added a comment.
Change 205095 merged by jenkins-bot: [FIX] reflinks: Expect that urlopen return bytes
https://gerrit.wikimedia.org/r/205095
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: gerritbot Cc: gerritbot, Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
jayvdb closed this task as "Resolved". jayvdb claimed this task.
TASK DETAIL https://phabricator.wikimedia.org/T94688
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: jayvdb Cc: gerritbot, Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs
pywikipedia-bugs@lists.wikimedia.org