Ciencia_Al_Poder created this task. Ciencia_Al_Poder added a subscriber: Ciencia_Al_Poder. Ciencia_Al_Poder added a project: pywikibot-core. Restricted Application added subscribers: Aklapper, pywikipedia-bugs.
TASK DESCRIPTION
**OS:** Linux
When I redirect the output of the python command that runs the bot to a file or another command, the output simply disappears. I wanted to generate a file based on the listpages.py script, but apparently I'm unable to do so...
```
python pwb.py listpages.py -family: -cat:'somecategory'
1 Page 1
2 Page 2
...
python pwb.py listpages.py -family: -cat:'somecategory' > filelist.txt
ls -l filelist.txt
-rw-r--r-- 1 jesus users 0 mar 21 16:55 filelist.txt
python pwb.py listpages.py -family: -cat:'somecategory' | uniq
```
Piping the output to another command produces no output. The same happens when redirecting to a file.
Printing text from Python directly works, though:
```
python -c "print 'test'" | uniq
test
```
TASK DETAIL https://phabricator.wikimedia.org/T93474
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Ciencia_Al_Poder Cc: pywikipedia-bugs, Ciencia_Al_Poder, Aklapper, jayvdb
XZise added a subscriber: XZise. XZise added a comment.
That is weird. On my laptop with bash it does work as expected:
xzise@localhost:~/Programms/pywikibot/core$ python pwb.py listpages -cat:Metatemplates
1 Celestial Bodies/Link
2 Celestial period table/row
…
xzise@localhost:~/Programms/pywikibot/core$ python pwb.py listpages -cat:Metatemplates | grep Info
6 Infobox/Kerbonaut/bar
7 Infobox/Line
xzise@localhost:~/Programms/pywikibot/core$ python pwb.py listpages -cat:Metatemplates > test_out
xzise@localhost:~/Programms/pywikibot/core$ cat test_out
1 Celestial Bodies/Link
2 Celestial period table/row
…
xzise@localhost:~/Programms/pywikibot/core$ python -c "print('test')" | uniq
test
Ciencia_Al_Poder added a comment.
I've just discovered that the redirected output is lost only when the category name contains non-ASCII characters. If the category name contains only plain ASCII characters, the output is redirected successfully:
jesus@charmander:~/git/mediawiki/pywikibot/core> python pwb.py listpages.py -family:wikipedia -lang:es -cat:'.hack'
1 .hack
2 .hack//G.U.
3 .hack//Liminality
4 .hack//Roots
5 .hack//SIGN
6 The World (.hack)
jesus@charmander:~/git/mediawiki/pywikibot/core> python pwb.py listpages.py -family:wikipedia -lang:es -cat:'.hack' |uniq
1 .hack
2 .hack//G.U.
3 .hack//Liminality
4 .hack//Roots
5 .hack//SIGN
6 The World (.hack)
jesus@charmander:~/git/mediawiki/pywikibot/core> python pwb.py listpages.py -family:wikipedia -lang:es -cat:'1. FC Nürnberg'
1 FC Nuremberg II
2 F. C. Núremberg
3 Stadion Nürnberg
jesus@charmander:~/git/mediawiki/pywikibot/core> python pwb.py listpages.py -family:wikipedia -lang:es -cat:'1. FC Nürnberg' |uniq
jesus@charmander:~/git/mediawiki/pywikibot/core>
I'm using Python 2.7.8 on Open SuSE 13.2
valhallasw added a subscriber: valhallasw. valhallasw added a comment.
After debugging with xzise, we figured out it was a combination of issues.
1. When piping, sys.stdout.encoding is set to None (on Python 2.7)
2. When sys.stdout.encoding is set to None, config2.py falls back to 'iso-8859-1' (!)
3. sys.argv is decoded using sys.stdout.encoding
Thus the category '1. FC Nürnberg', passed as UTF-8 bytes, is decoded as ISO-8859-1 to u'1. FC NÃ¼rnberg', which obviously doesn't exist, and then no pages are listed.
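The chain above can be reproduced in isolation. The following is only a minimal sketch for Python 3: `PipedStream`, `pick_console_encoding` and `decode_argument` are hypothetical names invented for illustration, not pywikibot functions, and the sketch assumes the terminal passes the argument as UTF-8 bytes.

```python
# Sketch only: these helpers are hypothetical and are not part of pywikibot.


class PipedStream(object):
    """Stand-in for a Python 2.7 sys.stdout that is connected to a pipe."""

    encoding = None


def pick_console_encoding(stream, fallback='iso-8859-1'):
    """Return the stream's encoding, or the fallback if it is unknown."""
    return getattr(stream, 'encoding', None) or fallback


def decode_argument(raw, encoding):
    """Decode a raw command-line argument (bytes) with the given encoding."""
    return raw.decode(encoding)


# Assumption: a UTF-8 terminal hands the category name over as these bytes.
raw_category = u'1. FC N\xfcrnberg'.encode('utf-8')

# Steps 1 and 2: the encoding is unknown, so the fallback is chosen.
encoding = pick_console_encoding(PipedStream())

# Step 3: the argument is decoded with the wrong encoding.
mangled = decode_argument(raw_category, encoding)
correct = decode_argument(raw_category, 'utf-8')

print(repr(mangled))
print(repr(correct))
```

The two byte values that UTF-8 uses for "ü" come out as two separate Latin-1 characters, so the mangled title no longer matches any category.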
XZise added a comment.
By the way, I think this would have been easier to diagnose if listpages reported that no pages were found. That way we would have known that the problem was not the output but something in the generator.
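As a rough illustration of that suggestion (not the actual listpages code; `list_pages` and its parameters are made up for this sketch, which targets Python 3), a script could count what the generator yields and report the total separately from the listing:

```python
import io
import sys


def list_pages(titles, out=sys.stdout, err=sys.stderr):
    """Print numbered titles to `out` and a summary line to `err`.

    Hypothetical helper: `titles` is any iterable of page titles.
    """
    count = 0
    for count, title in enumerate(titles, start=1):
        out.write(u'{0:4d} {1}\n'.format(count, title))
    # The summary goes to a separate stream so that it stays visible
    # even when the listing itself is piped or redirected.
    err.write(u'{0} page(s) found\n'.format(count))
    return count


# An empty generator now produces an explicit message instead of silence.
empty_out, empty_err = io.StringIO(), io.StringIO()
empty_count = list_pages(iter([]), out=empty_out, err=empty_err)

some_out, some_err = io.StringIO(), io.StringIO()
some_count = list_pages([u'.hack', u'.hack//Roots'], out=some_out, err=some_err)
```

Writing the summary to a different stream than the listing is a design choice of this sketch: with `| uniq` or `> file`, an empty result would still show up on the terminal.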
gerritbot added a subscriber: gerritbot. gerritbot added a comment.
Change 198515 had a related patch set uploaded (by Merlijn van Deen): listpages: report number of pages found
https://gerrit.wikimedia.org/r/198515
gerritbot added a project: Patch-For-Review.
gerritbot added a comment.
Change 198515 merged by jenkins-bot: listpages: report number of pages found
https://gerrit.wikimedia.org/r/198515
Ciencia_Al_Poder changed the title from "stdout output from script is lost when redirecting to a file or other commands" to "input encoding is switched to plain ascii when redirecting output to a file or other commands, mangling non-ascii characters". Ciencia_Al_Poder set Security to None.
valhallasw edited the task description.
valhallasw edited the task description.
elyashiv added a subscriber: elyashiv. elyashiv added a comment.
Another work-around is using python3, as python3 strings are utf-8 encoded by default.
XZise added a comment.
That is … technically not correct. While Python 3 does work, it's not because the strings are UTF-8 encoded but because the encoding of the streams is known. On Python 2, pywikibot uses `sys.stdout.encoding`, which is `None` when piping, and that leads to pywikibot falling back to Latin-1 or cp850.
Now I'm not sure how strings are encoded internally, but for someone writing Python scripts it shouldn't matter. I just wanted to point out that a string is not inherently a UTF-8 encoded string: you need to encode it in any case (even into UTF-8) when you want to store it.
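A small Python 3 sketch of that distinction, using an example word chosen here for illustration: the text string carries no encoding of its own, and each encoding produces a different byte sequence from it.

```python
# A text string is a sequence of code points; no encoding is attached to it.
text = u'N\xfcrnberg'

# Storing or transmitting it requires an explicit encoding step.
as_utf8 = text.encode('utf-8')
as_latin1 = text.encode('iso-8859-1')

# The same text yields different bytes depending on the encoding chosen.
print(len(text), len(as_utf8), len(as_latin1))

# Decoding with the matching encoding restores the original text.
round_trip = as_utf8.decode('utf-8')
```

The UTF-8 form is one byte longer than the Latin-1 form because "ü" takes two bytes in UTF-8 and one in Latin-1.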
pywikipedia-bugs@lists.wikimedia.org