Bugs item #3496399, was opened at 2012-03-02 13:32
Message generated for change (Settings changed) made by binbot
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3496399...

Category: login
Group: None
Status: Closed
Resolution: Works For Me
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Major unicode problems in login.py and possibly elsewhere
Initial Comment: Hi to all.
I have used pywikipedia bots extensively in the past, with more than 30000 articles uploaded. But every few months, new problems seem to crop up :) Maybe it would be wise to use *stable* and *testing* versions of pywikipedia? Anyway, on to the problem:
Part of output when trying to run command
python login.py
...
Select family of sites we are working on (default: wikipedia): 27
The language code of the site we're working on (default: 'en'): sr
Username (sr wikipedia): МирославЋикаБот
Traceback (most recent call last):
  File "login.py", line 58, in <module>
    import re, os, query
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/query.py", line 29, in <module>
    import wikipedia as pywikibot
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/wikipedia.py", line 142, in <module>
    from pywikibot import *
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/pywikibot/__init__.py", line 15, in <module>
    from exceptions import *
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/pywikibot/exceptions.py", line 13, in <module>
    import config
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/config.py", line 533, in <module>
    _base_dir = _wt.get_base_dir()
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/wikipediatools.py", line 51, in get_base_dir
    create_user_config_file(base_dir)
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/wikipediatools.py", line 6, in create_user_config_file
    generate_user_files.create_user_config(base_dir)
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/generate_user_files.py", line 57, in create_user_config
    username = unicode(username, console_encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
miroslav@shop-2:~/moji/Vikipedija/pywikipedia/pywikipedia$
I have tried many workarounds (changing user-config.py, changing the password to ASCII only, etc.), to little avail. I firmly suspect the FIRST problem is the Unicode name of the bot.
By commenting out some lines in wikipedia.py I was able to log in after a fashion, but when trying to work on Unicode articles similar problems would appear. So I suspect that Unicode support in replace.py and pagefromfile.py (I tried those two after logging in) is totally broken.
Python, Linux used:
Python 2.5.2 (r252:60911, Jan 24 2010, 14:53:14) [GCC 4.3.2] on linux2
Debian GNU/Linux 5.0 2.6.26-2-686
Thank you for your help.
Miroslav Cika miroslavus At yahoo dott com
Also, output of
python version.py
as requested:
...
miroslav@shop-2:~/moji/Vikipedija/pywikipedia/pywikipedia$ python version.py
No user-config.py found in directory '/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia'
Creating...
1: memoryalpha 2: wikitravel 3: loveto 4: meta 5: wikinews 6: openttd 7: lyricwiki 8: test 9: strategy 10: wikibond 11: anarchopedia
12: i18n 13: incubator 14: fon 15: twcareer 16: wiktionary 17: wikitravel_shared 18: vikidia 19: commons 20: uncyclopedia 21: celtic 22: wesolve
23: southernapproach 24: wowwiki 25: wikisource 26: omegawiki 27: wikipedia 28: wikibooks 29: mediawiki 30: wikitech 31: species 32: ubuntutw 33: krefeldwiki
34: battlestarwiki 35: gentoo 36: lockwiki 37: supertux 38: wikiversity 39: botwiki 40: wekey 41: mozilla 42: wikiquote 43: wikia 44: osm
Select family of sites we are working on (default: wikipedia): 27
The language code of the site we're working on (default: 'en'): sr
Username (sr wikipedia): МирославЋикаБот
Traceback (most recent call last):
  File "version.py", line 15, in <module>
    import config
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/config.py", line 533, in <module>
    _base_dir = _wt.get_base_dir()
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/wikipediatools.py", line 51, in get_base_dir
    create_user_config_file(base_dir)
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/wikipediatools.py", line 6, in create_user_config_file
    generate_user_files.create_user_config(base_dir)
  File "/home/miroslav/mydocs/LicniProjekti/Vikipedija/pywikipedia/pywikipedia/generate_user_files.py", line 57, in create_user_config
    username = unicode(username, console_encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
miroslav@shop-2:~/moji/Vikipedija/pywikipedia/pywikipedia$
----------------------------------------------------------------------
Comment By: McScrewDriver (mcscrewdriver) Date: 2012-03-07 10:08
Message: Guys, thank you for your great help.
This is not a pywikipedia bug at all as it turns out.
I became suspicious after seeing that Cyrillic characters did not always display properly in GNOME Terminal or at the Python command line.
It turns out I did not reinstall locales for my language after reinstalling Debian some months ago. After reinstalling locales as shown here (http://people.debian.org/~schultmc/locales.html):
1. Install debconf (i.e. run apt-get update then apt-get install debconf, as root)
2. Run dpkg-reconfigure locales as root
and adding 3 sr locales, terminal started working correctly.
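A quick way to confirm from Python that the locale fix took effect is to print the encodings the interpreter derives from the environment. This is only a sketch: the helper name describe_encodings is made up for illustration and is not part of pywikipedia. With a working sr UTF-8 locale the values would be expected to read UTF-8; with a missing locale they typically fall back to an ASCII variant.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python derives from the current environment.

    Hypothetical helper for illustration; not part of pywikipedia.
    """
    return {
        # Encoding implied by the user's locale settings (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(),
        # Encodings attached to the console streams; may be None when piped.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("%-9s %s" % (name, value))
```

Running it before and after dpkg-reconfigure locales should show whether the terminal session picked up the new locale.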
Now I still had some problems when uploading articles containing unicode characters to sr wiki. This was fixed by editing user-config.py as shown:
# -*- coding: utf-8 -*-
family = 'wikipedia'
# The language code of the site we're working on.
mylang = 'sr'

# The dictionary usernames should contain a username for each site where you
# have a bot account.
usernames['wikipedia']['sr'] = u'МирославЋикаБот'

log = []
console_encoding = 'utf-8'
# textfile_encoding = 'unicode_escape'
The line with the unicode_escape encoding must be commented out; Cyrillic characters upload correctly from then on.
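Why the unicode_escape setting would break Cyrillic text can be sketched as follows. The assumption here (not confirmed in this thread) is that textfile_encoding is the codec used to decode text files that are actually stored as UTF-8. The sample word and variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Sketch: the same UTF-8 bytes decoded with two different codecs.
text = u"Ћирилица"               # sample Cyrillic word, 8 characters
on_disk = text.encode("utf-8")   # how a UTF-8 text file stores it (16 bytes)

as_utf8 = on_disk.decode("utf-8")
# unicode_escape only interprets backslash escapes; every other byte is
# mapped one-to-one as if it were Latin-1, so each Cyrillic letter
# turns into two unrelated characters.
as_escape = on_disk.decode("unicode_escape")

print(as_utf8 == text)                 # round trip succeeds
print(as_escape == text)               # mangled
print(len(as_utf8), len(as_escape))    # character counts differ
```

Under that assumption, leaving textfile_encoding at its default (or setting it to 'utf-8') is what lets UTF-8 article files decode correctly.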
I thank you again for your help and pointers.
Miroslav
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw) Date: 2012-03-03 03:15
Message: Based on the errors you get, the following is happening:
1) you have not created user-config.py manually
2) you have misconfigured your console (check the output of locale - it's probably C or ASCII, not, say, sr_RS.UTF-8)
3) you have a non-ASCII username
Because the username is non-ASCII and the console encoding is set to ASCII, it is impossible to enter the username in the console. You should therefore create user-config.py yourself, add a UTF-8 header:
# -*- coding: utf-8 -*-
and enter your username there (and make sure to save the file as utf-8). The problem should then be solved.
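The failure in the traceback can be reproduced in isolation. The sketch below assumes the terminal delivered the username as UTF-8 bytes while console_encoding resolved to 'ascii'; it uses bytes.decode in place of the Python 2 unicode(username, console_encoding) call from generate_user_files.py, and try_decode is a made-up helper.

```python
# -*- coding: utf-8 -*-
# Sketch: decoding UTF-8 input with the ASCII codec fails on the first byte.
username = u"МирославЋикаБот"
raw = username.encode("utf-8")   # bytes as a UTF-8 terminal would deliver them


def try_decode(data, encoding):
    """Decode data, returning the text or a short failure description."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc


ascii_result = try_decode(raw, "ascii")   # what an ASCII console encoding does
utf8_result = try_decode(raw, "utf-8")    # what a correct setting does

print(ascii_result)
print(utf8_result == username)
```

The first letter, М (U+041C), is encoded in UTF-8 as the bytes 0xd0 0x9c, which is why the error names byte 0xd0 at position 0.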
"But every few months, new problems seem to crop up :) Maybe it would be
wise to use *stable* and *testing* versions of pywikipedia?"
These problems are generally due to changes in MediaWiki or changes in configuration. They would not be solved by using a stable version. However, you can easily get an older version from the SVN repository.
----------------------------------------------------------------------
Comment By: Bináris (binbot) Date: 2012-03-02 14:30
Message: I messed it up; the sentence beginning 'Also, "position 0:"' was intended to be at the end.
----------------------------------------------------------------------
Comment By: Bináris (binbot) Date: 2012-03-02 14:29
Message: Hi Мирослав,
I use replace.py daily on Unicode articles, and on at least one wiki with a Unicode login name. The problem must be with your computer. However, I know that character encoding is the biggest mess in informatics. Also, "position 0:" is suspicious; I saw this when there was a BOM (byte order mark), which is not usual on Linux.
Please try to determine when the problems began and what has changed from the time you successfully used your bot last time.
Normally version.py won't write "No user-config.py found". Please check the chmod settings, path and encoding of user-config.py (it should be saved as UTF-8 without a BOM, with "coding: utf-8" in the 1st or 2nd line; that's the best).
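The encoding checks suggested above (no BOM, a coding line in the first two lines, valid UTF-8 content) can be sketched in a few lines of Python. The function name inspect_config is hypothetical, and the pattern is the coding-declaration form described in PEP 263; none of this is pywikipedia code.

```python
import codecs
import re

# Coding declaration as described in PEP 263 (must sit in a comment).
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def inspect_config(data):
    """Inspect the raw bytes of a config file.

    Returns (has_bom, declared_encoding, is_valid_utf8).
    Hypothetical helper for illustration only.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data

    # Python only honours a coding line on the 1st or 2nd line.
    declared = None
    for line in body.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            declared = match.group(1).decode("ascii")
            break

    try:
        data.decode("utf-8")
        is_valid_utf8 = True
    except UnicodeDecodeError:
        is_valid_utf8 = False
    return has_bom, declared, is_valid_utf8
```

For example, inspect_config(open('user-config.py', 'rb').read()) would be expected to return (False, 'utf-8', True) for a file saved the way this comment recommends.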
----------------------------------------------------------------------