Bugs item #3092329, was opened at 2010-10-21 21:26
Message generated for change (Comment added) made by drtrigon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3092329&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Dr. Trigon (drtrigon)
>Assigned to: xqt (xqt)
Summary: KeyError in userlib.User.contributions
Initial Comment:
Hello all
This error occurred when I tried to get the contributions of user IWorld:
Traceback (most recent call last):
  File "sum_disc.py", line 1288, in <module>
    main()
  File "sum_disc.py", line 1284, in main
    bot.run()
  File "sum_disc.py", line 341, in run
    self.checkRecentEdits()
  File "sum_disc.py", line 566, in checkRecentEdits
    usersumList = [p[0].title() for p in self._user.contributions(limit = count)]
  File "/data/toolserver/pywikipedia/userlib.py", line 299, in contributions
    contrib['revid'], ts, contrib['comment']
KeyError: 'comment'
The content of contrib:
{u'pageid': 5698070, u'title': u'Wikipedia:Vandalismusmeldung', u'timestamp': u'2010-10-20T11:58:45Z', u'revid': 80498587, u'user': u'IWorld', u'ns': 4, u'commenthidden': u''}
As you can see, the 'comment' key is missing, but a 'commenthidden' key is present. I wrote a patch to take this into account. In such a case I decided to return u'' instead of contrib['comment'], since a unicode string is expected at this point, so returning None or dropping the item would not be a good idea.
Please have a look at the attached patch.
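The attached patch itself is not reproduced here. As a minimal sketch of the same idea, assuming each entry is a dict like the one dumped above, a hypothetical helper (contribution_comment is not part of userlib) can fall back to an empty unicode string whenever 'comment' is absent:

```python
# Hypothetical helper, not part of userlib: a sketch of the fallback
# described in this report, written for Python 3.


def contribution_comment(contrib):
    """Return the edit summary of one contributions entry, or u'' if absent.

    When the summary has been hidden, the entry carries 'commenthidden'
    instead of 'comment', so indexing contrib['comment'] raises KeyError.
    """
    return contrib.get('comment', u'')


# The entry that triggered the KeyError in this report.
entry = {u'pageid': 5698070, u'title': u'Wikipedia:Vandalismusmeldung',
         u'timestamp': u'2010-10-20T11:58:45Z', u'revid': 80498587,
         u'user': u'IWorld', u'ns': 4, u'commenthidden': u''}

print(repr(contribution_comment(entry)))  # prints ''
```

Using dict.get keeps the result a unicode string in both cases, in line with the reasoning above for not returning None.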
Greetings!
----------------------------------------------------------------------
>Comment By: Dr. Trigon (drtrigon)
Date: 2010-11-04 14:06
Message:
I've taken the liberty of assigning this to you.
It should not be a big deal, but it would be kind if this could be
solved... ;)
Thanks a lot!
----------------------------------------------------------------------
Comment By: Dr. Trigon (drtrigon)
Date: 2010-10-28 10:28
Message:
Also issued as:
'KeyError in userlib.User.contributions - ID: 3097185'
https://sourceforge.net/tracker/?func=detail&aid=3097185&group_id=93107&ati…
Sorry for this mess, but I'm not sure which is the better place to raise
this issue...
----------------------------------------------------------------------
Comment By: Dr. Trigon (drtrigon)
Date: 2010-10-21 21:27
Message:
Sorry, I forgot this info:
Pywikipedia [https] svn.toolserver.org/svnroot/drtrigon/pywikipedia (r40, 2010/10/02, 22:14:39)
Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:31)
[GCC 4.4.4 20100503 (Red Hat 4.4.4-2)]
config-settings:
use_api = True
use_api_login = True
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3092329&group_…
Bugs item #3081100, was opened at 2010-10-04 21:53
Message generated for change (Comment added) made by grimlockfr
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: Wont Fix
Priority: 7
Private: No
Submitted By: Grimlock (grimlockfr)
Assigned to: xqt (xqt)
Summary: Problem with hi characters
Initial Comment:
Pywikipedia [http] trunk/pywikipedia (r8602, 2010/10/04, 19:33:48)
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
My interwiki bot on Wikipedia (using interwiki.py) cannot correctly identify the interwiki link to hi; as a consequence, the link is treated as a bad one and removed when I use the -cleanup option (see http://fr.wikipedia.org/w/index.php?title=Mark_Zuckerberg&action=historysub… for an example). It appears that one or more characters are misinterpreted.
----------------------------------------------------------------------
>Comment By: Grimlock (grimlockfr)
Date: 2010-11-02 17:03
Message:
I was using Python 2.7 when I discovered this bug. The bug is not fixed in 2.7
(or at least not in all 2.7 distributions...).
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-11-02 15:47
Message:
Just a quick update: upstream has confirmed this is a bug in the Python
library. It should get fixed in 2.7 and 3.2, but it is not yet clear
whether 2.6.6 will include the fix.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-30 17:43
Message:
Reported to the python developers: http://bugs.python.org/issue10254
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-30 16:52
Message:
C# test code: http://pastebin.ca/1977261
This does not show the regression: the C# library has no PR29
issues.
I will file a bug with the Python developers about this shortly.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-27 23:16
Message:
One last comment: the problem does not appear in Python < 2.6.5. Consider
using an older Python version if you work on Wikimedia sites.
Added warning in r8687.
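The warning added in r8687 is not shown in this thread. One possible approach, sketched here as an assumption and not as what r8687 actually does, is to test normalization behaviour directly at startup instead of comparing version numbers. The title is the Devanagari string used in the Python sessions further down in this report; it is already in NFC, so a correct unicodedata.normalize must return it unchanged. The names normalization_is_broken and warn_if_normalization_broken are hypothetical.

```python
import unicodedata
import warnings

# Title of the hi: article discussed in this report, as used in the
# Python sessions in this thread. It is already in NFC form.
HI_TITLE = (u'\u092e\u093e\u0930\u094d\u0915 '
            u'\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917')


def normalization_is_broken():
    """Return True if NFC normalization alters a string that is already NFC."""
    return unicodedata.normalize('NFC', HI_TITLE) != HI_TITLE


def warn_if_normalization_broken():
    """Hypothetical startup check: emit a RuntimeWarning and return True if affected."""
    if normalization_is_broken():
        warnings.warn('unicodedata.normalize() alters NFC text '
                      '(see http://bugs.python.org/issue10254); '
                      'interwiki links to hi: may be misidentified',
                      RuntimeWarning)
        return True
    return False


warn_if_normalization_broken()
```

Checking behaviour rather than sys.version_info avoids having to know exactly which point releases carry the regression.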
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-27 22:54
Message:
The last comments were also mine.
MediaWiki does not show problems related to PR29:
<?php
include_once('UtfNormal.php');
print bin2hex("\xe0\xad\x87\xcc\x80\xe0\xac\xbe") . "\n";
print bin2hex(UtfNormal::cleanUp("\xe0\xad\x87\xcc\x80\xe0\xac\xbe")) . "\n";
returns the expected
e0ad87cc80e0acbe
e0ad87cc80e0acbe
so no information is lost. This suggests the bug may have been
introduced by the fix for PR29 in unicodedata.c.
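For comparison, the same PR29 check can be run against Python's own unicodedata module. This is a sketch for Python 3 (bytes.hex() needs 3.5 or later); on an unaffected build both printed lines should match the MediaWiki output above.

```python
import unicodedata

# PR29 test sequence from the PHP check, as UTF-8 bytes:
# U+0B47 ORIYA VOWEL SIGN E, U+0300 COMBINING GRAVE ACCENT,
# U+0B3E ORIYA VOWEL SIGN AA.
raw = b'\xe0\xad\x87\xcc\x80\xe0\xac\xbe'
text = raw.decode('utf-8')

# Under the corrected (PR29) composition rules, U+0300 blocks U+0B47 and
# U+0B3E from composing to U+0B4B, so NFC must leave the sequence untouched.
normalized = unicodedata.normalize('NFC', text)

print(raw.hex())                         # e0ad87cc80e0acbe
print(normalized.encode('utf-8').hex())  # e0ad87cc80e0acbe on an unaffected build
```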
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2010-10-27 22:36
Message:
Probably related to
http://svn.python.org/view/python/branches/release26-maint/Modules/unicoded…
, and hence
http://bugs.python.org/issue1054943#
and
http://www.unicode.org/review/pr-29.html
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2010-10-27 22:22
Message:
Okay, this seems to be a Python 2.6/2.7 or MediaWiki bug. It is related to
normalizing UTF-8 strings.
Check out the following:
(on py27)
Python 2.7 (r27:82500, Aug 5 2010, 04:28:45) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.normalize('NFC', u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917') == u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
False
(on py26):
valhallasw@willow:~/src/pywikipedia-svn$ python2.6
Python 2.6.5 (r265:79063, Jul 10 2010, 17:50:38) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.normalize('NFC', u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917') == u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
True
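To see which code points a given Python build changes, rather than only getting True or False, a small hypothetical helper (normalization_diff is not part of pywikipedia) can list the positions where the normalized string differs. On a correct Python it returns an empty list for the title in the sessions above. U+095B shows what a legitimate change looks like: it decomposes to U+091C U+093C and, being a composition exclusion, is not recomposed under NFC.

```python
import unicodedata


def normalization_diff(s, form='NFC'):
    """Return (index, original, normalized) tuples where normalize() differs.

    Comparison is position by position, so once the length changes every
    later position is reported too; the first tuple is the informative one.
    """
    n = unicodedata.normalize(form, s)
    diffs = []
    for i in range(max(len(s), len(n))):
        a = 'U+%04X' % ord(s[i]) if i < len(s) else None
        b = 'U+%04X' % ord(n[i]) if i < len(n) else None
        if a != b:
            diffs.append((i, a, b))
    return diffs


# The title from the sessions above (already in NFC).
title = (u'\u092e\u093e\u0930\u094d\u0915 '
         u'\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917')

print(normalization_diff(title))      # [] on an unaffected build
print(normalization_diff(u'\u095b'))  # [(0, 'U+095B', 'U+091C'), (1, None, 'U+093C')]
```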
----------------------------------------------------------------------
Comment By: tjmoel (tjmoel)
Date: 2010-10-22 23:34
Message:
Hi, my bot still makes these mistakes:
http://id.wikipedia.org/w/index.php?title=Archimedes&action=historysubmit&d…
Any idea how to solve this? Thanks
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-12 09:10
Message:
Some bots are still affected by this bug:
http://de.wikipedia.org/wiki/Spezial:Missbrauchsfilter-Logbuch?title=Spezia…
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 21:02
Message:
Never mind... I just noticed that you made a change so that hi links are not
removed in autonomous mode.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 20:38
Message:
I should note that this morning I updated to the most recent build and have not
seen it since, and it's been about 6 hours now. So it may have been fixed
in the most recent build, or I may just have been lucky and had no hi links
misidentified in that time.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 20:21
Message:
Yeah, look at my edits on de. I reverted a bunch of my bot's changes.
http://de.wikipedia.org/wiki/Spezial:Beitr%C3%A4ge/Djsasso
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-07 18:35
Message:
Most problems came from SassoBot, MastiBot, User:ChuispastonBot,
VolkowBot, see
http://de.wikipedia.org/wiki/Wikipedia:Bots/Notizen#Interwiki-Probleme_mit_…
With the current py version, the deletion of hi links has stopped. Well, I'll
investigate your hint. Do you have some examples for me?
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 14:26
Message:
While cleaning up my bot's edits on one wiki, I have seen at least 4
other bots doing this recently. So there is clearly an issue somewhere. I
was running the new -cleanup option, so maybe that is what causes it.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 12:33
Message:
It is doing it for me as well, and has been for the last few days. But seeing
as other bots seemed to fix it immediately, I didn't think it was a big issue
(or thought it was maybe just my machine), so I was trying to figure it out on
my own. But if it's happening to others, it's clearly not just my machine.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-05 15:17
Message:
I found this bug this morning but now it works as expected.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100&group_…