Bugs item #3081100, was opened at 2010-10-04 21:53
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: Wont Fix
Priority: 7
Private: No
Submitted By: Grimlock (grimlockfr)
Assigned to: xqt (xqt)
Summary: Problem with hi characters
Initial Comment:
Pywikipedia [http] trunk/pywikipedia (r8602, 2010/10/04, 19:33:48)
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = Tru
My interwiki bot on Wikipedia (using interwiki.py) cannot correctly identify the interwiki link to hi. As a consequence, the link is treated as a bad one and is removed when I use the -cleanup option (see http://fr.wikipedia.org/w/index.php?title=Mark_Zuckerberg&action=history... for an example). It appears that one or more characters are misinterpreted.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-30 16:52
Message: C# test code: http://pastebin.ca/1977261
It does not show this regression; the C# library does not show PR29 issues.
I will file a bug with the python developers about this shortly.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw) Date: 2010-10-27 23:16
Message: One last comment: the problem does not appear in Python up to and including 2.6.5 (the 2.6.5 session quoted further down this thread returns True). Consider using an older Python version if you work on Wikimedia sites.
Added warning in r8687.
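A version-independent way to detect the problem is to probe the interpreter directly instead of comparing version numbers. The following is a minimal sketch in Python 3 syntax, not the warning that was added in r8687; the names `KNOWN_NFC_TITLE`, `nfc_is_trustworthy` and `warn_if_nfc_broken` are hypothetical. It assumes the hi title quoted in this thread is already in NFC form, as the 2.6.5 session reports.

```python
import unicodedata
import warnings

# The hi title from this thread; assumed to be in NFC form already.
KNOWN_NFC_TITLE = (
    '\u092e\u093e\u0930\u094d\u0915 '
    '\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
)


def nfc_is_trustworthy(probe=KNOWN_NFC_TITLE):
    """Return True if NFC normalization leaves the probe string unchanged."""
    return unicodedata.normalize('NFC', probe) == probe


def warn_if_nfc_broken():
    """Emit a warning and return True if normalization alters the probe."""
    if nfc_is_trustworthy():
        return False
    warnings.warn(
        'unicodedata.normalize changes an already normalized string; '
        'interwiki links may be misidentified.',
        RuntimeWarning,
    )
    return True
```

A bot could call `warn_if_nfc_broken()` once at start-up and skip link removal when it returns True.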
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw) Date: 2010-10-27 22:54
Message: The last comments were also mine.
Mediawiki does not show problems related to PR29:
<?php
include_once('UtfNormal.php');
print bin2hex("\xe0\xad\x87\xcc\x80\xe0\xac\xbe") . "\n";
print bin2hex(UtfNormal::cleanUp("\xe0\xad\x87\xcc\x80\xe0\xac\xbe")) . "\n";
returns the expected
e0ad87cc80e0acbe
e0ad87cc80e0acbe
with no information loss. This means it might be a bug introduced in the fix for PR29 in unicodedata.c.
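For comparison, the same byte-level check can be run against Python's own unicodedata module. This is a rough sketch in Python 3 syntax; `RAW` and `nfc_hex` are hypothetical names, and the byte sequence is the one from the PHP test above (U+0B47 U+0300 U+0B3E encoded as UTF-8).

```python
import binascii
import unicodedata

# Same bytes as in the PHP test: U+0B47, U+0300, U+0B3E in UTF-8.
RAW = b'\xe0\xad\x87\xcc\x80\xe0\xac\xbe'


def nfc_hex(raw):
    """Decode UTF-8 bytes, apply NFC, and return the result as a hex string."""
    text = raw.decode('utf-8')
    normalized = unicodedata.normalize('NFC', text)
    return binascii.hexlify(normalized.encode('utf-8')).decode('ascii')


if __name__ == '__main__':
    print(binascii.hexlify(RAW).decode('ascii'))
    print(nfc_hex(RAW))
```

If both printed lines match, the interpreter agrees with the MediaWiki result; a difference would point at the normalization code in that Python build.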
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody) Date: 2010-10-27 22:36
Message: Probably related to http://svn.python.org/view/python/branches/release26-maint/Modules/unicodeda... , and hence http://bugs.python.org/issue1054943# and http://www.unicode.org/review/pr-29.html
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody) Date: 2010-10-27 22:22
Message: Okay, this seems to be a python2.6/2.7 or mediawiki bug. It is related to normalizing UTF-8 strings.
Check out the following:
(on py27)
Python 2.7 (r27:82500, Aug 5 2010, 04:28:45) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.normalize('NFC', u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917') == u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
False
(on py26):
valhallasw@willow:~/src/pywikipedia-svn$ python2.6
Python 2.6.5 (r265:79063, Jul 10 2010, 17:50:38) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.normalize('NFC', u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917') == u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
True
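To see which code points a given interpreter changes, rather than only getting True or False, a small diff helper can be useful. This is an illustrative sketch in Python 3 syntax; `code_point` and `normalization_diff` are hypothetical names and not part of pywikipedia.

```python
import unicodedata
from itertools import zip_longest


def code_point(ch):
    """Format a character as U+XXXX, or return None for a missing position."""
    return None if ch is None else 'U+%04X' % ord(ch)


def normalization_diff(text, form='NFC'):
    """List (index, original, normalized) for every position that differs."""
    normalized = unicodedata.normalize(form, text)
    return [
        (index, code_point(before), code_point(after))
        for index, (before, after) in enumerate(zip_longest(text, normalized))
        if before != after
    ]


if __name__ == '__main__':
    title = ('\u092e\u093e\u0930\u094d\u0915 '
             '\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917')
    # An empty list means normalization left the title untouched.
    print(normalization_diff(title))
```

On an affected interpreter the list would name the positions where the title is altered, which makes a report to the Python developers easier to write.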
----------------------------------------------------------------------
Comment By: tjmoel (tjmoel) Date: 2010-10-22 23:34
Message: Hi, my bot still makes the mistake: http://id.wikipedia.org/w/index.php?title=Archimedes&action=historysubmi...
Any idea how to solve this? Thanks
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2010-10-12 09:10
Message: Some bots are still affected by this bug: http://de.wikipedia.org/wiki/Spezial:Missbrauchsfilter-Logbuch?title=Spezial...
----------------------------------------------------------------------
Comment By: DJSasso (djsasso) Date: 2010-10-07 21:02
Message: Never mind... I just noticed that you made a change to not remove hi links in autonomous mode.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso) Date: 2010-10-07 20:38
Message: I should note that this morning I updated to the most recent build and have not seen it since, and it has been about 6 hours now. So it may have been fixed in the most recent build, or I may just have been lucky and not had any hi links mistaken in that time.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso) Date: 2010-10-07 20:21
Message: Yeah, look at my edits on de. I reverted a bunch of my bot's changes.
http://de.wikipedia.org/wiki/Spezial:Beitr%C3%A4ge/Djsasso
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2010-10-07 18:35
Message: Most problems came from SassoBot, MastiBot, User:ChuispastonBot, VolkowBot, see http://de.wikipedia.org/wiki/Wikipedia:Bots/Notizen#Interwiki-Probleme_mit_h...
With the current py version, deletion of hi links is stopped. I'll investigate your hint. Do you have some examples for me?
----------------------------------------------------------------------
Comment By: DJSasso (djsasso) Date: 2010-10-07 14:26
Message: While doing some cleanup of my bot's edits on one wiki, I have seen at least 4 other bots doing this recently. So there is clearly an issue somewhere. I was running the new -cleanup option, so maybe that is what causes it.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso) Date: 2010-10-07 12:33
Message: It is doing it for me as well. It has been for the last few days, but since other bots seemed to fix it immediately I didn't think it was a big issue, or thought it was maybe my machine. So I was trying to figure it out on my own. But if it's happening to others, it's clearly not just my machine.
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2010-10-05 15:17
Message: I found this bug this morning but now it works as expected.
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org