Bugs item #3081100, was opened at 2010-10-04 21:53
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=308110…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: Wont Fix
Priority: 7
Private: No
Submitted By: Grimlock (grimlockfr)
Assigned to: xqt (xqt)
Summary: Problem with hi characters
Initial Comment:
Pywikipedia [http] trunk/pywikipedia (r8602, 2010/10/04, 19:33:48)
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
My interwiki bot on Wikipedia (using interwiki.py) cannot correctly identify the
interwiki link to hi. As a consequence the link, which is treated as a bad one,
is removed when I use the -cleanup option (see
http://fr.wikipedia.org/w/index.php?title=Mark_Zuckerberg&action=histor…
for an example). It appears that one or more characters are misinterpreted.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-27 23:16
Message:
One last comment: the problem does not appear in Python 2.6.5 or earlier. Consider
using an older Python version if you work on Wikimedia sites.
Added warning in r8687.
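A check of this kind could look roughly like the sketch below. It is not the
code from r8687; the names nfc_is_stable and warn_if_broken and the warning
text are hypothetical. It assumes only that the hi title from this report is
already in NFC form, so a correct unicodedata.normalize should return it
unchanged.

```python
import unicodedata
import warnings

# Hindi title from this report; it is already in NFC form.
HI_TITLE = (u'\u092e\u093e\u0930\u094d\u0915 '
            u'\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917')


def nfc_is_stable(text=HI_TITLE):
    """Return True if NFC normalization leaves `text` unchanged.

    The result only says something about the interpreter when `text`
    is already in NFC, as the default sample is.
    """
    return unicodedata.normalize('NFC', text) == text


def warn_if_broken():
    """Emit a warning and return True if the sample is altered by NFC."""
    if not nfc_is_stable():
        warnings.warn(
            "unicodedata.normalize('NFC', ...) alters text that is already "
            "normalized; interwiki links may be misidentified.")
        return True
    return False


if __name__ == '__main__':
    warn_if_broken()
```

Running such a check once at startup would be enough, since the behaviour
depends on the interpreter and not on the page being processed.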
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-27 22:54
Message:
The last comments were also mine.
MediaWiki does not show problems related to PR29:
<?php
include_once('UtfNormal.php');
print bin2hex("\xe0\xad\x87\xcc\x80\xe0\xac\xbe") . "\n";
print bin2hex(UtfNormal::cleanUp("\xe0\xad\x87\xcc\x80\xe0\xac\xbe")) . "\n";
returns the expected
e0ad87cc80e0acbe
e0ad87cc80e0acbe
with no loss of information. This suggests the bug may have been
introduced by the fix for PR29 in unicodedata.c.
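For comparison, the same byte sequence can be pushed through Python's own
normalizer. This is a minimal sketch mirroring the PHP test above; the helper
name hex_utf8 is made up for illustration. The sequence is U+0B47, U+0300,
U+0B3E, and an unaffected interpreter should print the same hex string twice.

```python
import binascii
import unicodedata

# Same sequence as in the PHP test: U+0B47, U+0300, U+0B3E.
seq = u'\u0b47\u0300\u0b3e'
normalized = unicodedata.normalize('NFC', seq)


def hex_utf8(text):
    """Return the UTF-8 encoding of `text` as a lowercase hex string."""
    return binascii.hexlify(text.encode('utf-8')).decode('ascii')


print(hex_utf8(seq))         # input bytes
print(hex_utf8(normalized))  # bytes after NFC normalization
```

If the second line differs from the first, the interpreter's NFC
implementation changes this sequence where UtfNormal::cleanUp does not.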
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2010-10-27 22:36
Message:
Probably related to
http://svn.python.org/view/python/branches/release26-maint/Modules/unicoded…
, and hence
http://bugs.python.org/issue1054943#
and
http://www.unicode.org/review/pr-29.html
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2010-10-27 22:22
Message:
Okay, this seems to be a Python 2.6/2.7 or MediaWiki bug. It is related to
normalizing Unicode strings.
Check out the following:
(on py27)
Python 2.7 (r27:82500, Aug 5 2010, 04:28:45) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> s = u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
>>> unicodedata.normalize('NFC', s) == s
False
(on py26):
valhallasw@willow:~/src/pywikipedia-svn$ python2.6
Python 2.6.5 (r265:79063, Jul 10 2010, 17:50:38) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> s = u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
>>> unicodedata.normalize('NFC', s) == s
True
----------------------------------------------------------------------
Comment By: tjmoel (tjmoel)
Date: 2010-10-22 23:34
Message:
Hi, my bot still makes the same mistake:
http://id.wikipedia.org/w/index.php?title=Archimedes&action=historysubm…
Any idea how to solve this? Thanks.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-12 09:10
Message:
Some bots are still affected by this bug:
http://de.wikipedia.org/wiki/Spezial:Missbrauchsfilter-Logbuch?title=Spezia…
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 21:02
Message:
Never mind... I just noticed that you made a change to not remove hi links in
autonomous mode.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 20:38
Message:
I should note that this morning I updated to the most recent build and have not
seen it since, and it's been about 6 hours now. So it may have
been fixed in the most recent build. Or I may have just been lucky and
not had any hi links get mistaken in that time.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 20:21
Message:
Yeah, look at my edits on de. I reverted a bunch of my bot's changes.
http://de.wikipedia.org/wiki/Spezial:Beitr%C3%A4ge/Djsasso
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-07 18:35
Message:
Most problems came from SassoBot, MastiBot, User:ChuispastonBot, and
VolkowBot, see
http://de.wikipedia.org/wiki/Wikipedia:Bots/Notizen#Interwiki-Probleme_mit_…
With the current pywikipedia version, deletion of hi links is stopped. I'll
investigate your hint. Do you have some examples for me?
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 14:26
Message:
While cleaning up my bot's edits on one wiki, I have seen at least 4
other bots doing this recently. So there is clearly an issue somewhere. I
was running the new -cleanup option, so maybe that is what causes it.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 12:33
Message:
It is doing it for me as well. It has been for the last few days, but since
other bots seemed to fix it immediately I didn't think it was a big issue,
or thought it was maybe my machine. So I was trying to figure it out on my own. But if
it's happening to others, it's clearly not just my machine.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-05 15:17
Message:
I found this bug this morning but now it works as expected.
----------------------------------------------------------------------