Bugs item #3208738, was opened at 2011-03-13 13:43
Message generated for change (Comment added) made by ganz-ru
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3208738&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 8
Private: No
Submitted By: Tanvir Rahman (tanvirglhs)
Assigned to: Nobody/Anonymous (nobody)
Summary: Interwiki.py is replacing texts
Initial Comment:
Hello, a few days ago my interwiki bot was malfunctioning. It was replacing words in the main article text, which interwiki.py is not supposed to do. After I was informed about the problem, I stopped the bot, reinstalled Pywikipedia, and started everything over. As far as I know, it is not happening again now, but I am still curious why it happened for a few edits. Since I don't know the cause, I have no clue how to prevent it in the future.
The problematic diffs are as follows:
1. http://sl.wikipedia.org/wiki/?&diff=2826459&oldid=2826454
2. http://de.wikipedia.org/wiki/?&diff=prev&oldid=86257583
3. http://de.wikipedia.org/wiki/?&diff=prev&oldid=86259285
4. http://de.wikipedia.org/wiki/?&diff=prev&oldid=86258895
5. http://de.wikipedia.org/wiki/?&diff=prev&oldid=86259821
Here is the output of my current version.py. The Pywikipedia version actually in use at the time was a few revisions earlier.
Pywikipedia [http] trunk/pywikipedia (r9042, 2011/03/13, 10:14:47)
Python 2.7.1 (r271:86832, Jan 4 2011, 13:57:14)
[GCC 4.5.2]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
----------------------------------------------------------------------
Comment By: GanZ (ganz-ru)
Date: 2011-03-18 04:54
Message:
Same problem:
* http://nds-nl.wikipedia.org/w/index.php?diff=182567&oldid=181429
* http://ru.wikipedia.org/w/index.php?diff=32635182&oldid=32613311
* http://ru.wikipedia.org/w/index.php?diff=32634369&oldid=32072584 (the edit
summary does not correspond to the action; 2 lines were removed by
cosmetic_changes.py)
* http://nds-nl.wikipedia.org/w/index.php?diff=182568&oldid=181967
All of these edits have 2 things in common:
1) Date: 9th or 10th of March
2) Every edit changes some characters to others by shifting their code
points. For example (here: http://sl.wikipedia.org/wiki/?diff=2826459&oldid=2826454):
l (U+006C) -> h (U+0068)
s (U+0073) -> w (U+0077)
r (U+0072) -> v (U+0076)
And every time, the code point of the new character is greater or smaller than that of the old one by exactly 4.
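The arithmetic for the three pairs listed above can be checked with a short sketch. The helper name `describe_shift` is made up for this illustration and is not part of Pywikipedia. The XOR column is an extra observation about these three examples only (each pair differs in a single bit with value 4); it is not a diagnosis of the cause.

```python
# Character pairs reported in the sl.wikipedia.org diff (old -> new).
pairs = [(u'l', u'h'), (u's', u'w'), (u'r', u'v')]


def describe_shift(old, new):
    """Return (signed code point difference, XOR of the two code points)."""
    return ord(new) - ord(old), ord(old) ^ ord(new)


for old, new in pairs:
    diff, xor = describe_shift(old, new)
    # Print both code points, the signed difference and the XOR value.
    print(u'%s (U+%04X) -> %s (U+%04X): diff=%+d xor=%d'
          % (old, ord(old), new, ord(new), diff, xor))
```

For these three pairs the difference is -4, +4 and +4, and the XOR is 4 each time.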
My current version (not the version in use when the problem happened, of course):
Pywikipedia [http] trunk/pywikipedia (r9086, 2011/03/17, 14:20:21)
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit
(Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-03-13 16:13
Message:
Do you still have the checkout that caused these problems?
----------------------------------------------------------------------
Bugs item #3081100, was opened at 2010-10-04 21:53
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: Wont Fix
Priority: 7
Private: No
Submitted By: Grimlock (grimlockfr)
Assigned to: xqt (xqt)
>Summary: Unicode bug: some page titles are mangled
Initial Comment:
Pywikipedia [http] trunk/pywikipedia (r8602, 2010/10/04, 19:33:48)
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
My interwiki bot on Wikipedia (using interwiki.py) cannot correctly identify the interwiki link to hi and, as a consequence, the link is classified as a bad one and removed when I use the -cleanup option (see http://fr.wikipedia.org/w/index.php?title=Mark_Zuckerberg&action=historysub… for an example). It appears that one or more characters are misinterpreted.
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-03-16 10:39
Message:
I cannot edit details, but I have edited the summary to be a bit more
descriptive.
----------------------------------------------------------------------
Comment By: Nemo (nemobis)
Date: 2011-03-16 09:35
Message:
Thank you. Could you please make the bug subject more descriptive? Even
after reading all the comments I wasn't able to understand it completely, and
it would be better if bot runners, who are sent to this bug by interwiki.py,
could understand what the problem is and take the necessary measures (e.g.
not using -force or -cleanup, I suppose). Thank you very much!
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-03-16 09:27
Message:
It happens for any page title where the (correct) MediaWiki Unicode
normalization does not equal the (incorrect) Python normalization. As a
general guideline, this only happens for characters with multiple accents
(say, 3 or so). It does not only happen for hi:, though!
I think most Latin and Cyrillic character sets are generally safe. For
others, I have no idea; we have had reports for several languages.
----------------------------------------------------------------------
Comment By: Nemo (nemobis)
Date: 2011-03-16 09:10
Message:
Does this bug affect other languages as well or is it safe to use
pywikipedia with this problem if you don't touch hi links?
----------------------------------------------------------------------
Comment By: Grimlock (grimlockfr)
Date: 2010-11-02 17:03
Message:
I used Python 2.7 when I discovered this bug. The bug is not fixed in 2.7
(or in all 2.7 distributions ..)
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-11-02 15:47
Message:
Just a quick update: upstream has confirmed this is a bug in the python
library. It should get fixed in 2.7 and 3.2, but it is not clear yet
whether 2.6.6 will have the fix included.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-30 17:43
Message:
Reported to the python developers: http://bugs.python.org/issue10254
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-30 16:52
Message:
C# test code: http://pastebin.ca/1977261
This does not show this regression. The C# library does not show PR29
issues.
I will file a bug with the python developers about this shortly.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-27 23:16
Message:
One last comment: the problem does not appear in python < 2.6.5. Consider
using an older python version if you work on wikimedia sites.
Added warning in r8687.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-27 22:54
Message:
The last comments were also mine.
Mediawiki does not show problems related to PR29:
<?php
include_once('UtfNormal.php');
print bin2hex("\xe0\xad\x87\xcc\x80\xe0\xac\xbe") . "\n";
print bin2hex(UtfNormal::cleanUp("\xe0\xad\x87\xcc\x80\xe0\xac\xbe")) . "\n";
returns the expected
e0ad87cc80e0acbe
e0ad87cc80e0acbe
where no information loss is happening. This means it might be a bug
introduced in the fix for pr29 in unicodedata.c.
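For comparison, a rough Python counterpart of the PHP check above might look like the following sketch. The function name `nfc_roundtrip_hex` is made up for this illustration. The bytes are the same as in the PHP test and decode to U+0B47, U+0300, U+0B3E. On an interpreter with a correct `unicodedata.normalize`, both printed lines should match, as in the MediaWiki output; on an affected interpreter the second line may differ.

```python
import binascii
import unicodedata

# Same UTF-8 bytes as in the PHP test: U+0B47, U+0300, U+0B3E.
raw = b"\xe0\xad\x87\xcc\x80\xe0\xac\xbe"


def nfc_roundtrip_hex(data):
    """Decode UTF-8 bytes, apply NFC, and return the re-encoded bytes as hex."""
    text = data.decode("utf-8")
    normalized = unicodedata.normalize("NFC", text)
    return binascii.hexlify(normalized.encode("utf-8")).decode("ascii")


# Input bytes as hex, then the bytes after an NFC round trip.
print(binascii.hexlify(raw).decode("ascii"))
print(nfc_roundtrip_hex(raw))
```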
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2010-10-27 22:36
Message:
Probably related to
http://svn.python.org/view/python/branches/release26-maint/Modules/unicoded…
, and hence
http://bugs.python.org/issue1054943#
and
http://www.unicode.org/review/pr-29.html
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2010-10-27 22:22
Message:
Okay, this seems to be a python2.6/2.7 or mediawiki bug. It is related to
normalizing UTF-8 strings.
Check out the following:
(on py27)
Python 2.7 (r27:82500, Aug 5 2010, 04:28:45) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> title = u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
>>> unicodedata.normalize('NFC', title) == title
False
(on py26):
valhallasw@willow:~/src/pywikipedia-svn$ python2.6
Python 2.6.5 (r265:79063, Jul 10 2010, 17:50:38) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> title = u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
>>> unicodedata.normalize('NFC', title) == title
True
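The comparison from the two sessions above can be wrapped in a small helper so that it runs as a script on whichever interpreter is being checked. This is only a sketch: `is_nfc_stable` is a name made up here, not a Pywikipedia function. The title is the same escaped string as in the sessions. A result of False for it corresponds to the py27 session above.

```python
import unicodedata


def is_nfc_stable(text):
    """Return True if NFC normalization leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


# Page title from the sessions above, written with the same escapes.
title = (u"\u092e\u093e\u0930\u094d\u0915 "
         u"\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917")

print(is_nfc_stable(title))
```

A string such as u"e\u0301" is expected to report False on any interpreter, because NFC composes it into a single character; that makes it a handy sanity check that the helper itself works.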
----------------------------------------------------------------------
Comment By: tjmoel (tjmoel)
Date: 2010-10-22 23:34
Message:
Hi, my bot is still making these mistakes:
http://id.wikipedia.org/w/index.php?title=Archimedes&action=historysubmit&d…
Any idea how to solve this? Thanks
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-12 09:10
Message:
Some bots are still affected by this bug:
http://de.wikipedia.org/wiki/Spezial:Missbrauchsfilter-Logbuch?title=Spezia…
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 21:02
Message:
Never mind... I just noticed that you made a change to not remove hi links in
autonomous mode.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 20:38
Message:
I should note that this morning I updated to the most recent build and have
not seen it since, and it's been about 6 hours now. So it may have been
fixed in the most recent build. Or I may have just been lucky and
not had any hi links get mistaken in that time.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 20:21
Message:
Yeah, look at my edits on de. I reverted a bunch of my bot's changes.
http://de.wikipedia.org/wiki/Spezial:Beitr%C3%A4ge/Djsasso
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-07 18:35
Message:
Most problems came from SassoBot, MastiBot, User:ChuispastonBot,
VolkowBot, see
http://de.wikipedia.org/wiki/Wikipedia:Bots/Notizen#Interwiki-Probleme_mit_…
With the current py version, deletion of hi links is stopped. I'll
investigate your hint. Do you have some examples for me?
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 14:26
Message:
While cleaning up my bot's edits on one wiki, I have seen at least 4
other bots doing this recently. So there is clearly an issue somewhere. I
was running the new -cleanup option, so maybe that is what causes it.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 12:33
Message:
It is doing it for me as well. It has been for the last few days, but seeing
as other bots seemed to fix it immediately, I didn't think it was a big issue,
or thought it was maybe my machine. So I was trying to figure it out on my own.
But if it's happening to others, it's clearly not just my machine.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-05 15:17
Message:
I found this bug this morning but now it works as expected.
----------------------------------------------------------------------
Bugs item #3081100, was opened at 2010-10-04 21:53
Message generated for change (Comment added) made by nemobis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: Wont Fix
Priority: 7
Private: No
Submitted By: Grimlock (grimlockfr)
Assigned to: xqt (xqt)
Summary: Problem with hi characters
Initial Comment:
Pywikipedia [http] trunk/pywikipedia (r8602, 2010/10/04, 19:33:48)
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = Tru
My interwiki bot on Wikipedia (using interwiki.py) can not identify correctly the interwiki link to hi, and, as a consequence, the link, which is identified as a bad one, is removed when I use -cleanup option (see here http://fr.wikipedia.org/w/index.php?title=Mark_Zuckerberg&action=historysub… for an example). It appears that one or more characters are misunderstood.
----------------------------------------------------------------------
Comment By: Nemo (nemobis)
Date: 2011-03-16 09:10
Message:
Does this bug affect other languages as well or is it safe to use
pywikipedia with this problem if you don't touch hi links?
----------------------------------------------------------------------
Comment By: Grimlock (grimlockfr)
Date: 2010-11-02 17:03
Message:
I used Python 2.7 when I discovered this bug. The bug is not fixed in 2.7
(or in all 2.7 distributions ..)
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-11-02 15:47
Message:
Just a quick update: upstream has confirmed this is a bug in the python
library. It should get fixed in 2.7 and 3.2, but it is not clear yet
whether 2.6.6 will have the fix included.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-30 17:43
Message:
Reported to the python developers: http://bugs.python.org/issue10254
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-30 16:52
Message:
C# test code: http://pastebin.ca/1977261
This does not show this regression. The C# library does not show PR29
issues.
I will file a bug with the python developers about this shortly.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-27 23:16
Message:
One last comment: the problem does not appear in python < 2.6.5. Consider
using an older python version if you work on wikimedia sites.
Added warning in r8687.
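
A minimal sketch of what such a start-up check could look like, assuming
Python 3 syntax. This is not the code added in r8687; the function names and
the warning text are hypothetical. It relies only on the two sequences quoted
in this thread, which a correct normalizer returns unchanged:

```python
import unicodedata
import warnings

# Sequences that are already in NFC, so normalizing them must be a no-op.
# The first is the PR-29 style sequence used in the PHP test in this thread,
# the second is the Hindi page title from the interpreter sessions.
NFC_STABLE_SAMPLES = (
    u"\u0b47\u0300\u0b3e",
    u"\u092e\u093e\u0930\u094d\u0915 "
    u"\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917",
)


def nfc_is_trustworthy():
    """Return True if NFC leaves every known-stable sample unchanged."""
    return all(unicodedata.normalize("NFC", s) == s
               for s in NFC_STABLE_SAMPLES)


def warn_if_nfc_is_broken():
    """Warn (hypothetical wording) and return True on an affected interpreter."""
    if not nfc_is_trustworthy():
        warnings.warn("unicodedata.normalize('NFC', ...) alters text that is "
                      "already normalized on this Python version")
        return True
    return False
```

A bot could call warn_if_nfc_is_broken() once before editing and refuse to
remove links when it returns True.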
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-10-27 22:54
Message:
The last comments were also mine.
Mediawiki does not show problems related to PR29:
<?php
include_once('UtfNormal.php');
print bin2hex("\xe0\xad\x87\xcc\x80\xe0\xac\xbe") . "\n";
print bin2hex(UtfNormal::cleanUp("\xe0\xad\x87\xcc\x80\xe0\xac\xbe")) .
"\n";
returns the expected
e0ad87cc80e0acbe
e0ad87cc80e0acbe
where no information loss is happening. This means it might be a bug
introduced in the fix for pr29 in unicodedata.c.
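
For readers who want to see which code points the test bytes stand for, here
is a small sketch, assuming Python 3. It decodes the same byte string and
prints each character with its canonical combining class. The variable names
are illustrative only:

```python
import unicodedata

# The byte string from the PHP test above, decoded as UTF-8.
raw = b"\xe0\xad\x87\xcc\x80\xe0\xac\xbe"
text = raw.decode("utf-8")

# Show each code point with its canonical combining class (ccc) and name.
for ch in text:
    print("U+%04X ccc=%3d %s"
          % (ord(ch), unicodedata.combining(ch), unicodedata.name(ch)))

classes = [unicodedata.combining(ch) for ch in text]

# Without the accent in between, the two Oriya vowel signs do compose.
adjacent = unicodedata.normalize("NFC", u"\u0b47\u0b3e")
```

The accent in the middle has a non-zero combining class and sits between the
two vowel signs, which is why the sequence as a whole is expected to come back
from NFC unchanged even though the two vowel signs compose when adjacent.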
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2010-10-27 22:36
Message:
Probably related to
http://svn.python.org/view/python/branches/release26-maint/Modules/unicoded…
, and hence
http://bugs.python.org/issue1054943#
and
http://www.unicode.org/review/pr-29.html
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2010-10-27 22:22
Message:
Okay, this seems to be a python2.6/2.7 or mediawiki bug. It is related to
normalizing UTF-8 strings.
Check out the following:
(on py27)
Python 2.7 (r27:82500, Aug 5 2010, 04:28:45) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.normalize('NFC', u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917') == u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
False
(on py26):
valhallasw@willow:~/src/pywikipedia-svn$ python2.6
Python 2.6.5 (r265:79063, Jul 10 2010, 17:50:38) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.normalize('NFC', u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917') == u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
True
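
To find out where the normalized string differs rather than only that it
differs, a helper along these lines may be useful. This is a sketch assuming
Python 3; nfc_changes is a hypothetical name, not part of pywikipedia:

```python
import unicodedata

# The Hindi page title used in the sessions above.
TITLE = (u"\u092e\u093e\u0930\u094d\u0915 "
         u"\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917")


def nfc_changes(s):
    """Locate the first position where NFC output differs from the input.

    Returns None if normalization leaves s unchanged. Otherwise returns
    (index, original code point, normalized code point), or
    (index, None, None) if one string is just a prefix of the other.
    """
    n = unicodedata.normalize("NFC", s)
    if n == s:
        return None
    for i, (a, b) in enumerate(zip(s, n)):
        if a != b:
            return (i, "U+%04X" % ord(a), "U+%04X" % ord(b))
    return (min(len(s), len(n)), None, None)


print(nfc_changes(TITLE))
```

On an interpreter without the regression, the title should be reported as
unchanged. On an affected one, the returned tuple points at the first code
point that was altered.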
----------------------------------------------------------------------
Comment By: tjmoel (tjmoel)
Date: 2010-10-22 23:34
Message:
Hi, my bot still makes the mistake:
http://id.wikipedia.org/w/index.php?title=Archimedes&action=historysubmit&d…
Any idea how to solve this? Thanks.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-12 09:10
Message:
Some bots are still affected by this bug:
http://de.wikipedia.org/wiki/Spezial:Missbrauchsfilter-Logbuch?title=Spezia…
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 21:02
Message:
Never mind... I just noticed that you made a change to not remove hi links
in autonomous mode.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 20:38
Message:
I should note that this morning I updated to the most recent build and have
not seen it since, and it's been about 6 hours now. So it may have been
fixed in the most recent build. Or I may have just been lucky and not had
any hi links get mistaken in that time.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 20:21
Message:
Yeah, look at my edits on de. I reverted a bunch of my bot's changes.
http://de.wikipedia.org/wiki/Spezial:Beitr%C3%A4ge/Djsasso
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-07 18:35
Message:
Most problems came from SassoBot, MastiBot, User:ChuispastonBot,
VolkowBot, see
http://de.wikipedia.org/wiki/Wikipedia:Bots/Notizen#Interwiki-Probleme_mit_…
With the current py version, deletion of hi links is stopped. I'll
investigate your hint. Do you have some examples for me?
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 14:26
Message:
While doing some cleanup of my bot's edits on one wiki, I have seen at least
4 other bots doing this recently. So there is clearly an issue somewhere. I
was running the new -cleanup option, so maybe that is what causes it.
----------------------------------------------------------------------
Comment By: DJSasso (djsasso)
Date: 2010-10-07 12:33
Message:
It is doing it for me as well. It has been for the last few days, but since
other bots seemed to fix it immediately, I didn't think it was a big issue,
or thought it was maybe my machine. So I was trying to figure it out on my
own. But if it's happening to others, it's clearly not just my machine.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-10-05 15:17
Message:
I found this bug this morning but now it works as expected.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100&group_…
Patches item #3210483, was opened at 2011-03-14 09:28
Message generated for change (Comment added) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3210483&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: Linar Khalitov (rubin16)
>Assigned to: xqt (xqt)
Summary: commonscat.py
Initial Comment:
changing ignore template for ru.wiki
----------------------------------------------------------------------
>Comment By: xqt (xqt)
Date: 2011-03-16 08:19
Message:
done in r9083. Thanks.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3210483&group_…
Bugs item #3208937, was opened at 2011-03-13 15:00
Message generated for change (Tracker Item Submitted) made by malafaya
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3208937&group_…
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: André Malafaya Baptista (malafaya)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki.py using -subcats with -untranslated isn't working
Initial Comment:
When running:
interwiki.py -lang:de -family:wiktionary -async -untranslated -subcats:"Deklinierte Form"
I'm only asked for a hint for [[de:Kategorie:Deklinierte Form (Türkisch)]] but I can see there are many categories there with no interwikis:
[[Kategorie:Deklinierte Form (Tschechisch)]], [[Kategorie:Deklinierte Form (Mazedonisch)]] .....
The category above holds 33K members. The last fetching log is "Getting [[Kategorie:Deklinierte Form]] list from 21620...". My suspicion is that it may be failing to retrieve all subcats.
Thanks.
-----
Pywikipedia [http] trunk/pywikipedia (r9037, 2011/03/12, 23:12:11)
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3208937&group_…
Bugs item #3208738, was opened at 2011-03-13 11:43
Message generated for change (Settings changed) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3208738&group_…
Category: interwiki
Group: None
Status: Open
Resolution: None
>Priority: 8
Private: No
Submitted By: Tanvir Rahman (tanvirglhs)
Assigned to: Nobody/Anonymous (nobody)
Summary: Interwiki.py is replacing texts
Initial Comment:
Hello, my interwiki bot was malfunctioning a few days ago. It was replacing words in the main article text, which interwiki.py is not supposed to do. After I was informed about the problem, I stopped the bot, re-installed Pywikipedia, and started everything over. As far as I know it is not happening again, but I am still curious why it happened for a few edits. Since I don't know the cause, I have no clue how to prevent it in future.
The problematic diffs are as follows:
1. http://sl.wikipedia.org/wiki/?&diff=2826459&oldid=2826454
2. http://de.wikipedia.org/wiki/?&diff=prev&oldid=86257583
3. http://de.wikipedia.org/wiki/?&diff=prev&oldid=86259285
4. http://de.wikipedia.org/wiki/?&diff=prev&oldid=86258895
5. http://de.wikipedia.org/wiki/?&diff=prev&oldid=86259821
Here is the copy of my current version.py. The actual Pywikipedia version was a few revisions earlier.
Pywikipedia [http] trunk/pywikipedia (r9042, 2011/03/13, 10:14:47)
Python 2.7.1 (r271:86832, Jan 4 2011, 13:57:14)
[GCC 4.5.2]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3208738&group_…