Bugs item #3539444, was opened at 2012-07-02 06:25
Message generated for change (Comment added) made by eranroz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3539444&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: replace doesn't support optional groups
Initial Comment:
textlib.py (method replaceExcept) doesn't support optional capturing groups in regex.
I tried to run replace.py with the following regex: "RISHMI(T |IM)?" => "RISHMI\1"
when running it on a page containing the following text "SOMETHING RISHMI SOMETHING"
it crashes with the following error:
textlib.py, line 178, in replaceExcept
match.group(groupID) + \
TypeError: coercing to Unicode: need string or buffer, NoneType found
Line 178 contains the statement:
replacement = replacement[:groupMatch.start()] + \
              match.group(groupID) + \
              replacement[groupMatch.end():]
textlib.py should check whether match.group(groupID) is None and, if so, insert an empty string instead of match.group(groupID).
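A minimal sketch of the check suggested above. It is not the actual textlib.py code: the helper name expand_group_refs and the GROUP_REF pattern are hypothetical and only illustrate substituting an empty string when a referenced group did not take part in the match.

```python
import re

# Hypothetical pattern for group references such as \1 or \g<name>.
GROUP_REF = re.compile(r'\\(?P<number>\d+)|\\g<(?P<name>.+?)>')


def expand_group_refs(replacement, match):
    """Expand group references in `replacement` using `match`.

    Groups that did not participate in the match (value None)
    are expanded to an empty string instead of raising TypeError.
    """
    def substitute(ref):
        key = ref.group('name') or ref.group('number')
        group_id = int(key) if key.isdigit() else key
        value = match.group(group_id)
        return '' if value is None else value

    return GROUP_REF.sub(substitute, replacement)


text = 'SOMETHING RISHMI SOMETHING'
match = re.search(r'RISHMI(T |IM)?', text)
print(repr(expand_group_refs(r'RISHMI\1', match)))
```

With the text from the report, the optional group is unmatched, so the reference expands to nothing and the replacement is simply the literal prefix.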
----------------------------------------------------------------------
Comment By: Eranroz (eranroz)
Date: 2012-07-04 05:03
Message:
This regex is just an example, and probably a bad one (as the replacement
does nothing). Your suggestion for this specific regex (an inner optional
group within a capturing group) would probably fix it, but that is a
workaround: replace.py should support replacing optional capturing groups
the same way re.findall behaves.
Replacing None with an empty string is compatible with the behaviour of
re.findall (re.findall('a(b)?(c)','ac') => [('', 'c')]) and with the regex
engines of most languages (in JS: 'ac'.replace(/a(b)?(c)/,'a$1c')), though
Python's re isn't consistent here (re.sub('a(b)?(c)','X\\1','ac') raises
an error).
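The behaviours compared in this comment can be checked with a short script. This is only an illustration, and it assumes a current interpreter: the comment describes the Python 2 era re module, whereas the re documentation states that since Python 3.5 unmatched groups in re.sub are replaced with an empty string.

```python
import re

pattern = re.compile(r'a(b)?(c)')

# findall reports the unmatched optional group as an empty string.
found = pattern.findall('ac')

# The match object reports the same group as None.
first_group = pattern.search('ac').group(1)

# On Python 3.5+ sub() substitutes an empty string for the unmatched
# group; older versions raised an "unmatched group" error here.
substituted = pattern.sub(r'X\1', 'ac')

print(found, first_group, substituted)
```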
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-07-03 22:20
Message:
The group must exist to reuse it. What should this regex do, in your
opinion? What about "RISHMI(T |IM|)" or "RISHMI((?:T |IM)?)"? Errors should
never pass silently unless explicitly silenced (PEP 20). Maybe replacing
with empty strings could lead to unwanted side effects, but I haven't
thought about it.
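The two rewrites proposed here can be tried in isolation. The sketch below assumes the second pattern is meant to start with the full RISHMI prefix; the helper captured() is hypothetical. In both variants group 1 always participates in the match, so it is an empty string rather than None when neither alternative is present.

```python
import re

# Variant 1: an empty alternative inside the capturing group.
empty_alternative = re.compile(r'RISHMI(T |IM|)')
# Variant 2: an optional non-capturing group nested in a capturing group.
nested_optional = re.compile(r'RISHMI((?:T |IM)?)')


def captured(pattern, text):
    """Return what group 1 captured for the first match in `text`."""
    return pattern.search(text).group(1)


for pattern in (empty_alternative, nested_optional):
    print(repr(captured(pattern, 'SOMETHING RISHMI SOMETHING')),
          repr(captured(pattern, 'SOMETHING RISHMIT SOMETHING')))
```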
----------------------------------------------------------------------
Patches item #3539859, was opened at 2012-07-03 11:35
Message generated for change (Comment added) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3539859&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Eranroz (eranroz)
Assigned to: Nobody/Anonymous (nobody)
Summary: Bugfix for optional capturing group
Initial Comment:
Patch for pywikibot/textlib.py that makes the replace function (replaceExcept) support empty/optional capturing groups.
This is a bugfix for a crash that occurs when using replace.py with a regex containing an optional capturing group (e.g. AAA in the regex "bla(AAA)?bla").
----------------------------------------------------------------------
>Comment By: xqt (xqt)
Date: 2012-07-04 04:02
Message:
See my comment at the corresponding bug tracker. Maybe it would be OK to
accept this patch; anyway, I've asked for a third opinion on this matter.
----------------------------------------------------------------------
Comment By: Eranroz (eranroz)
Date: 2012-07-03 23:44
Message:
Yes, this is a bugfix for 3539444.
In short:
when running the following regex "ADMA (a)?poria" => "ADMA \1porya"
on text containing ADMA poria (with no a before poria), it crashes with the
following error:
doReplacements
res = replace.ReplaceRobot.doReplacements(self,original_text)
File "D:\myBot\python\pywikipedia-nightly\replace.py", line 390, in
doReplacements
allowoverlap=self.allowoverlap)
File "D:\myBot\python\pywikipedia-nightly\pywikibot\textlib.py", line
179, in replaceExcept
match.group(groupID) + \
TypeError: coercing to Unicode: need string or buffer, NoneType found
You may suggest rewriting the specific regex, and that would probably work,
but it is just a workaround: a regex with an optional capturing group is
correct and should work properly.
Longer story :) :
In Hebrew Wikipedia there is a list of regexes that are used for
replacements in (almost) all articles, which is here:
http://he.wikipedia.org/wiki/%D7%95%D7%A7:%D7%A8%D7%94
The columns in the table there are:
ID | old | new | exceptText
The list is used by a C# bot implementation, which isn't active, and by a JS
userscript implementation, which is used for specific page replacements.
I have ported it to work with replace.py, but it fails when it gets to a
replacement with an optional capturing group. After my fix (locally) I ran it
for 250 test edits and it worked properly without crashes.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-07-03 21:48
Message:
Is this patch for bug #3539444?
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-07-03 21:26
Message:
I don't understand this bug. What is the traceback before this patch is
implemented? And what should replaceExcept() do in your special case?
Could you give me a full example? You may exclude this group with
"bla(?:AAA)?bla"; would this help?
----------------------------------------------------------------------
Bugs item #3473828, was opened at 2012-01-14 09:37
Message generated for change (Comment added) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3473828&group_…
Category: None
Group: None
Status: Open
Resolution: Accepted
>Priority: 7
Private: No
Submitted By: GanZ (ganz-ru)
Assigned to: Nobody/Anonymous (nobody)
Summary: cosmetic_changes.py and <code>
Initial Comment:
Here:
http://ru.wikipedia.org/w/index.php?diff=40245627&oldid=39731202
http://ru.wikipedia.org/w/index.php?diff=40765267&oldid=39425330
cosmetic_changes.py replaced the internal HTML text of <code>...</code> with Unicode symbols. That's not good, since the <code> tag is for the original representation of code, including HTML code. So I propose adding this tag to the list of exceptions.
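A minimal sketch of the requested behaviour, assuming Python 3 and the standard html.unescape function. It does not reproduce the cosmetic_changes.py code; the function name resolve_entities_outside_code and the CODE_BLOCK pattern are hypothetical. Entities are resolved everywhere except inside <code>...</code>.

```python
import html
import re

# One capturing group, so re.split keeps the <code> blocks in the result.
CODE_BLOCK = re.compile(r'(<code\b[^>]*>.*?</code>)',
                        re.DOTALL | re.IGNORECASE)


def resolve_entities_outside_code(text):
    """Resolve HTML entities, leaving <code>...</code> blocks untouched."""
    parts = CODE_BLOCK.split(text)
    # Odd indexes hold the captured <code> blocks; even indexes hold
    # the surrounding text, which is the only part that gets unescaped.
    return ''.join(part if index % 2 else html.unescape(part)
                   for index, part in enumerate(parts))


sample = 'a &amp; b <code>&lt;br&gt;</code> &copy;'
print(resolve_entities_outside_code(sample))
```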
----------------------------------------------------------------------
>Comment By: xqt (xqt)
Date: 2012-07-04 03:32
Message:
CosmeticChangesToolkit.resolveHtmlEntities() deactivated in pyrev:10438 due
to this bug
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-05-30 05:42
Message:
duplicated by bug #3530791
raising prio.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-03-17 18:06
Message:
@ganz-ru: unfortunately this part of the code does not use replaceExcept(), and
we have no exceptions list to which that tag could be added.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-17 06:34
Message:
Feel free to submit a patch.
----------------------------------------------------------------------
Comment By: GanZ (ganz-ru)
Date: 2012-03-17 04:00
Message:
As far as I can see, it is only necessary to add this tag to the arrays of
exceptions, and it does not require editing any other parts of the code. So
please accelerate the processing of this bug.
----------------------------------------------------------------------
Feature Requests item #3528379, was opened at 2012-05-20 04:31
Message generated for change (Comment added) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=3528379&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: ToAruShiroiNeko ()
Assigned to: xqt (xqt)
Summary: redirect.py logging of problems that cannot be fixed
Initial Comment:
redirect.py needs to log the issues it is unable to fix, and why, on each wiki. There are several flavors of problems that appear on Special:Doubleredirects:
1. Self redirects (redirects that point to themselves)
2. Redirect loops (redirects that go in circles)
3. Double redirects formed due to page protection.
4. Inter-wiki redirects (redirects that point to redirects in other wikis)
It would be a lot easier if I had a log of these pages; the user could post it on the village pump, or perhaps the bot could do this monthly for a select number of wikis. The code already prints a warning on the console, but when you are running it on 700 wikis, as I am, that becomes a serious chore to follow.
----------------------------------------------------------------------
>Comment By: xqt (xqt)
Date: 2012-07-03 22:40
Message:
If there is nobody who can deal with these special pages, there is no
reason to post them again on the village pump. I guess a better way is to
solve the remaining problems if possible. For redirect loops and
self links, this means:
- check whether there is any possibility of resolving the redirect link to a
new page
- otherwise tag it for speedy deletion.
I worked on that; the code is ready and I did some test edits in the past. I
guess I'll commit it into rewrite in autumn.
Example output from the current working copy:
>>> USB-on-the-go <<<
Links to: [[USB-on-the-go]].
Warning: Redirect target [[USB-on-the-go]] forms a redirect loop.
NOTE: Searching for USB-on-the-go
1 - ratio 0.692308 1 USB On-The-Go
20 - ratio 0.222222 90 Mobile operating system
18 - ratio 0.181818 99 Universal Serial Bus
10 - ratio 0.300000 33 USB 3.0
13 - ratio 0.285714 45 Live USB
17 - ratio 0.228571 74 USB Implementers Forum
10 - ratio 0.153846 65 Windows To Go
11 - ratio 0.357143 30 USB flash drive
19 - ratio 0.176471 107 Handheld game console
21 - ratio 0.157895 133 Features new to Windows 8
1 (1) USB On-The-Go
[[en:USB-on-the-go]] may lead to [[en:USB On-The-Go]]
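The ratio values in this output are consistent with what difflib.SequenceMatcher.ratio() from the Python standard library returns for these titles, but that is an assumption: the thread does not say how the working copy computes them. A sketch of ranking candidate targets this way, with the hypothetical helper rank_candidates:

```python
import difflib


def rank_candidates(title, candidates):
    """Return (ratio, candidate) pairs, best match first."""
    scored = [(difflib.SequenceMatcher(None, title, candidate).ratio(),
               candidate)
              for candidate in candidates]
    return sorted(scored, reverse=True)


ranked = rank_candidates('USB-on-the-go',
                         ['USB 3.0', 'Live USB', 'USB On-The-Go'])
for ratio, candidate in ranked:
    print('ratio %f %s' % (ratio, candidate))
```

The other numeric columns in the log are not explained in the thread and are left out of the sketch.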
----------------------------------------------------------------------
Comment By: ToAruShiroiNeko ()
Date: 2012-07-02 06:47
Message:
When you are plowing through 700 wikis, even simple tasks become difficult.
A compiled report would let me know which wikis to notify that human
intervention is necessary, on which pages, and what type of
intervention is needed. The prepared report could be language-specific, so
es.Wikipedia would get a report in Spanish, de.Wikipedia would get a report
in German, etc.
Protected redirects are a particular problem, as they look like something
bots can fix, but bots can't because they are unable to edit protected
pages. This is not one of your examples. Special:Doubleredirects makes no
distinction for this type of problem.
Redirect loops may involve more than 2 pages. Among 200 entries such a thing
could be difficult to spot.
Also, while how to deal with redirect loops is obvious to you and me, admins
in local communities are often more than uneasy about dealing with an issue
they are not familiar with.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-07-01 20:38
Message:
There are several bots that work on double redirects several times a day. The
remaining items might be redirect loops or self redirects, e.g.
Foo --> Foo --> Foo
are always self loops
Foo --> Bar --> Foo
Bar --> Foo --> Bar
are always redirect loops
is it difficult?
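The distinction drawn in these examples can be sketched as a small classifier. This is an illustration only, not redirect.py code: classify_redirect is a hypothetical name, and the redirect table is assumed to be a plain dict mapping each redirect page to its target. Loops longer than two pages are detected the same way as the two-page case.

```python
def classify_redirect(title, redirects):
    """Classify `title` using a dict of redirect page -> target page."""
    if title not in redirects:
        return 'not a redirect'
    if redirects[title] == title:
        return 'self redirect'
    seen = [title]
    current = title
    while current in redirects:
        target = redirects[current]
        if target in seen:
            # The chain came back to a page already visited.
            return 'redirect loop'
        seen.append(target)
        current = target
    hops = len(seen) - 1
    return 'double redirect' if hops >= 2 else 'simple redirect'


redirects = {'Foo': 'Foo', 'Bar': 'Baz', 'Baz': 'Bar',
             'A': 'B', 'B': 'C'}
for page in ('Foo', 'Bar', 'A', 'B'):
    print(page, '->', classify_redirect(page, redirects))
```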
----------------------------------------------------------------------
Comment By: ToAruShiroiNeko ()
Date: 2012-07-01 11:34
Message:
It isn't easy to distinguish them; they just look like redirects a bot can
fix. I want to have an option in the code to log them and post the log to the
village pump for the local community's attention.
It is very difficult for me to do that by hand on 700 wikis, most of which
don't even need my attention.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-07-01 06:31
Message:
They remain in Special:DoubleRedirects and must be fixed by hand or deleted
by admins. It is easy to distinguish between multiple redirects and
redirect loops and there is no need to explain it outside.
----------------------------------------------------------------------
Comment By: ToAruShiroiNeko ()
Date: 2012-06-30 17:20
Message:
How is keeping track of problems bots are unable to fix a duplication of
Special:DoubleRedirect's?
You asked me to file this bug request after I explained the problem I was
having to you.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-06-18 04:24
Message:
Rejected. I do not see any sense in duplicating the list at
Special:DoubleRedirects.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-05-20 06:36
Message:
1. - 3. are all listed by Special:DoubleRedirects. They remain when
the redirect bot cannot solve the problem.
4. Interwiki redirects are normally fixed by interwiki bots, unless there is
a __STATICREDIRECT__ in the redirect page.
----------------------------------------------------------------------
Bugs item #3539407, was opened at 2012-07-02 03:01
Message generated for change (Comment added) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3539407&group_…
Category: cosmetic changes
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: reza (reza1615)
Assigned to: Nobody/Anonymous (nobody)
Summary: cosmetic_changes bug on citation's number and punctuation
Initial Comment:
fixArabicLetters() changes the numbers and punctuation (,) of Latin citations to Persian numbers and punctuation (،), which is not correct. Please make it leave numbers unconverted when the text around them is in Latin script.
http://fa.wikipedia.org/w/index.php?title=%D8%A7%D8%B1%DB%8C%DA%A9_%D8%AA%D…
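One possible heuristic for the request above, as a sketch only. It is neither the fixArabicLetters() code nor the fa.wiki gadget: all names are hypothetical, and the rule (look at the nearest preceding letter, or the nearest following letter when there is none, and skip the digits if that letter is ASCII Latin) is an assumption made for illustration. Punctuation is not handled.

```python
import re
import string

# Map ASCII digits 0-9 to Persian digits U+06F0..U+06F9.
PERSIAN_DIGITS = {ord(str(d)): chr(0x06F0 + d) for d in range(10)}
DIGIT_RUN = re.compile(r'[0-9]+')


def nearest_letter(text, start, step):
    """Return the first alphabetic character from `start` in direction `step`."""
    index = start
    while 0 <= index < len(text):
        if text[index].isalpha():
            return text[index]
        index += step
    return None


def convert_digits(text):
    """Convert digit runs to Persian digits unless their context is Latin."""
    def substitute(match):
        context = nearest_letter(text, match.start() - 1, -1)
        if context is None:
            context = nearest_letter(text, match.end(), 1)
        if context is not None and context in string.ascii_letters:
            return match.group()  # Latin context: leave the digits alone
        return match.group().translate(PERSIAN_DIGITS)

    return DIGIT_RUN.sub(substitute, text)


print(convert_digits('Smith (July 2, 2012)'))
```

A real fix would need the regularity asked about in the follow-up comment, since a purely local heuristic like this one will misjudge mixed-script sentences.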
----------------------------------------------------------------------
>Comment By: xqt (xqt)
Date: 2012-07-03 21:46
Message:
Is there any regularity for these citations e.g. "\(<en-fullmonthname>
\d{2}, \d{4}\)"?
----------------------------------------------------------------------
Comment By: reza (reza1615)
Date: 2012-07-02 03:13
Message:
In fa.wiki we have a gadget that works fine; it has a function (digits())
that converts numbers correctly. Maybe it will be useful for solving this bug:
http://fa.wikipedia.org/wiki/%D9%85%D8%AF%DB%8C%D8%A7%D9%88%DB%8C%DA%A9%DB%…
----------------------------------------------------------------------