Patches item #2790445, was opened at 2009-05-12 00:30
Message generated for change (Comment added) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Re 1843798: Add capability to remember pages to replace.py
Initial Comment:
A new patch to implement toobaz's function with the changes suggested by wikipedian.
https://sourceforge.net/tracker/?func=detail&aid=1843798&group_id=93107&ati…
- solve_disambiguation.py and pagegenerators.py:
1. Generator and logging function for -primary option moved
from solve_disambiguation.py to pagegenerators.py
2. TODO in solve_disambiguation.py done:
generator now starts yielding before all referring pages have been found
3. makes use of new TextfilePageGenerator
4. code is a few lines shorter
- replace.py:
5. "-exclude" option from toobaz's patch implemented.
Allows filtering the generator through a list of previously edited pages.
New pages are appended to the filter file based on choices made:
-exclude: logs to filter choice "N"
6. additional command line options for other settings:
-editonce: logs to filter choices "Y", "A"
-treatonce: logs to filter choices "Y", "A", "N"
-scanonce: logs to filter choices "Y", "A", "N"; no change
7. uses generator and file format from solve_disambiguation.py
(suggested by wikipedian below)
8. default filter filename is the name of the fix. Files are placed
in a subdirectory "replace".
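The option table in items 5 and 6 can be read as a lookup from option to the outcomes that get a page logged. The following is only a sketch of that reading, not code from the patch: the names LOGGED_CHOICES and should_log are hypothetical, and "no change" is used here as a label for pages where nothing matched.

```python
# Hypothetical mapping (not taken from the patch): option name -> outcomes
# that cause a page to be appended to the filter file.
# "no change" is an illustrative label for pages where nothing matched.
LOGGED_CHOICES = {
    "-exclude": {"N"},
    "-editonce": {"Y", "A"},
    "-treatonce": {"Y", "A", "N"},
    "-scanonce": {"Y", "A", "N", "no change"},
}


def should_log(option, outcome):
    """Return True if a page with this outcome goes into the filter file."""
    return outcome in LOGGED_CHOICES.get(option, set())
```

With this reading, should_log("-editonce", "N") is False while should_log("-treatonce", "N") is True.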
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-12 09:25
Message:
Thanks for the quick review. I will try to address the
various points and have included a new version of the patch.
a. I added a bit more text to the source and reformatted
part of the code, but I didn't want to change existing
code more than needed.
b. generator:
- checks if the filter file exists
- reads it
- runs the next generator and skips pages in memory
Previously, it first ran the next generator and then deleted
from its result the pages that were in the filter file.
c. replace.py command line options
I added several command line options to define which
pages should be skipped the next time. One could edit
replace.py directly, but it seemed cleaner to provide
all options at command line level.
toobaz excluded pages where a replacement was manually
rejected ("N"). The option "-exclude" will keep this
functionality.
Personally, I find it more useful to filter pages that
were edited in a previous run. This keeps the bot from
repeating the same edit later, after someone reverted
a previous edit. Option "-editonce" provides this.
"-treatonce" combines the two.
"-scanonce" keeps the bot from re-fetching the same page
in a second run, even if the regex didn't match it in
the first run. (I fixed an omission for "skipped" in
the second patch.)
Without the different options, the additions to replace.py
would be much shorter.
d. I had to insert several "break" statements in replace.py to
make sure that nothing but "N" gets to the stage confusingly
labeled "choice must be 'N'" in the code.
e. FilterFileAppend is based on the function from
solve_disambiguation. The advantage of writing each
page to the file is that it won't miss one if it's
interrupted or crashes. This mode from
solve_disambiguation remains unchanged.
f. The same goes for the file format. Up to now, I haven't
had any problems with it, and it worked OK with the
title "臺灣Taiwan&āàäà" that I just tested. urlname was also
used by PrimaryIgnoreManager. For backward compatibility,
maybe it should be kept.
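The generator order described in point b (check for the file, read it, then skip matching pages while iterating) could look roughly like the sketch below. It is an illustration only: filtered_pages is a hypothetical name, plain title strings stand in for Page objects, and it is written for Python 3.

```python
import codecs
import os


def filtered_pages(generator, filename):
    """Yield titles from `generator` that are not listed in `filename`.

    Hypothetical helper: plain title strings stand in for Page objects.
    """
    skip = set()
    # Step 1 and 2: check that the filter file exists, then read it.
    if os.path.exists(filename):
        with codecs.open(filename, "r", "utf-8") as f:
            skip = {line.strip() for line in f if line.strip()}
    # Step 3: run the wrapped generator and skip titles held in memory.
    for title in generator:
        if title not in skip:
            yield title
```

Because the filter set is built before the wrapped generator is consumed, pages are yielded one at a time instead of after the whole result has been collected.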
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 06:47
Message:
Wow, that's a big patch =)
* codecs is fine with me
* can you avoid lines > 80 characters? I know that this is not something
we do everywhere, but that's bad looking code. Same goes for if foo: bar.
Please skip a line.
* can you document thoroughly what's being done? parameters in the
generators? In replace.py ? I find it really hard to understand the
"choice" table in the docstring explaining -scanonce & others.
* What's this:
+ f = codecs.open(filename, 'r', 'utf-8')
+ f.close()
??
I am also not convinced by the fact that after each page, FilterFileAppend
is called, and #1 path is computed, #2 a file is opened, written in, and
closed.
I'm thinking that a possible cleaner way to do this would be to have a
Filter object: put everything you need in it (an opened file descriptor, a
list of titles to ignore if you need to use this, etc...) and keep a
reference to it from the replace & disambig bots. How does that sound to
you?
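One possible shape for such a Filter object is sketched below. Everything in it is an assumption made for illustration (the class name PageFilter, its methods, and the one-title-per-line file); it is not the patch's code, and it targets Python 3.

```python
import io
import os


class PageFilter:
    """Hypothetical filter object: a set of known titles plus an open file."""

    def __init__(self, filename):
        self.titles = set()
        # Load titles written by earlier runs, if the file exists.
        if os.path.exists(filename):
            with io.open(filename, "r", encoding="utf-8") as f:
                self.titles = {line.strip() for line in f if line.strip()}
        # Open once for appending; the handle is reused for every page.
        self._out = io.open(filename, "a", encoding="utf-8")

    def __contains__(self, title):
        return title in self.titles

    def add(self, title):
        """Remember a title and write it out immediately."""
        if title not in self.titles:
            self.titles.add(title)
            self._out.write(title + "\n")
            # Flush so the entry survives an interruption or crash.
            self._out.flush()

    def close(self):
        self._out.close()
```

A bot would keep one instance, test `title in page_filter` before treating a page, and call `page_filter.add(title)` afterwards, so the path is computed and the file opened only once.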
I also know that Daniel wanted first to keep the same file format, but...
a couple of things are wrong here:
* if you output titles with page.urlname() it will not be possible to read
the file with TextfilePageGenerator afaik. Think of special characters,
being url encoded, and not decoded.
* if you want to use a Page title for a filename, you want
Page.titleforFilename, not Page.urlname
Thank you!
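The encoding concern can be illustrated with the standard library. In this sketch urllib.parse.quote is only a stand-in for what page.urlname() produces (an assumption, not its actual implementation), and the title is the one tested earlier in this thread.

```python
from urllib.parse import quote, unquote

title = "臺灣Taiwan&āàäà"

# Stand-in for a URL-encoded title as it would be written to the file.
encoded = quote(title, safe="")

# The stored line is plain ASCII with percent escapes, so a reader that
# expects raw titles sees something different from the original title.
stored_differs = encoded != title

# Decoding on read restores the original title.
decoded = unquote(encoded)
round_trip_ok = decoded == title
```

So a file of URL-encoded titles is only readable by code that decodes each line again; a reader that takes lines literally would look up the escaped text instead.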
----------------------------------------------------------------------
Patches item #2790445, was opened at 2009-05-12 06:30
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Re 1843798: Add capability to remember pages to replace.py
----------------------------------------------------------------------
Bugs item #2789460, was opened at 2009-05-09 18:34
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2789460&group_…
Category: interwiki
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: fi-wiki crossing namespace
Initial Comment:
Please allow crossing namespaces for fi-wiki. The reason is that on fi-wiki all Wikipedia sites are placed in the Wikipedia: namespace, while on all other projects they are not. See the iw-links on http://en.wikipedia.org/w/index.php?title=English_Wikipedia&action=edit&sec… for example.
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 12:10
Message:
done, in r6877
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-05-09 18:36
Message:
Sorry, it must be fi-wiki
----------------------------------------------------------------------
Bugs item #2788226, was opened at 2009-05-07 08:22
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2788226&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: JAn (jandudik)
Assigned to: Nobody/Anonymous (nobody)
Summary: unknown disambiguations
Initial Comment:
I found disambiguations in cs.wiki with the template {{acrònim}}
This one is mentioned in wikipedia-family.py, but the bot doesn't recognize it as a disambiguation (e.g. http://ca.wikipedia.org/wiki/TK )
The next problem with an unrecognised disambiguation is in zh (e.g. http://zh.wikipedia.org/wiki/TI ), where this construction is used:
{{disambig||{{#if:拉丁字母缩写消歧义|Cat=拉丁字母缩写消歧义 }}|Help= }}
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 11:56
Message:
Great bug report.
* The first bug was a case problem when comparing titles. Fixed in r6874
* Second one was a problem with parser functions when matching templates.
Fixed in r6875
* And I added the Russian template in r6876. I don't speak Russian, but
looking at the IW links, it is {{surname}}.
Thanks =)
----------------------------------------------------------------------
Comment By: JAn (jandudik)
Date: 2009-05-07 08:24
Message:
Next one, but I am not sure if this may be disambiguation:
ru: {{Фамилия|Немецкие}} used in
http://ru.wikipedia.org/wiki/Лихтенберг_(значения)
----------------------------------------------------------------------
Patches item #2786487, was opened at 2009-05-04 11:13
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2786487&group_…
Category: Translations
Group: None
Status: Closed
Resolution: Duplicate
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: Some translations for cosmetic_changes.py
Initial Comment:
Here are some additional translations for cosmetic_changes.py
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 11:10
Message:
This artifact has been marked as a duplicate of artifact 2787137 with
reason:
No explanation provided.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-05-05 11:59
Message:
There is a new version at 2787137. Sorry, I didn't find an upload button for
the old one.
----------------------------------------------------------------------
Bugs item #2786042, was opened at 2009-05-03 13:43
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2786042&group_…
Category: None
Group: None
>Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: family.py: Wrong File: translation on wo-wiki
Initial Comment:
__version__='$Id: family.py 6725 2009-04-26 08:31:00Z nicdumz $'
There is a wrong Image: or File: translation on wo-wiki (see message at http://wo.wikipedia.org/wiki/Waxtaani_j%C3%ABfandikukat:Xqt and http://wo.wikipedia.org/w/index.php?title=Saytubiddiw&diff=prev&oldid=35829 as an example). Maybe it comes from this NS 7 translation
'wo': [u'Waxtaani dencukaay', u'Dencukaay'],
but it should be
'wo': [u'Waxtaani dencukaay'],
only.
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 11:09
Message:
I'm not sure of what happened. I changed the NS 7 localization in r6873.
Can you comment back on that issue if it happens again after this? Thanks.
----------------------------------------------------------------------
Feature Requests item #2789464, was opened at 2009-05-09 12:49
Message generated for change (Comment added) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=2789464&group_…
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: wikipedia.replaceCategoryLinks
Initial Comment:
replaceCategoryLinks should not split {{link FA}} or {{link GA}} from the interwiki links. Normally this routine places categories before iw links, but these two templates should stay with the iw links: I got some entries on my de: talk page about this concerning et- and ru-wiki (see http://de.wikipedia.org/wiki/Benutzer_Diskussion:Xqt#moving_Link_templates_…), for example.
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-12 00:54
Message:
add_text.py has a feature that might solve this (starsListInPage).
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2009-05-09 14:45
Message:
It's a serious problem, causing a lot of trouble and criticism from users who
are angry that the bot "destroys" their articles. Obersachse
----------------------------------------------------------------------
Patches item #1843798, was opened at 2007-12-03 21:45
Message generated for change (Comment added) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1843798&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pietro Battiston (toobaz)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add capability to remember pages to replace.py
Initial Comment:
When doing very long semi-automatic replacements, you may have to kill the bot and start again. Then you have to say "no" again to all unwanted replacements. It is even worse if you're using an xml dump: it can be several weeks old, and it will make you download a lot of pages that were ALREADY corrected.
This patch consists of two parts:
1) a patch to replace.py that adds a new parameter, "-exclude", and makes it accept a path to a file which will be used both for:
-> knowing which articles to exclude from substitution
-> logging pages where a replacement was denied and pages already known not to need replacements
2) a patch to pagegenerators.py that adds a generator filter, able to yield only pages not appearing in a given list
The only doubt I have is: should replace.py log in some other way? xml? wikipedia module's predefined functions? log into a given wikipedia userpage (so that logs can easily be shared)?
As I've done it, it needs to import the os and codecs modules... I don't know if that's a problem.
Anyway, a patch like this is really needed; if necessary, I can try to improve it.
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-12 00:32
Message:
see patch ID: 2790445
https://sourceforge.net/tracker/?func=detail&aid=2790445&group_id=93107&ati…
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2008-06-23 16:22
Message:
Logged In: NO
closed this patch?
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2008-01-16 09:20
Message:
Logged In: NO
replace.py already has the option -xmlstart:page when using an xml dump,
to skip all entries before "page".
----------------------------------------------------------------------
Comment By: Daniel Herding (wikipedian)
Date: 2008-01-16 07:35
Message:
Logged In: YES
user_id=880694
Originator: NO
We already have something very similar for solve_disambiguation.py. When
you run it with the -primary parameter, e.g. on [[en:London]], it saves all
page titles where the user pressed 'N' to the 'disambiguations' directory,
and skips these pages when you run the same command later.
It saves the URL-encoded titles into a text file, one title per line,
without [[brackets]].
It would be nice if some code could be shared, although I'm not sure if
that's possible (I haven't yet looked at your code, but
solve_disambiguation.py is a bit complicated). But we should keep
solve_disambiguation's format because there are probably people who want to
keep using their logs.
----------------------------------------------------------------------
Patches item #2790445, was opened at 2009-05-12 00:30
Message generated for change (Settings changed) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…
Category: None
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Re 1843798: Add capability to remember pages to replace.py
Patches item #2790445, was opened at 2009-05-12 00:30
Message generated for change (Tracker Item Submitted) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Re 1843798: Add capability to remember pages to replace.py