Pywikipedia-bugs

pywikipedia-bugs@lists.wikimedia.org


[ pywikipediabot-Patches-2790445 ] Re 1843798: Add capabiliy to remember pages to replace.py
by SourceForge.net 12 May '09

Patches item #2790445, was opened at 2009-05-12 00:30
Message generated for change (Comment added) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Re 1843798: Add capabiliy to remember pages to replace.py

Initial Comment:
A new patch to implement toobaz's function with the changes suggested by wikipedian.
https://sourceforge.net/tracker/?func=detail&aid=1843798&group_id=93107&ati…

- solve_disambiguation.py and pagegenerators.py:
1. Generator and logging function for -primary option moved from solve_disambiguation.py to pagegenerators.py
2. TODO in solve_disambiguation.py done: generator now starts yielding before all referring pages have been found
3. makes use of new TextfilePageGenerator
4. code is a few lines shorter

- replace.py:
5. "-exclude" option from toobaz's patch implemented. Allows filtering the generator through a list of previously edited pages. New pages are appended to the filter file based on choices made:
   -exclude: logs to filter choice "N"
6. additional command line options for other settings:
   -editonce: logs to filter choices "Y", "A"
   -treatonce: logs to filter choices "Y", "A", "N"
   -scanonce: logs to filter choices "Y", "A", "N"; no change
7. uses generator and file format from solve_disambiguation.py (suggested by wikipedian below)
8. default filter filename is the name of the fix. Files are placed in a subdirectory "replace".

----------------------------------------------------------------------

Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-12 09:25

Message:
Thanks for the quick review. I will try to address the various points and have included a new version of the patch.

a. I added a bit more text to the source and reformatted part of the code, but I didn't want to change existing code more than needed.

b. generator:
- checks if the filter file exists
- reads it
- runs the next generator and skips pages in memory
Previously, it first ran the next generator and then deleted from its result the pages that were in the filter file.

c. replace.py command line options
I added several command line options to define which pages should be skipped the next time. One could edit replace.py directly, but it seemed cleaner to provide all options at command line level.
toobaz excluded pages where a replacement was manually rejected ("N"). The option "-exclude" will keep this functionality.
Personally, I find it more useful to filter pages that were edited in a previous run. This keeps the bot from repeating the same edit later, after someone reverted a previous edit. Option "-editonce" provides this.
"-treatonce" combines the two.
"-scanonce" keeps the bot from re-fetching the same page in a 2nd run, even if the regex didn't match it in the first run. (I fixed an omission for "skipped" in the second patch.)
Without the different options, the additions to replace.py would be much shorter.

d. I had to insert several "break" statements in replace.py to ensure that nothing but "N" reaches the stage confusingly labeled "choice must be 'N'" in the code.

e. FilterFileAppend is based on the function from solve_disambiguation. The advantage of writing each page to the file is that it won't miss one if it's interrupted or crashes. This mode from solve_disambiguation remains unchanged.

f. The same goes for the file format. Up to now, I didn't have any problems with it, and it worked OK with a title "臺灣Taiwan&āàäà" I just tested. urlname was also used by PrimaryIgnoreManager. For backward compatibility, maybe it should be kept.

----------------------------------------------------------------------

Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 06:47

Message:
Wow, that's a big patch =)

* codecs is fine with me
* can you avoid lines > 80 characters? I know that this is not something we do everywhere, but that's bad looking code. Same goes for if foo: bar. Please skip a line.
* can you document thoroughly what's being done? parameters in the generators? In replace.py? I find it really hard to understand the "choice" table in the docstring explaining -scanonce & others.
* What's this:
    + f = codecs.open(filename, 'r', 'utf-8')
    + f.close()
  ??

I am also not convinced by the fact that after each page, FilterFileAppend is called, and #1 path is computed, #2 a file is opened, written in, and closed.
I'm thinking that a possible cleaner way to do this would be to have a Filter object: put everything you need in it (an opened file descriptor, a list of titles to ignore if you need to use this, etc...) and keep a reference to it from the replace & disambig bots. How does that sound to you?

I also know that Daniel wanted first to keep the same file format, but... a couple of things are wrong here:
* if you output titles with page.urlname() it will not be possible to read the file with TextfilePageGenerator afaik. Think of special characters, being url encoded, and not decoded.
* if you want to use a Page title for a filename, you want Page.titleforFilename, not Page.urlname

Thank you!

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…
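
One way to read the Filter object proposed in the last comment, as a minimal sketch and not the code of the patch: the filter file is read once, the file handle stays open for appending, and each remembered title is flushed immediately so an interrupted run loses nothing. The names PageFilter, remember and filter are hypothetical, the sketch works on plain title strings rather than pywikipedia Page objects, and it is written for Python 3 although the framework discussed here predates it.

```python
import os


class PageFilter:
    """Hypothetical helper: remembers titles in a text file, one per line."""

    def __init__(self, path):
        self.path = path
        self.seen = set()
        # Read the existing filter file once, if there is one.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.seen = {line.strip() for line in f if line.strip()}
        # Create the target directory (e.g. "replace") when it is missing.
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        # Keep one handle open for appending instead of reopening per page.
        self._file = open(path, "a", encoding="utf-8")

    def remember(self, title):
        """Append a title to the filter file unless it is already known."""
        if title not in self.seen:
            self.seen.add(title)
            self._file.write(title + "\n")
            # Flush right away so a crash does not lose the entry.
            self._file.flush()

    def filter(self, titles):
        """Lazily yield only the titles that have not been remembered."""
        for title in titles:
            if title not in self.seen:
                yield title

    def close(self):
        self._file.close()
```

A bot would create one PageFilter at startup, wrap its page generator with filter(), call remember() after each logged choice, and call close() on exit.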



[ pywikipediabot-Bugs-2789460 ] fi-wiki crossing namespace
by SourceForge.net 12 May '09

Bugs item #2789460, was opened at 2009-05-09 18:34
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2789460&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: interwiki
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: fi-wiki crossing namespace

Initial Comment:
Please enable namespace crossing for fi-wiki. The reason is that on fi-wiki all Wikipedia sites are placed in the Wikipedia: namespace, while on all other projects they aren't. See the iw-links on http://en.wikipedia.org/w/index.php?title=English_Wikipedia&action=edit&sec… for example.

----------------------------------------------------------------------

>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 12:10

Message:
done, in r6877

----------------------------------------------------------------------

Comment By: xqt (xqt)
Date: 2009-05-09 18:36

Message:
Sorry, it must be fi-wiki

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2789460&group_…


[ pywikipediabot-Bugs-2788226 ] unknown disambiguations
by SourceForge.net 12 May '09

Bugs item #2788226, was opened at 2009-05-07 08:22
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2788226&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: JAn (jandudik)
Assigned to: Nobody/Anonymous (nobody)
Summary: unknown disambiguations

Initial Comment:
I found disambiguations in ca.wiki with the template {{acrònim}}. This one is mentioned in wikipedia-family.py, but the bot doesn't recognize it as a disambiguation (e.g. http://ca.wikipedia.org/wiki/TK ).

The next problem with an unrecognized disambiguation is in zh (e.g. http://zh.wikipedia.org/wiki/TI ), where this construction is used:
{{disambig||{{#if:拉丁字母缩写消歧义|Cat=拉丁字母缩写消歧义 }}|Help= }}

----------------------------------------------------------------------

>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 11:56

Message:
Great bug report.

* The first bug was a case problem when comparing titles. Fixed in r6874
* Second one was a problem with parser functions when matching templates. Fixed in r6875
* And I added in r6876 the Russian template. I don't speak Russian, but looking at the IW links, it is {{surname}}.

Thanks =)

----------------------------------------------------------------------

Comment By: JAn (jandudik)
Date: 2009-05-07 08:24

Message:
Next one, but I am not sure if this may be a disambiguation:
ru: {{Фамилия|Немецкие}} used in http://ru.wikipedia.org/wiki/Лихтенберг_(значения)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2788226&group_…
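
The two problems named in the reply (letter case when comparing template names, and parser functions nested inside a template call) can be illustrated with a small sketch. This is not the change made in r6874 or r6875, whose contents are not shown in this thread. The function names are hypothetical, and the sketch assumes the usual MediaWiki convention that the first letter of a template name is case-insensitive and that underscores equal spaces.

```python
def normalize_template_name(name):
    """Normalize a template name: trim, underscores to spaces, first letter upper."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:]


def top_level_template_names(text):
    """Return names of outermost {{...}} calls, skipping anything nested inside."""
    names = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith("{{", i):
            if depth == 0:
                # The name runs up to the first '|' or the closing '}}'.
                j = i + 2
                while (j < len(text) and text[j] != "|"
                       and not text.startswith("}}", j)):
                    j += 1
                names.append(normalize_template_name(text[i + 2:j]))
            depth += 1
            i += 2
        elif text.startswith("}}", i) and depth > 0:
            depth -= 1
            i += 2
        else:
            i += 1
    return names


def is_disambiguation(text, known_templates):
    """True if any outermost template is in the known disambiguation list."""
    known = {normalize_template_name(n) for n in known_templates}
    return any(name in known for name in top_level_template_names(text))
```

Tracking brace depth instead of using a single regular expression is what lets the zh example above be read as one call to "disambig" even though a {{#if:...}} parser function sits inside its parameters.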


[ pywikipediabot-Patches-2786487 ] Some translations for cosmetic_changes.py
by SourceForge.net 12 May '09

Patches item #2786487, was opened at 2009-05-04 11:13
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2786487&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Translations
Group: None
Status: Closed
Resolution: Duplicate
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: Some translations for cosmetic_changes.py

Initial Comment:
Here are some additional translations for cosmetic_changes.py

----------------------------------------------------------------------

Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 11:10

Message:
This artifact has been marked as a duplicate of artifact 2787137 with reason:
No explanation provided.

----------------------------------------------------------------------

Comment By: xqt (xqt)
Date: 2009-05-05 11:59

Message:
There is a new version on 2787137. Sorry, I didn't find an upload button for the old one.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2786487&group_…


[ pywikipediabot-Bugs-2786042 ] family.py: Wrong File: translation on wo-wiki
by SourceForge.net 12 May '09

Bugs item #2786042, was opened at 2009-05-03 13:43
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2786042&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
>Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: family.py: Wrong File: translation on wo-wiki

Initial Comment:
__version__='$Id: family.py 6725 2009-04-26 08:31:00Z nicdumz $'

There is a wrong Image: or File: translation on wo-wiki (see the message at http://wo.wikipedia.org/wiki/Waxtaani_j%C3%ABfandikukat:Xqt and http://wo.wikipedia.org/w/index.php?title=Saytubiddiw&diff=prev&oldid=35829 as an example).

Maybe it comes from this NS 7 translation
    'wo': [u'Waxtaani dencukaay', u'Dencukaay'],
but it should be
    'wo': [u'Waxtaani dencukaay'],
only.

----------------------------------------------------------------------

>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 11:09

Message:
I'm not sure what happened. I changed the NS 7 localization in r6873. Can you comment back on that issue if it happens again after this?

Thanks.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2786042&group_…


[ pywikipediabot-Feature Requests-2789464 ] wikipedia.replaceCategoryLinks
by SourceForge.net 12 May '09

Feature Requests item #2789464, was opened at 2009-05-09 12:49
Message generated for change (Comment added) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=2789464&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: wikipedia.replaceCategoryLinks

Initial Comment:
replaceCategoryLinks should not split {{link FA}} or {{link GA}} and interwiki links. Normally this routine would place categories before iw links. But these two shouldn't be split from them, as I got some entries on my de: talk page concerning et- and ru-wiki (see http://de.wikipedia.org/wiki/Benutzer_Diskussion:Xqt#moving_Link_templates_…), for example.

----------------------------------------------------------------------

Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-12 00:54

Message:
add_text.py has a feature that might solve this (starsListInPage).

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-05-09 14:45

Message:
It's a serious problem, causing a lot of trouble and criticism from users who are angry that the bot "destroys" their articles.

Obersachse

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=2789464&group_…


[ pywikipediabot-Patches-1843798 ] Add capabiliy to remember pages to replace.py
by SourceForge.net 12 May '09

Patches item #1843798, was opened at 2007-12-03 21:45
Message generated for change (Comment added) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1843798&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pietro Battiston (toobaz)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add capabiliy to remember pages to replace.py

Initial Comment:
When doing very long semi-automatic replacements, you may have to kill the bot and start again. Then you have to say "no" again to all unwanted replacements.
It is even worse if you're using an XML dump: it can be several weeks old, and it will make you download a lot of pages that were ALREADY corrected.

This patch consists of two parts:
1) a patch to replace.py that adds a new parameter, "-exclude", and makes it accept a path to a file which will be used both for:
-> knowing which articles to exclude from substitution
-> logging the pages of denied replacements and pages already known not to need replacements
2) a patch to pagegenerators.py that adds a generator filter, able to yield only pages not appearing in a given list

The only doubt I have is: should replace.py log in some other way? XML? The wikipedia module's predefined functions? Log into a given Wikipedia user page (so that logs can easily be shared)?

As I've done it, it needs to import the os and codecs modules... I don't know if that's a problem.

Anyway, a patch like this is really needed; if necessary I can try to improve it.

----------------------------------------------------------------------

Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-12 00:32

Message:
see patch ID: 2790445
https://sourceforge.net/tracker/?func=detail&aid=2790445&group_id=93107&ati…

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-06-23 16:22

Message:
Logged In: NO

closed this patch?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-01-16 09:20

Message:
Logged In: NO

replace.py already has the option -xmlstart:page when using an XML dump, to skip all entries before "page".

----------------------------------------------------------------------

Comment By: Daniel Herding (wikipedian)
Date: 2008-01-16 07:35

Message:
Logged In: YES
user_id=880694
Originator: NO

We already have something very similar for solve_disambiguation.py. When you run it with the -primary parameter, e.g. on [[en:London]], it saves all page titles where the user pressed 'N' to the 'disambiguations' directory, and skips these pages when you run the same command later. It saves the URL-encoded titles into a text file, one title per line, without [[brackets]].

It would be nice if some code could be shared, although I'm not sure if that's possible (I haven't yet looked at your code, but solve_disambiguation.py is a bit complicated). But we should keep solve_disambiguation's format because there are probably people who want to keep using their logs.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1843798&group_…
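
The log format described in the last comment (URL-encoded titles, one per line, no brackets) can be sketched with the standard library. This is an illustration only: encode_title and decode_title are hypothetical helpers, and percent-encoding through urllib.parse.quote is assumed to approximate what the framework writes, which this thread does not confirm.

```python
from urllib.parse import quote, unquote


def encode_title(title):
    """Percent-encode a title for storage, one title per line (assumed format)."""
    # Spaces become underscores first, as in wiki URLs; everything outside
    # the unreserved ASCII set is then percent-encoded as UTF-8.
    return quote(title.replace(" ", "_"), safe="")


def decode_title(line):
    """Reverse encode_title for a line read back from the log file."""
    return unquote(line.strip()).replace("_", " ")


title = "臺灣Taiwan&āàäà"
encoded = encode_title(title)
# The stored line is plain ASCII and differs from the title itself, so a
# reader that does not decode it will not recover the original title.
print(encoded)
print(decode_title(encoded) == title)
```

Any reader of such a file has to apply the decoding step; a generator that treats each line as a literal title would look up the percent-encoded string instead.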


[ pywikipediabot-Patches-2790445 ] Re 1843798: Add capabiliy to remember pages to replace.py
by SourceForge.net 12 May '09

Patches item #2790445, was opened at 2009-05-12 00:30
Message generated for change (Settings changed) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Re 1843798: Add capabiliy to remember pages to replace.py

Initial Comment:
A new patch to implement toobaz's function with the changes suggested by wikipedian.
https://sourceforge.net/tracker/?func=detail&aid=1843798&group_id=93107&ati…

- solve_disambiguation.py and pagegenerators.py:
1. Generator and logging function for -primary option moved from solve_disambiguation.py to pagegenerators.py
2. TODO in solve_disambiguation.py done: generator now starts yielding before all referring pages have been found
3. makes use of new TextfilePageGenerator
4. code is a few lines shorter

- replace.py:
5. "-exclude" option from toobaz's patch implemented. Allows filtering the generator through a list of previously edited pages. New pages are appended to the filter file based on choices made:
   -exclude: logs to filter choice "N"
6. additional command line options for other settings:
   -editonce: logs to filter choices "Y", "A"
   -treatonce: logs to filter choices "Y", "A", "N"
   -scanonce: logs to filter choices "Y", "A", "N"; no change
7. uses generator and file format from solve_disambiguation.py (suggested by wikipedian below)
8. default filter filename is the name of the fix. Files are placed in a subdirectory "replace".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…


[ pywikipediabot-Patches-2790445 ] Re 1843798: Add capabiliy to remember pages to replace.py
by SourceForge.net 12 May '09

Patches item #2790445, was opened at 2009-05-12 00:30
Message generated for change (Tracker Item Submitted) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Re 1843798: Add capabiliy to remember pages to replace.py

Initial Comment:
A new patch to implement toobaz's function with the changes suggested by wikipedian.
https://sourceforge.net/tracker/?func=detail&aid=1843798&group_id=93107&ati…

- solve_disambiguation.py and pagegenerators.py:
1. Generator and logging function for -primary option moved from solve_disambiguation.py to pagegenerators.py
2. TODO in solve_disambiguation.py done: generator now starts yielding before all referring pages have been found
3. makes use of new TextfilePageGenerator
4. code is a few lines shorter

- replace.py:
5. "-exclude" option from toobaz's patch implemented. Allows filtering the generator through a list of previously edited pages. New pages are appended to the filter file based on choices made:
   -exclude: logs to filter choice "N"
6. additional command line options for other settings:
   -editonce: logs to filter choices "Y", "A"
   -treatonce: logs to filter choices "Y", "A", "N"
   -scanonce: logs to filter choices "Y", "A", "N"; no change
7. uses generator and file format from solve_disambiguation.py (suggested by wikipedian below)
8. default filter filename is the name of the fix. Files are placed in a subdirectory "replace".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…
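
The option list in items 5 and 6 of the initial comment maps each command line option to the outcomes that get logged to the filter file. A rough sketch of that table follows; it is not taken from the patch. LOGGED_CHOICES, NO_CHANGE and should_log are hypothetical names, and "no change" is modelled as a separate outcome for pages where nothing was replaced.

```python
# Hypothetical marker for a page that was scanned but needed no change.
NO_CHANGE = "no change"

# Which outcomes each option writes to the filter file, per the list above.
LOGGED_CHOICES = {
    "-exclude": {"N"},
    "-editonce": {"Y", "A"},
    "-treatonce": {"Y", "A", "N"},
    "-scanonce": {"Y", "A", "N", NO_CHANGE},
}


def should_log(option, outcome):
    """Return True if a page with this outcome is logged under the option."""
    return outcome in LOGGED_CHOICES.get(option, set())


# Print the table, one option per line.
for option, outcomes in LOGGED_CHOICES.items():
    print(option, sorted(outcomes))
```

Keeping the table in one dictionary would also give the docstring a single place to document it.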
