Patches item #2424422, was opened at 2008-12-13 13:34
Message generated for change (Comment added) made by rick_block
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2424422&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: Rejected
Priority: 5
Private: No
Submitted By: Rick Block (rick_block)
Assigned to: Nobody/Anonymous (nobody)
Summary: -content option for replace.py
Initial Comment:
I have a number of tools I run at en.wikipedia that download pages, manipulate the pages using tools like awk, and then upload new versions of the pages. I've implemented an option to replace.py (-content) to provide a filename containing the replacement content for a page. The svn.diff file is attached. The same option can be used to create a new page with content from a specified file as well.
----------------------------------------------------------------------
Comment By: Rick Block (rick_block)
Date: 2009-10-03 20:02
Message:
NicDumz's main comment seems to be that he doesn't quite see the point.
The basic point is that I'm not a Python programmer :). This option allows
me to read a page using the MediaWiki API (or simply curl or wget with the
"action=raw" parameter), edit the page content using whatever I'd like (I
generally use awk), and then use replace.py to submit the page back. Using
replace.py (rather than the MediaWiki API) for this provides a much nicer
surround, including things like previewing the diff which is extremely
handy. Logically I'm constructing a sort of pipe (in shell, of course)
that fetches a page, edits the page, and then puts the page. The general
pattern is sort of like:
    curl "http://en.wikipedia.org/w/index.php?title=$PAGE&action=raw" |
        awk -f awkscript >$PAGE.tmp
    python replace.py -content:$PAGE.tmp -page:$PAGE
I haven't bothered to figure out how to do this in Python, but given the
ability to read the content from stdin (using, say, "-" as the parameter
value), I could actually do the whole thing as a pipe like this:
    curl "http://en.wikipedia.org/w/index.php?title=$PAGE&action=raw" |
        awk -f awkscript |
        python replace.py -content:- -page:$PAGE
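The "-" convention mentioned above (read the replacement text from stdin instead of a file) could be sketched in Python roughly as follows. The helper name read_replacement_text is hypothetical and not part of replace.py, and UTF-8 input is an assumption.

```python
import io
import sys


def read_replacement_text(source, stdin=None):
    """Return replacement text from a file path, or from stdin if source is "-".

    Hypothetical helper for illustration only; not part of replace.py.
    """
    if source == "-":
        # Allow a stream to be passed in so the function is easy to test.
        stream = stdin if stdin is not None else sys.stdin
        return stream.read()
    # Assumption: files holding page content are UTF-8 encoded.
    with io.open(source, "r", encoding="utf-8") as f:
        return f.read()
```

With something like this, the value given to -content could be passed straight through, and the second pipeline above would need no temporary file.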
I have a revised version of the patch that addresses at least most of the
detailed comments. These lines:
> + new_text = new_text.replace(u'\n',u'\r\n')
> + if ( new_text == original_text ) or ( new_text == original_text + '\r\n'):
>
> What is this ? This doesn't look quite good (two text comparisons instead
> of one?), it's not documented, and it doesn't look related to the patch
> topic. (does it?)
replace all LF with CRLF in the replacement text and then change the
equality comparison so the replacement text is considered equal whether or
not the original is terminated with a CRLF (when comparing the entire page
to an entire replacement page read from a file, the end of line terminator
must be the same - as read by replace.py the content seems to have CRLF at
the end of each line but not necessarily the last line). If there's a
better way to accomplish either of these in Python, please let me know.
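One possible way to get the same effect with a single comparison is to normalize both texts before comparing them. This is only a sketch: the names normalize and same_text are hypothetical, and it assumes that differences in line endings and in trailing newlines should be ignored.

```python
def normalize(text):
    """Convert CRLF and bare CR line endings to LF and drop trailing newlines."""
    return text.replace("\r\n", "\n").replace("\r", "\n").rstrip("\n")


def same_text(new_text, original_text):
    """Compare two page texts, ignoring line-ending style and trailing newlines."""
    return normalize(new_text) == normalize(original_text)
```

Normalizing both sides means the replacement text does not have to be rewritten with CRLF just to make the comparison succeed.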
> * While updating your patch, please also update your patch against HEAD
I think it was updated when I submitted it. I've just updated again as
well.
Since the item is closed I don't think I can upload the updated patch.
----------------------------------------------------------------------
Comment By: siebrand (siebrand)
Date: 2009-10-02 03:59
Message:
Rejected because of lack of response from submitter to comment on
2009-04-27 18:13 by nicdumz. Feel free to reopen after addressing the
comment given and having uploaded an updated patch.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-27 10:13
Message:
Sorry for taking so long to review this.
I'm not sure that I understand what would be the usage of that script.
Could you provide a (commandline) example, and explain maybe a bit more
what would be an interesting application of it?
In fact, I don't really understand why one would need to use replace.py,
which is a rather complex script, to simply replace all the text in a page
with an arbitrary other text:
    for page in gen:
        page.put(text)
does the same thing. You have to add 4 or 5 lines to handle command line
arguments, of course, but using replace.py looks a bit "overkill" =)
But maybe I'm wrong here, so please include a use case :)
On the patch itself, if you still want me to include it:
* This patch mixes tabs and spaces, which is a very bad practice in Python.
Please fix this.
* Please make sure that replacement_text has a default None value in the
Bot constructor. Please also append it at the _end_ of the constructor
signature, and not in the middle of the arguments list, to ensure backwards
compatibility.
* Please be a bit more verbose in the documentation of the -content
option. If you think that -content will be useful to other users, you'll
have to explain to them _why_ this is useful =) A commandline example and a
real application could help here.
* + new_text = new_text.replace(u'\n',u'\r\n')
  + if ( new_text == original_text ) or ( new_text == original_text + '\r\n'):
What is this? This doesn't look quite good (two text comparisons instead
of one?), it's not documented, and it doesn't look related to the patch
topic. (does it?)
* While updating your patch, please also update your patch against HEAD
Thanks :)
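The request about the constructor signature can be illustrated with a small sketch. ExampleBot and its first two parameters are hypothetical stand-ins, not the real replace.py class; only the name replacement_text comes from the review.

```python
class ExampleBot(object):
    """Hypothetical stand-in for a bot class; not the real replace.py bot."""

    def __init__(self, generator, replacements, replacement_text=None):
        # The new argument goes last and defaults to None, so existing
        # callers that pass only the older positional arguments keep working.
        self.generator = generator
        self.replacements = replacements
        self.replacement_text = replacement_text


# An old-style call still works unchanged:
old_style = ExampleBot(iter([]), [("foo", "bar")])
# New callers opt in with a keyword argument:
new_style = ExampleBot(iter([]), [], replacement_text=u"new page text")
```

Had the new parameter been inserted in the middle of the signature, the old-style positional call would silently bind its arguments to the wrong names.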
----------------------------------------------------------------------
Bugs item #2872239, was opened at 2009-10-03 10:26
Message generated for change (Tracker Item Submitted) made by mu301
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2872239&group_…
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: mikeu (mu301)
Assigned to: Nobody/Anonymous (nobody)
Summary: welcome.py -timeoffset:X broken
Initial Comment:
bash-3.00$ date
Sat Oct 3 10:19:41 EDT 2009
bash-3.00$ python version.py
Pywikipedia [http] trunk/pywikipedia (r7363, 2009/10/03, 12:19:39)
Python 2.4.4 (#1, Jan 10 2007, 01:25:01) [C]
bash-3.00$ python welcome.py -random -limit:2000 -break -timeoffset:4320
Loading signature list...
Traceback (most recent call last):
  File "welcome.py", line 982, in ?
    bot.run()
  File "welcome.py", line 786, in run
    for users in self.parseNewUserLog():
  File "welcome.py", line 676, in parseNewUserLog
    params['lestart'] = int(now.strftime("%Y-%m-%dT%H:%M:%SZ"))
ValueError: invalid literal for int(): 2009-09-30T14:19:53Z
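The traceback shows int() being applied to a formatted ISO 8601 string, which cannot succeed. A minimal sketch of building the value without the int() call follows. It is written for current Python rather than the Python 2.4 in the report, the function name lestart_for_offset is hypothetical, and treating -timeoffset as minutes is inferred from the report (4320 minutes is exactly the three days between the run time and the failing value). The fix actually applied to welcome.py may differ.

```python
from datetime import datetime, timedelta, timezone


def lestart_for_offset(offset_minutes, now=None):
    """Return an ISO 8601 UTC timestamp string offset_minutes in the past.

    Hypothetical helper; not the code in welcome.py.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(minutes=offset_minutes)
    # Keep the value as a string: int() cannot parse "2009-09-30T14:19:53Z".
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, lestart_for_offset(4320, datetime(2009, 10, 3, 14, 19, 53)) gives the same string that appears in the error message.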
----------------------------------------------------------------------
Patches item #2871229, was opened at 2009-10-01 12:17
Message generated for change (Comment added) made by siebrand
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2871229&group_…
Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: JAn (jandudik)
Assigned to: Nobody/Anonymous (nobody)
Summary: commons_link.py update
Initial Comment:
In attached version:
* added cs. translation
* bugfix - link to mainpage
* bugfix - it was possible for both {{commons}} and {{commonscat}} to be added
----------------------------------------------------------------------
>Comment By: siebrand (siebrand)
Date: 2009-10-02 13:27
Message:
* install TortoiseSVN (Windows) or svn (Linux)
* check out pywikipediabot
* make changes to the checked out files
* create a diff/patch. Google for more information.
----------------------------------------------------------------------
Comment By: siebrand (siebrand)
Date: 2009-10-02 11:18
Message:
r7345. Please attach a proper diff next time, and not a whole file.
----------------------------------------------------------------------
Patches item #2424422, was opened at 2008-12-13 21:34
Message generated for change (Settings changed) made by siebrand
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2424422&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: Rick Block (rick_block)
Assigned to: Nobody/Anonymous (nobody)
Summary: -content option for replace.py
Initial Comment:
I have a number of tools I run at en.wikipedia that download pages, manipulate the pages using tools like awk, and then upload new versions of the pages. I've implemented an option to replace.py (-content) to provide a filename containing the replacement content for a page. The svn.diff file is attached. The same option can be used to create a new page with content from a specified file as well.
----------------------------------------------------------------------
>Comment By: siebrand (siebrand)
Date: 2009-10-02 11:59
Message:
Rejected because of lack of response from submitter to comment on
2009-04-27 18:13 by nicdumz. Feel free to reopen after addressing the
comment given and having uploaded an updated patch.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-27 18:13
Message:
Sorry for taking so long to review this.
I'm not sure that I understand what would be the usage of that script.
Could you provide a (commandline) example, and explain maybe a bit more
what would be an interesting application of it?
In fact, I don't really understand why one would need to use replace.py,
which is a rather complex script, to simply replace all the text in a page
with an arbitrary other text:
    for page in gen:
        page.put(text)
does the same thing. You have to add 4 or 5 lines to handle command line
arguments, of course, but using replace.py looks a bit "overkill" =)
But maybe I'm wrong here, so please include a use case :)
On the patch itself, if you still want me to include it:
* This patch mixes tabs and spaces, which is a very bad practice in Python.
Please fix this.
* Please make sure that replacement_text has a default None value in the
Bot constructor. Please also append it at the _end_ of the constructor
signature, and not in the middle of the arguments list, to ensure backwards
compatibility.
* Please be a bit more verbose in the documentation of the -content
option. If you think that -content will be useful to other users, you'll
have to explain to them _why_ this is useful =) A commandline example and a
real application could help here.
* + new_text = new_text.replace(u'\n',u'\r\n')
  + if ( new_text == original_text ) or ( new_text == original_text + '\r\n'):
What is this? This doesn't look quite good (two text comparisons instead
of one?), it's not documented, and it doesn't look related to the patch
topic. (does it?)
* While updating your patch, please also update your patch against HEAD
Thanks :)
----------------------------------------------------------------------
Patches item #2777033, was opened at 2009-04-21 06:12
Message generated for change (Settings changed) made by siebrand
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2777033&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: Michael Cariaso (cariaso)
Assigned to: Nobody/Anonymous (nobody)
Summary: Show RecentChanges
Initial Comment:
clean simple code for watching the recent changes
----------------------------------------------------------------------
>Comment By: siebrand (siebrand)
Date: 2009-10-02 11:57
Message:
Rejecting this patch. It screen scrapes RC, and we do not want that in
pywikipediabot anymore - certainly not for new features. Feel free to
reopen after attaching an updated patch that uses the API.
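As a rough illustration of what using the API instead of screen scraping could look like, the sketch below only builds a query URL for the MediaWiki API's list=recentchanges module and does not contact any server. The function name recent_changes_url and the chosen rcprop fields are assumptions for illustration; pywikipediabot's own API wrappers are not shown here.

```python
from urllib.parse import urlencode


def recent_changes_url(api_url, limit=10):
    """Build a MediaWiki API URL that lists recent changes as JSON.

    Hypothetical helper for illustration; it performs no network access.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|timestamp|user",  # fields chosen for this example
        "rclimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)
```

The response is structured data, so no HTML parsing of the RecentChanges page is needed.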
----------------------------------------------------------------------
Patches item #2784482, was opened at 2009-04-30 18:47
Message generated for change (Comment added) made by siebrand
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2784482&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Additional options for pagegenerators.py
Initial Comment:
Two other generators for pagegenerators.py
-xlink (similar to -link): reads links on a webpage rather than a file
-check : for WikiProject Check Wikipedia reports on toolserver
----------------------------------------------------------------------
>Comment By: siebrand (siebrand)
Date: 2009-10-02 11:54
Message:
Submitter, please confirm this patch is still valid (too many changes to
MediaWiki and no time to test here)...
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-07 10:52
Message:
If there is interest, I could integrate the first generator with
"untaggedGenerator" in add_text.py
----------------------------------------------------------------------
Patches item #2787889, was opened at 2009-05-06 16:29
Message generated for change (Comment added) made by siebrand
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2787889&group_…
Category: rewrite
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: liangent (liangent)
Assigned to: Nobody/Anonymous (nobody)
Summary: getcurrenttime()
Initial Comment:
maybe this is better (less cost)
Index: pywikibot/pywikibot/site.py
===================================================================
--- pywikibot/pywikibot/site.py (revision 6836)
+++ pywikibot/pywikibot/site.py (working copy)
@@ -794,10 +794,10 @@
         """
         r = api.Request(site=self,
-                        action="parse",
+                        action="expandtemplates",
                         text="{{CURRENTTIMESTAMP}}")
         result = r.submit()
-        return re.search('\d+', result['parse']['text']['*']).group()
+        return re.search('\d+', result['expandtemplates']['*']).group()
 
     def getcurrenttime(self):
         """Return a Timestamp object representing the current server time."""
----------------------------------------------------------------------
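The extraction step in the getcurrenttime() patch above can be tried on its own with a hand-written result dictionary. The dictionary shape is copied from the patch and is an assumption here; it may differ between MediaWiki versions. The name extract_timestamp and the sample value are hypothetical.

```python
import re


def extract_timestamp(result):
    """Pull the first run of digits out of an expandtemplates-style result."""
    return re.search(r"\d+", result["expandtemplates"]["*"]).group()


# Hand-written sample standing in for the value returned by r.submit().
sample_result = {"expandtemplates": {"*": "20091003141953"}}
timestamp = extract_timestamp(sample_result)
```

Here timestamp is the 14-digit string that {{CURRENTTIMESTAMP}} expands to, without the surrounding HTML that action=parse would add.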
>Comment By: siebrand (siebrand)
Date: 2009-10-02 11:53
Message:
This request may already be outdated, but please attach a proper patch.
Otherwise this patch will be rejected for certain after 2 weeks.
----------------------------------------------------------------------
Patches item #2791305, was opened at 2009-05-13 19:07
Message generated for change (Comment added) made by siebrand
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2791305&group_…
Category: Translations
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 7
Private: No
Submitted By: Maurits (mcknol)
Assigned to: Nobody/Anonymous (nobody)
Summary: Neglected disamb-templates on several wiki's by interwiki.py
Initial Comment:
Please add {{surname}}, {{hndis}} and {{given name}} to the list of possible disambiguation templates on en.wikipedia for interwiki.py.
----------------------------------------------------------------------
>Comment By: siebrand (siebrand)
Date: 2009-10-02 11:51
Message:
Rejected. No patch attached.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-28 03:24
Message:
Hello.
Can you read anywhere that someone has fixed the bug?
I can't. The bug is still marked as open.
Also, this has _always_ been like this. Always. Nothing has changed on
pywikipedia side for detection of templates for en.wikipedia, ever.
If someone feels he can implement a patch (see
http://www.mail-archive.com/pywikipedia-l@lists.wikimedia.org/msg01202.html
for details), please go ahead and implement it.
----------------------------------------------------------------------
Comment By: Carsrac (carsrac)
Date: 2009-05-27 16:35
Message:
I have tested it, and surname and shipindex are not detected. Please undo the
None change, because it is not working.
Please test before making changes. Now a lot of iw are removed automatically
by the bots. Please also increase the priority of this bug.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2009-05-23 12:22
Message:
It is not solved and I am running r6915.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-15 22:43
Message:
those templates are not officially disambiguation templates for
en.wikipedia, those are set index templates.
Please read
http://www.mail-archive.com/pywikipedia-l@lists.wikimedia.org/msg01202.html
about it =)
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-15 20:00
Message:
There is a list of these templates at
http://en.wikipedia.org/wiki/MediaWiki:Disambiguationspage
----------------------------------------------------------------------
Comment By: Maurits (mcknol)
Date: 2009-05-15 12:16
Message:
See also the interwiki's of these templates. None of them seems to be
included and therefore, interwiki.py makes many wrong decisions.
----------------------------------------------------------------------
Patches item #2790445, was opened at 2009-05-12 06:30
Message generated for change (Comment added) made by siebrand
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
>Assigned to: NicDumZ — Nicolas Dumazet (nicdumz)
Summary: Re 1843798: Add capability to remember pages to replace.py
Initial Comment:
A new patch to implement toobaz's function with the changes suggested by wikipedian.
https://sourceforge.net/tracker/?func=detail&aid=1843798&group_id=93107&ati…
- solve_disambiguation.py and pagegenerators.py:
  1. Generator and logging function for -primary option moved
     from solve_disambiguation.py to pagegenerators.py
  2. TODO in solve_disambiguation.py done:
     generator now starts yielding before all referring pages have been found
  3. makes use of new TextfilePageGenerator
  4. code is a few lines shorter
- replace.py:
  5. "-exclude" option from toobaz's patch implemented.
     Allows to filter generator through a list of previously edited pages.
     New pages are appended to the filter file based on choices made:
       -exclude: logs to filter choice "N"
  6. additional command line options for other settings:
       -editonce: logs to filter choices "Y", "A"
       -treatonce: logs to filter choices "Y", "A", "N"
       -scanonce: logs to filter choices "Y", "A", "N"; no change
  7. uses generator and file format from solve_disambiguation.py
     (suggested by wikipedian below)
  8. default filter filename is the name of the fix. Files are placed
     in a subdirectory "replace".
----------------------------------------------------------------------
>Comment By: siebrand (siebrand)
Date: 2009-10-02 11:49
Message:
Assigning to nicdumz for processing.
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-12 15:25
Message:
Thanks for the quick review. I will try to address the
various points and have included a new version of the patch.
a. I added a bit more text to the source and reformatted
part of the code, but I didn't want to change existing
code more than needed.
b. generator:
- checks if the filter file exists
- reads it
- runs the next generator and skips pages in memory
Previously, it first ran the next generator and then deleted
from its result the pages that were in the filter file
c. replace.py command line options
I added several command line options to define which
pages should be skipped the next time. One could edit
replace.py directly, but it seemed cleaner to provide
all options at command line level.
toobaz excluded pages where a replacement was manually
rejected ("N"). The option "-exclude" will keep this
functionality.
Personally, I find it more useful to filter pages that
were edited in a previous run. This avoids that the bot
repeats the same edit later, after someone reverted
a previous edit. Option "-editonce" provides this.
"-treatonce" combines the two.
"-scanonce" avoids that the bot re-fetches the same page
in a 2nd run, even if the regex didn't match it in
the first run. (I fixed an omission for "skipped" in
the second patch)
Without the different options, the additions to replace.py
would be much shorter ..
d. I had to insert several "break" in replace.py to avoid
that nothing but "N" gets to the stage confusingly labeled
"choice must be 'N'" in the code.
e. FilterFileAppend is based on the function from
solve_disambiguation. The advantage of writing each
page to the file is that it won't miss one if it's
interrupted or crashes. This mode from
solve_disambiguation remains unchanged.
f. The same goes for the file format. Up to now, I didn't
have any problems with it and it worked ok with a
title "臺灣Taiwan&āàäà" I just tested. urlname was also
used by PrimaryIgnoreManager. For backward compatibility,
maybe it should be kept.
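The generator behaviour described in point b (load the filter first, then skip matching pages while the next generator runs) can be sketched as below. For simplicity the sketch works on plain title strings rather than Page objects, and the name skip_known_pages is hypothetical; it is not the code from the patch.

```python
def skip_known_pages(generator, known_titles):
    """Yield items from generator, skipping titles already in the filter.

    Hypothetical illustration using plain title strings.
    """
    known = set(known_titles)  # loaded once, before the wrapped generator runs
    for title in generator:
        if title not in known:
            yield title


# Usage: titles from an earlier run are dropped lazily, one page at a time.
remaining = list(skip_known_pages(iter(["A", "B", "C"]), ["B"]))
```

Because filtering happens as each item is produced, the wrapped generator never has to be exhausted before the first page is yielded.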
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 12:47
Message:
Wow, that's a big patch =)
* codecs is fine with me
* can you avoid lines > 80 characters? I know that this is not something
we do everywhere, but that's bad looking code. Same goes for if foo: bar.
Please skip a line.
* can you document thoroughly what's being done? parameters in the
generators? In replace.py ? I find it really hard to understand the
"choice" table in the docstring explaining -scanonce & others.
* What's this:
+ f = codecs.open(filename, 'r', 'utf-8')
+ f.close()
??
I am also not convinced by the fact that after each page, FilterFileAppend
is called, and #1 path is computed, #2 a file is opened, written in, and
closed.
I'm thinking that a possible cleaner way to do this would be to have a
Filter object: put everything you need in it (an opened file descriptor, a
list of titles to ignore if you need to use this, etc...) and keep a
reference to it from the replace & disambig bots. How does that sound to
you?
I also know that Daniel wanted first to keep the same file format, but...
a couple of things are wrong here:
* if you output titles with page.urlname() it will not be possible to read
the file with TextfilePageGenerator afaik. Think of special characters,
being url encoded, and not decoded.
* if you want to use a Page title for a filename, you want
Page.titleforFilename, not Page.urlname
Thank you!
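The Filter object suggested above might look roughly like the following sketch. The class name PageFilter, its methods, and the one-plain-title-per-line UTF-8 file layout are all assumptions made for illustration; this is neither the patch's code nor the existing solve_disambiguation.py file format.

```python
import io
import os


class PageFilter(object):
    """Hypothetical filter: remembers titles and appends new ones to a file."""

    def __init__(self, filename):
        self.titles = set()
        if os.path.exists(filename):
            with io.open(filename, "r", encoding="utf-8") as f:
                self.titles = set(line.strip() for line in f if line.strip())
        # Keep a single handle open instead of reopening the file per page.
        self._file = io.open(filename, "a", encoding="utf-8")

    def __contains__(self, title):
        return title in self.titles

    def add(self, title):
        if title not in self.titles:
            self.titles.add(title)
            self._file.write(title + u"\n")
            # Flush after each title so an interrupted run keeps its entries.
            self._file.flush()

    def close(self):
        self._file.close()
```

A bot would create one such object at start-up, test titles with "in" before treating a page, call add() after each decision, and call close() when it finishes.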
----------------------------------------------------------------------
Patches item #2815371, was opened at 2009-07-01 20:56
Message generated for change (Settings changed) made by siebrand
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2815371&group_…
Category: rewrite
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: almaghi (almaghi)
Assigned to: Nobody/Anonymous (nobody)
Summary: add-text.py: better help if no parameters given
Initial Comment:
add-text.py: better help if no parameters given in command.
----------------------------------------------------------------------
>Comment By: siebrand (siebrand)
Date: 2009-10-02 11:48
Message:
Rejected. No patch attached.
----------------------------------------------------------------------