Patches item #2424422, was opened at 2008-12-13 13:34
Message generated for change (Comment added) made by rick_block
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2424422...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: Rejected
Priority: 5
Private: No
Submitted By: Rick Block (rick_block)
Assigned to: Nobody/Anonymous (nobody)
Summary: -content option for replace.py
Initial Comment: I have a number of tools I run at en.wikipedia that download pages, manipulate the pages using tools like awk, and then upload new versions of the pages. I've implemented an option to replace.py (-content) to provide a filename containing the replacement content for a page. The svn.diff file is attached. The same option can be used to create a new page with content from a specified file as well.
----------------------------------------------------------------------
Comment By: Rick Block (rick_block) Date: 2009-10-03 20:02
Message: NicDumZ's main comment seems to be that he doesn't quite see the point. The basic point is that I'm not a Python programmer :). This option allows me to read a page using the MediaWiki API (or simply curl or wget with the "action=raw" parameter), edit the page content using whatever I'd like (I generally use awk), and then use replace.py to submit the page back. Using replace.py (rather than the MediaWiki API) for this provides a much nicer environment, including things like previewing the diff, which is extremely handy. Logically I'm constructing a sort of pipe (in shell, of course) that fetches a page, edits the page, and then puts the page. The general pattern is something like:
curl "http://en.wikipedia.org/w/index.php?title=$PAGE&action=raw" | awk -f awkscript >$PAGE.tmp
python replace.py -content:$PAGE.tmp -page:$PAGE
I haven't bothered to figure out how to do this in Python, but given the ability to read the content from stdin (using, say, "-" as the parameter value), I could actually do the whole thing as a pipe like this:
curl "http://en.wikipedia.org/w/index.php?title=$PAGE&action=raw" | awk -f awkscript | python replace.py -content:- -page:$PAGE
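For anyone picking this up, here is a minimal sketch of how the "-" convention could be handled on the Python side. It is not taken from the patch: the function name read_content and its parameters are hypothetical, and it assumes Python 3, where sys.stdin already yields decoded text.

```python
import io
import sys


def read_content(source, stdin=None, encoding="utf-8"):
    """Return replacement text from a file path, or from stdin when source is '-'.

    Hypothetical helper: 'source' stands for the value given to -content.
    """
    if source == "-":
        # Read everything piped in, e.g. the output of the awk step.
        stream = stdin if stdin is not None else sys.stdin
        return stream.read()
    # Otherwise treat the value as a file name, as in -content:$PAGE.tmp
    with io.open(source, encoding=encoding) as handle:
        return handle.read()
```

The stdin parameter exists only so the helper can be exercised without a real pipe, for example read_content("-", stdin=io.StringIO("text")).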
I have a revised version of the patch that addresses at least most of the detailed comments. On these lines:
    new_text = new_text.replace(u'\n',u'\r\n')
    if ( new_text == original_text ) or ( new_text == original_text + '\r\n'):
- What is this ? This doesn't look quite good (two text comparisons instead of one?), it's not documented, and it doesn't look related to the patch topic. (does it?)
They replace all LF with CRLF in the replacement text and then change the equality comparison so that the replacement text is considered equal whether or not the original is terminated with a CRLF. When comparing the entire page to an entire replacement page read from a file, the end-of-line terminator must be the same; as read by replace.py, the content seems to have CRLF at the end of each line but not necessarily on the last line. If there's a better way to accomplish either of these in Python, please let me know.
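One possible alternative, sketched here rather than taken from the patch, is to normalize both texts before comparing, so that a single comparison is enough. The function names are hypothetical. Note one difference in behavior: this version ignores any number of trailing newlines, whereas the patch tolerates exactly one trailing CRLF.

```python
def normalize_newlines(text):
    """Convert CRLF and bare CR line endings to LF and drop trailing newlines."""
    return text.replace("\r\n", "\n").replace("\r", "\n").rstrip("\n")


def same_content(new_text, original_text):
    """Compare two page texts while ignoring line-ending differences."""
    return normalize_newlines(new_text) == normalize_newlines(original_text)
```

With this, a file saved with LF endings and a final newline compares equal to page text that uses CRLF and has no terminator on its last line.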
- While updating your patch, please also update your patch against HEAD
I think it was updated when I submitted it. I've just updated again as well.
Since the item is closed I don't think I can upload the updated patch.
----------------------------------------------------------------------
Comment By: siebrand (siebrand) Date: 2009-10-02 03:59
Message: Rejected because of lack of response from submitter to comment on 2009-04-27 18:13 by nicdumz. Feel free to reopen after addressing the comment given and having uploaded an updated patch.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz) Date: 2009-04-27 10:13
Message: Sorry for taking so long to review this.
I'm not sure that I understand what would be the usage of that script. Could you provide a (commandline) example, and explain maybe a bit more what would be an interesting application of it?
In fact, I don't really understand why one would need to use replace.py, which is a rather complex script, to simply replace all the text in a page with arbitrary other text:
    for page in gen:
        page.put(text)
does the same thing. You have to add 4 or 5 lines to handle command line arguments, of course, but using replace.py looks a bit like "overkill" =)
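To make the idea concrete, a standalone script along those lines might look roughly like the sketch below. Everything except the page.put(text) loop is an assumption for illustration: FakePage is a stand-in that only records what it is given (a real script would get its page objects from the framework), and the --content option and the function names are hypothetical.

```python
import argparse


class FakePage(object):
    """Stand-in for a wiki page object; it only records what put() receives."""

    def __init__(self, title):
        self.title = title
        self.saved = None

    def put(self, text):
        self.saved = text


def parse_args(argv):
    """Parse a hypothetical command line: --content FILE title [title ...]."""
    parser = argparse.ArgumentParser(
        description="Overwrite pages with the text of a file.")
    parser.add_argument("--content", required=True,
                        help="file holding the new page text")
    parser.add_argument("titles", nargs="+", help="titles of pages to overwrite")
    return parser.parse_args(argv)


def put_text(pages, text):
    """The loop from the comment above: save the same text to every page."""
    for page in pages:
        page.put(text)
```

A sketch like this gives no diff preview or confirmation prompt before saving.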
But maybe I'm wrong here, so please include a use case :)
On the patch itself, if you still want me to include it:
* This patch mixes tabs and spaces, which is a very bad practice in Python. Please fix this.
* Please make sure that replacement_text has a default None value in the Bot constructor. Please also append it at the _end_ of the constructor signature, and not in the middle of the argument list, to ensure backwards compatibility.
* Please be a bit more verbose in the documentation of the -content option. If you think that -content will be useful to other users, you'll have to explain to them _why_ it is useful =) A commandline example and a real application could help here.
* + new_text = new_text.replace(u'\n',u'\r\n')
  + if ( new_text == original_text ) or ( new_text == original_text + '\r\n'):
  What is this ? This doesn't look quite good (two text comparisons instead of one?), it's not documented, and it doesn't look related to the patch topic. (does it?)
* While updating your patch, please also update your patch against HEAD
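To illustrate the constructor point, here is a small sketch. ExampleBot is a simplified, hypothetical stand-in and not the real class from replace.py; only the name replacement_text comes from the patch. Because the new parameter is last and defaults to None, callers that pass the older arguments positionally keep working unchanged.

```python
class ExampleBot(object):
    """Hypothetical, simplified bot used only to show the signature change."""

    def __init__(self, generator, replacements, exceptions=None,
                 replacement_text=None):
        self.generator = generator
        self.replacements = replacements
        self.exceptions = exceptions or {}
        # New optional argument, appended last so old call sites are unaffected.
        self.replacement_text = replacement_text

    def new_text_for(self, original_text):
        """Return the whole replacement text if given, else apply replacements."""
        if self.replacement_text is not None:
            return self.replacement_text
        new_text = original_text
        for old, new in self.replacements:
            new_text = new_text.replace(old, new)
        return new_text


# An existing positional call keeps its meaning.
old_style = ExampleBot(["Sandbox"], [("colour", "color")], {})
# A new call opts in to whole-page replacement by keyword.
new_style = ExampleBot(["Sandbox"], [], replacement_text="Whole new page text")
```

Had the parameter been inserted in the middle of the list, the third positional argument of the first call would have been bound to the wrong name.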
Thanks :)
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org