I'm using replace.py to create wikilinks. Usually I want to select only the first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or course title, so I don't want to add the wikilink). If I choose n for no, the bot goes to the next page.
Is there a way I can skip to the next occurrence in the same page? I'm guessing it will need a modified version of replace.py, so that it gives an extra option besides ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll, [q]uit)
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$) " "[[\1]]\2" -exceptinsidetag:link -exceptinsidetag:hyperlink -exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref -excepttext:"(?si)[[((?:FOO1|FOO2)[|]])" -namespace:0 -namespace:102 -namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square brackets to: FOO1|FOO2." -log -xml:currentdump.xml
Many thanks!
Hi!
2009/11/23 Chris Watkins chriswaterguy@appropedia.org
I'm using replace.py to create wikilinks. Usually I want to select only the first occurrence of the search string, and my command works fine for this.
I don't understand that, how do you select only the first one? For me, replace.py either changes each instance within a page, or nothing.
As far as I understand, at this point replace.py hands the work over to wikipedia.py:
    new_text = wikipedia.replaceExcept(new_text, old, new, exceptions,
                                       allowoverlap=self.allowoverlap)
So the solution should be in wikipedia.py.
Bináris,
On Thu, Nov 26, 2009 at 18:27, Bináris wikiposta@gmail.com wrote:
Hi!
2009/11/23 Chris Watkins chriswaterguy@appropedia.org
I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
I don't understand that, how do you select only the first one? For me, replace.py either changes each instance within a page, or nothing.
In the command I use, look at the end of the search and replace strings:
python replace.py -regex "(?si)\b((?:CCAT|Campus Center for Appropriate Technology))\b(.*$)" "[[\1]]\2" -exceptinsidetag:link -exceptinsidetag:hyperlink -exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref -excepttext:"(?si)[[((?:CCAT|Campus Center for Appropriate Technology)[|]])" -namespace:0 -namespace:102 -namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square brackets to: CCAT|Campus Center for Appropriate Technology." -log -xml:currentdump.xml
Notice that the -regex parameter is used, and the search text ends with (.*$), which matches the entire rest of the article. Because that text is consumed by the first match, it is not searched again. In the replacement string it is put back by \2, which I think refers to the second captured group of the search pattern.
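A minimal sketch of why the trailing (.*$) limits the replacement to the first occurrence, using Python's standard re module rather than replace.py itself. The sample text and the variable names are made up for illustration; the count=1 variant is an alternative added here for comparison and is not something the thread says replace.py exposes.

```python
import re

# Hypothetical page text with three occurrences of the search term.
text = "CCAT runs tours. Visit CCAT today. CCAT is open."

# With the s (DOTALL) flag, the trailing (.*$) swallows everything after
# the first hit, so a global substitution has nothing left to match.
pattern = re.compile(
    r"(?si)\b((?:CCAT|Campus Center for Appropriate Technology))\b(.*$)"
)
first_only = pattern.sub(r"[[\1]]\2", text)

# The same effect in plain Python, limiting the number of substitutions.
plain = re.compile(r"(?i)\b(CCAT|Campus Center for Appropriate Technology)\b")
with_count = plain.sub(r"[[\1]]", text, count=1)

print(first_only)
print(with_count)
```

Both variants leave the second and third occurrences untouched.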
I heard this tip from this mailing list over a year ago, and also from #regex on freenode - irc://irc.freenode.net/regex , which is an active and very helpful place to get regex help.
As far as I understand, at this point replace.py hands the work over to wikipedia.py:
    new_text = wikipedia.replaceExcept(new_text, old, new, exceptions,
                                       allowoverlap=self.allowoverlap)
So the solution should be in wikipedia.py.
Cool. Anyone have an idea what we can do with wikipedia.py?
Thanks Chris
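One way the "skip to the next occurrence in the same page" idea could look, as a rough standalone sketch. It does not touch replace.py or wikipedia.py; link_first_accepted and decide are hypothetical names invented for this illustration, and the sample page text is made up.

```python
import re


def link_first_accepted(text, pattern, decide):
    """Wikilink the first occurrence for which decide(match) returns True.

    decide is a hypothetical callback; in an interactive bot it could
    show the surrounding text and ask the operator yes/no.
    """
    for match in re.finditer(pattern, text):
        if decide(match):
            start, end = match.span()
            return text[:start] + "[[" + match.group(0) + "]]" + text[end:]
    return text  # every occurrence was declined, page left unchanged


page = "Course: Intro to CCAT Design. Later, CCAT hosted a workshop."

# Stand-in for the operator's answer: decline a hit inside the course title.
def decide(match):
    return not match.string[:match.start()].endswith("Intro to ")

result = link_first_accepted(page, r"\bCCAT\b", decide)
print(result)
```

Here the first hit is declined and the second one is linked, which is the behaviour asked for at the top of the thread. Wiring this into the bot's prompt would still need changes in replace.py or wikipedia.py.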
2009/11/26 Chris Watkins chriswaterguy@appropedia.org
Notice that the -regex parameter is used, and the search text ends with (.*$), which matches the entire rest of the article.
Not bad, not bad. :-) Nice solution. \\2 is strange for me, because it should be \2, and it does work that way. I thought \\2 would be interpreted as a \ mark followed by the digit 2, not as \2 (second group). So I don't understand again. :-)
On Thu, Nov 26, 2009 at 20:39, Bináris wikiposta@gmail.com wrote:
2009/11/26 Chris Watkins chriswaterguy@appropedia.org
Notice that the -regex parameter is used, and the search text ends with (.*$), which matches the entire rest of the article.
Not bad, not bad. :-) Nice solution. \\2 is strange for me, because it should be \2, and it does work that way. I thought \\2 would be interpreted as a \ mark followed by the digit 2, not as \2 (second group). So I don't understand again. :-)
I guess it's \2, but because of the -regex option we need to escape the backslash? I don't know - it works, and I'm happy :-). Good question though, I'll keep that in mind if I ever use \\1 without -regex - it probably needs to change to \1.
-- Bináris
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Chris, now I understand the mystery of the double backslash. It is because you don't put r before your regex strings, as shown in fixes.py. At the time I did not notice that the "r" was missing. See http://docs.python.org/py3k/howto/regex.html#the-backslash-plague
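A small sketch of the point about the r prefix, for regexes written inside Python source such as fixes.py. The variable names and sample text are made up. How a shell passes backslashes on the command line is a separate matter that this example does not cover.

```python
import re

text = "see CCAT"

# In an ordinary string literal the backslash must be doubled;
# in a raw literal (r prefix) it is written once. Both are the same string.
doubled = "[[\\1]]"
raw = r"[[\1]]"

# Without the r prefix and without doubling, "\1" is an octal escape
# for the control character chr(1), not a group reference.
unescaped = "[[\1]]"

linked = re.sub(r"(CCAT)", raw, text)
broken = re.sub(r"(CCAT)", unescaped, text)

print(doubled == raw)
print(repr(linked))
print(repr(broken))
```

The first substitution inserts the captured group; the second inserts a literal control character instead.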
Hello,
I noticed a strange edit: http://fr.wikipedia.org/w/index.php?title=Licence_de_documentation_libre_GNU...
The name of the interwiki differs between the article and the comment (at least the letter ê̤, which becomes ề̤).
As three bots made the same error, I suspect a bug in Pywikipedia. On Windows Vista I don't see that bug.
Regards
Hercule
Hello Hercule,
See http://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100&... . These bot owners should update their pywikipediabot installations, which will then show a warning when the Python version in use contains this bug. To avoid the bug, they should use an older Python version.
Best regards, Merlijn van Deen
On 20 November 2010 15:53, Antoine Delarue antoinedelarue@hotmail.com wrote:
Hello,
I noticed a strange edit: http://fr.wikipedia.org/w/index.php?title=Licence_de_documentation_libre_GNU...
The name of the interwiki differs between the article and the comment (at least the letter ê̤, which becomes ề̤).
As three bots made the same error, I suspect a bug in Pywikipedia. On Windows Vista I don't see that bug.
Regards
Hercule