Hi,
Each ns:0 page on English Wiktionary is divided into a bunch of sections headed by level-2 headers. The text of each level-2 header is the name of a language; e.g., ==English==.
I use (something like) the following JavaScript when editing pages:
txt = txt.replace(
  /^==([a-zA-Z ]+)==\n+(?:(?:===|[^=]).*\n+)*/gm,
  function (section, langname) {
    return '' + section.replace(
      /({{homophones|)([^=}]*}})/gm,
      '$1lang={' + '{subst:langrev|' + langname + '}}|$2'
    );
  }
);
This searches for {{homophones|...}} without a lang= parameter and adds the lang= parameter appropriate for the ==section== in which {{homophones|...}} appears. This works.
I want to automate this, so I wish to use pywikipediabot. I've translated the above into Python as best I could and come up with the following user-fixes.py:

def homophix(match):
    return re.sub(r'({{homophones|)([^}=]*}})',
                  r'\1lang={{subst:langrev|' + re.escape(match.group(1)) + r'}}|\2',
                  match.group(0))

fixes['homophones'] = {
    'regex': True,
    'msg': {'_default': u'add lang to homophones'},
    'replacements': [
        (ur'^==([a-zA-Z ]+)==\n+(?:(?:===|[^=]).*\n+)*', homophix),
    ]
}

...which I then tried to call using:

python replace.py -fix:homophones -page:accapare
(Note that [[wikt:en:accapare]] has {{homophones|...}} without = .)
Python told me:

No changes were necessary in [[accapare]]
0 pages were changed.
So I guess it's either not matching or not replacing.
What am I doing wrong?
And what can I do instead?
Thanks,
Michael Hamm
Try this first in replacements:

ur'(?m)^== *([a-zA-Z ]+) *==\n+ etc.

The m is for multiline (sometimes ?s works instead; I just use it without understanding why), and the " *" is in case there are spaces inside the == marks.
@Bináris: "(?m)" means /^/ and /$/ will also match at the beginning and end of each line, while "(?s)" means /./ will also match "\n". These two flags are not the same and cannot be used interchangeably.
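To make the difference concrete, here is a small standalone sketch using only Python's re module. The sample text and variable names are made up for illustration; nothing here depends on pywikipediabot.

```python
import re

# Hypothetical two-section page, just for illustration.
text = "==English==\nfoo\n==Italian==\nbar\n"

# No flag: ^ anchors only at the very start of the string.
no_flag = re.findall(r'^==([a-zA-Z ]+)==', text)

# (?m): ^ also anchors right after every newline.
multiline = re.findall(r'(?m)^==([a-zA-Z ]+)==', text)

# (?s): ^ is unchanged; only . is affected (it may now match "\n").
dotall_heads = re.findall(r'(?s)^==([a-zA-Z ]+)==', text)

# The effect of (?s) shows up when . has to cross a line break.
dot_plain = re.findall(r'English.*bar', text)
dot_dotall = re.findall(r'(?s)English.*bar', text)

print(no_flag)       # ['English']
print(multiline)     # ['English', 'Italian']
print(dotall_heads)  # ['English']
print(dot_plain)     # []
print(dot_dotall)    # ['English==\nfoo\n==Italian==\nbar']
```

So a pattern that starts with ^== needs (?m) to find any header other than one on the first line of the page.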
@Michael Hamm: Besides Bináris's suggestion, re.escape should be used only on the pattern argument, not on the replacement text; use match.group(1) instead of re.escape(match.group(1)).
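A short sketch of why this matters, assuming a hypothetical section name that contains a space. re.escape is meant for building patterns; applied to output text, it leaks backslashes into the result. The helper add_lang is made up for this example, and it builds the replacement in a function rather than a template string, which is a variation on the code in this thread.

```python
import re

langname = "Old English"  # hypothetical section name with a space

# Proper use: escape text that is being embedded in a *pattern*.
header = re.compile(r'^==' + re.escape(langname) + r'==$', re.M)
found = header.search("==Old English==\n") is not None

# Misuse: the escaped form is not what should end up in the page text.
escaped = re.escape(langname)
print(escaped)  # Old\ English

def add_lang(m):
    # Build the replacement from the raw, unescaped name.
    return m.group(1) + 'lang={{subst:langrev|' + langname + '}}|' + m.group(2)

fixed = re.sub(r'(\{\{homophones\|)([^}=]*\}\})', add_lang, '{{homophones|foo}}')
print(fixed)  # {{homophones|lang={{subst:langrev|Old English}}|foo}}
```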
On Thu, May 2, 2013 at 2:41 PM, Bináris wikiposta@gmail.com wrote:
> Try this first in replacements:
> ur'(?m)^== *([a-zA-Z ]+) *==\n+ etc. m for multiline (sometimes ?s works instead, I just use it without understanding why), and space* in case there are spaces between == marks.
> --
> Bináris

_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
I've taken the advice I've gotten on-list, but the script still doesn't work. I'd appreciate any further ideas.
Thanks,
Michael Hamm
Okay, this code works for me.

#------- user-fixes.py -------
import re

def homophix(match):
    # \| matches a literal pipe; a bare | inside the group would be alternation
    return re.sub(r'({{homophones\|)([^}=]*}})',
                  r'\1lang={{subst:langrev|' + match.group(1) + r'}}|\2',
                  match.group(0))

fixes['homophones'] = {
    'regex': True,
    'msg': {'_default': u'add lang to homophones'},
    'replacements': [
        (r'(?m)^== *([^\n]+) *== *((?!==(?!=))[^\n]*\n)*', homophix),
    ]
}
#--------------
See http://en.wiktionary.org/w/index.php?title=Wiktionary%3ASandbox&diff=203... for the resulting edit.
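For anyone who wants to check the regexes without running the bot, here is a minimal standalone sketch. It reuses the section pattern from the fix above, escapes the pipe and braces in the template pattern, and runs on made-up wikitext. The names SECTION, HOMOPHONES, add_lang and sample are hypothetical, and the nested function replacement is a variation on the template string used above, not part of pywikipediabot.

```python
import re

# Section pattern from the fix above: a level-2 header plus every following
# line that does not itself start a new level-2 section.
SECTION = re.compile(r'(?m)^== *([^\n]+) *== *((?!==(?!=))[^\n]*\n)*')

# {{homophones|...}} with no "=" inside, i.e. no named parameter yet.
HOMOPHONES = re.compile(r'(\{\{homophones\|)([^}=]*\}\})')

def homophix(match):
    langname = match.group(1)

    def add_lang(m):
        # Insert lang=... right after the template name.
        return m.group(1) + 'lang={{subst:langrev|' + langname + '}}|' + m.group(2)

    return HOMOPHONES.sub(add_lang, match.group(0))

# Made-up page text, for illustration only.
sample = (
    "==English==\n"
    "===Pronunciation===\n"
    "* {{homophones|foo}}\n"
    "\n"
    "==Italian==\n"
    "===Pronunciation===\n"
    "* {{homophones|bar}}\n"
    "* {{homophones|lang=it|baz}}\n"
)

result = SECTION.sub(homophix, sample)
print(result)
```

On this sample, the two templates without a named parameter each gain the lang= of their own section, and the one that already has lang=it is left alone. Note that the section pattern only consumes lines that end in a newline, so a final line without one falls outside the match.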
On Fri, May 3, 2013 at 4:02 AM, Michael Hamm msh210@gmail.com wrote:
> I've taken the advice I've gotten on-list, but the script still doesn't work. I'd appreciate any further ideas.