Suppose I wanted to change all occurrences of
{{cite book | ... | isbn=123 | ...}} into ... | id=ISBN 123 | ...
or all {{cite book | last=Marx | first=Karl | ...}} into | author=Karl Marx | ...
How would I do that? Apparently template.py can only change the template name (e.g. "cite book" into some other language) and the regexp capability of replace.py quickly makes this a mess, especially when you consider that first=Karl|year=1867|last=Marx is just as valid, and that template calls can contain newlines.
I guess what I want is a robot framework that finds all uses of a specified template (e.g. "cite book") and makes a call to a user-defined function, that receives the template parameters as a dictionary (hash, mapping). Then I could write specific Python code like this:
    t['id'] = 'ISBN ' + t['isbn']
    del t['isbn']

    t['author'] = t['first'] + ' ' + t['last']
    del t['first']
    del t['last']
This too sounds easy until you consider that parameter values can contain template calls, e.g. {{cite book|...|id={{ISSN|1234-5678}}}}, so simply matching {{.*?}} isn't enough.
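To make the nesting problem concrete, here is a minimal sketch of locating template calls by counting brace depth instead of using a regexp. The function name find_template_calls is made up for this example and is not part of pywikipedia. It assumes the template name is written exactly as given (no case or underscore variants) and it does not handle {{{triple-brace}}} arguments or <nowiki> sections.

```python
def find_template_calls(text, name):
    """Yield (start, end) spans of {{name ...}} calls, honouring nested {{ }}."""
    opener = '{{' + name
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            return
        # Reject longer names such as "cite booklet": after the name we
        # expect optional whitespace and then "|" or the closing "}}".
        rest = text[start + len(opener):].lstrip()
        if not rest.startswith(('|', '}}')):
            pos = start + 2
            continue
        depth = 0
        i = start
        end = None
        while i < len(text) - 1:
            pair = text[i:i + 2]
            if pair == '{{':
                depth += 1
                i += 2
            elif pair == '}}':
                depth -= 1
                i += 2
                if depth == 0:
                    end = i
                    break
            else:
                i += 1
        if end is None:
            return  # unbalanced braces: stop rather than guess
        yield start, end
        pos = end


page = 'See {{cite book|id={{ISSN|1234-5678}}}} and {{cite book\n| isbn=123\n}}.'
for start, end in find_template_calls(page, 'cite book'):
    print(repr(page[start:end]))
```

Because the scanner tracks depth, the inner {{ISSN|...}} call does not end the outer match, and newlines inside a call need no special treatment.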
And then when the user-defined function returns, the dictionary must be packed back into text, preserving as much as possible of the previous whitespace and newline structure, to minimize the edit diff.
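A rough sketch of that round trip, under stated assumptions: all names here (split_top_level, rewrite_call, PARAM) are hypothetical and not part of pywikipedia. The input is the text of one complete {{...}} call, parameters are split only at top-level pipes, and the whitespace around each name, "=" and value is remembered and reused. A new key takes over the slot of a deleted one, and that pairing is only a guess when several parameters change at once.

```python
import re

# leading space, name, "=" with its spacing, value, trailing space
PARAM = re.compile(r'(\s*)([^=|{}\[\]]*?)(\s*=\s*)(.*?)(\s*)', re.DOTALL)


def split_top_level(body, sep='|'):
    """Split body on sep, ignoring separators inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ('{{', '[['):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ('}}', ']]') and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == sep and depth == 0:
            parts.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append(''.join(current))
    return parts


def rewrite_call(call, func):
    """Return the {{...}} text `call` with func applied to its named parameters."""
    body = call[2:-2]
    stripped = body.rstrip()
    tail = body[len(stripped):]          # whitespace just before the closing }}
    name, *parts = split_top_level(stripped)
    slots, params = [], {}
    for part in parts:
        m = PARAM.fullmatch(part)
        if m:
            slots.append((m.group(2), m))
            params[m.group(2)] = m.group(4)
        else:
            slots.append((None, part))   # positional parameter, kept verbatim
    old_keys = set(params)
    func(params)                         # user code edits the dict in place
    new_keys = [k for k in params if k not in old_keys]
    out = [name]
    for key, item in slots:
        if key is None:
            out.append(item)
            continue
        if key not in params:            # parameter was deleted by func
            if not new_keys:
                continue
            key = new_keys.pop(0)        # reuse the freed slot and its spacing
        out.append(item.group(1) + key + item.group(3) + params[key] + item.group(5))
    out.extend(key + '=' + params[key] for key in new_keys)
    return '{{' + '|'.join(out).rstrip() + tail + '}}'


def isbn_to_id(t):
    t['id'] = 'ISBN ' + t.pop('isbn')


def merge_author(t):
    t['author'] = t.pop('first') + ' ' + t.pop('last')


print(rewrite_call('{{cite book | title=Das Kapital | isbn=123 | year=1867}}', isbn_to_id))
print(rewrite_call('{{cite book\n| first=Karl\n| year=1867\n| last=Marx\n}}', merge_author))
```

The two user functions are the same operations as the dictionary code above, written with pop() so the delete and the read happen in one step. A real bot would also have to check that the keys exist before touching them.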
Has anybody tried this?
:)
Trust me, learning how to use Python regexps is worth it :) (I learned by reading this page: http://www.amk.ca/python/howto/regex/ )
In your case, replace.py will do the trick. For the first change, something like:
"({{cite book[^}]*isbn\s*=\s*)(\d*)" "\1ISBN \2"
For the second, use both of these, one for each order of the two parameters:
"({{cite book[^}]*)last\s*=\s*(\w*)[^}]*first\s*=\s*(\w*)" "\1author=\3 \2"
"({{cite book[^}]*)first\s*=\s*(\w*)[^}]*last\s*=\s*(\w*)" "\1author=\2 \3"
(Other regexps might also work.)
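The patterns can be tried directly with Python's re.sub, outside replace.py, to see what they do on a flat template call. The sketch below runs the first pattern as posted. It then runs a regrouped variant that also renames the parameter (that regrouping is an assumption of this example, not something from the message), and the last/first pattern with balanced parentheses on two sample calls.

```python
import re

sample = '{{cite book | title=Das Kapital | isbn=123 | year=1867}}'

# First pattern as posted: the parameter name stays inside group 1.
as_posted = re.sub(r'({{cite book[^}]*isbn\s*=\s*)(\d*)', r'\1ISBN \2', sample)
print(as_posted)   # {{cite book | title=Das Kapital | isbn=ISBN 123 | year=1867}}

# Hypothetical variant: keep "isbn" out of the groups so it can become "id".
renamed = re.sub(r'({{cite book[^}]*)isbn(\s*=\s*)(\d*)', r'\1id\2ISBN \3', sample)
print(renamed)     # {{cite book | title=Das Kapital | id=ISBN 123 | year=1867}}

# last/first pattern: text between the two parameters is matched but not captured.
author = re.compile(r'({{cite book[^}]*)last\s*=\s*(\w*)[^}]*first\s*=\s*(\w*)')
adjacent = author.sub(r'\1author=\3 \2', '{{cite book | last=Marx | first=Karl | year=1867}}')
print(adjacent)    # {{cite book | author=Karl Marx | year=1867}}
separated = author.sub(r'\1author=\3 \2', '{{cite book | last=Marx | year=1867 | first=Karl}}')
print(separated)   # {{cite book | author=Karl Marx}}
```

In the last call the year parameter sits between last= and first= and is lost in the replacement, so a pattern of this shape would need another group to carry the intervening text through.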
Most of the time, when automated changes are needed, I use replace.py, with a regex. Regular expressions are very, very efficient.
2007/10/9, Lars Aronsson lars@aronsson.se:
-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Nicolas Dumazet wrote:
Trust me, learning how to use Python regexps is worth it :) (I learned by reading this page: http://www.amk.ca/python/howto/regex/ )
I'm an experienced Perl programmer. I know regexps, even though I'm a beginner in Python.
In your case, replace.py will do the trick, with something like : "({{cite book[^}]*isbn\s*=\s*)(\d*)" "\1ISBN \2" for the first
Sorry, this only works in some cases. Consider a user who thinks the author name should appear in small caps:
{{cite book| author={{sc|Karl Marx}} | isbn=123 }}
Now your [^}]* will fail, because there is a (2nd level) template call between "{{cite book" and "isbn=", and the character class cannot get past the "}}" that closes it. Even if it isn't impossible, it *is* a mess to get this right in regexps. That would be fine if it were done *once* in the Python code, but requiring the bot operator to get it right on the replace.py command line for every call is not realistic.
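A quick check with Python's re module shows the failure. Only the pattern and the small-caps example come from the messages above. The flat comparison string is added for contrast.

```python
import re

pattern = re.compile(r'({{cite book[^}]*isbn\s*=\s*)(\d*)')

flat = '{{cite book| author=Karl Marx | isbn=123 }}'
nested = '{{cite book| author={{sc|Karl Marx}} | isbn=123 }}'

# [^}]* cannot cross the "}}" that closes {{sc|...}}, so "isbn" is never reached.
flat_matches = pattern.search(flat) is not None
nested_matches = pattern.search(nested) is not None
print(flat_matches, nested_matches)   # True False

# With no match, sub() returns the text unchanged and the isbn parameter is missed.
nested_after = pattern.sub(r'\1ISBN \2', nested)
print(nested_after == nested)         # True
```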
Now I hear words of a pywikiparser - and that sounds promising. Will that take care of template call parsing? Where can I find it, to try it out?
In addition to this, I believe your regexp will fail where there are linebreaks between "{{cite book" and "isbn=", but this could be fixed by adding a command line option to replace.py that activates Python's re.compile(..., re.MULTILINE), akin to the existing option for re.IGNORECASE.
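For reference, a small experiment on how Python's re flags treat line breaks, run on the whole multi-line text at once. It does not check whether replace.py hands the pattern the full page or one line at a time. In this experiment the negated class [^}] and \s already match newlines without any flag. re.MULTILINE only changes where ^ and $ match, and it is re.DOTALL that lets "." cross a line break. The dotted pattern is a made-up comparison, not one from the thread.

```python
import re

call = '{{cite book\n| title=Das Kapital\n| isbn=123\n}}'

# The posted pattern uses [^}]* and \s*, both of which match "\n" by default.
posted = r'({{cite book[^}]*isbn\s*=\s*)(\d*)'
posted_plain = re.search(posted, call) is not None

# A pattern built on "." behaves differently across line breaks.
dotted = r'{{cite book.*?isbn'
dotted_plain = re.search(dotted, call) is not None
dotted_multiline = re.search(dotted, call, re.MULTILINE) is not None
dotted_dotall = re.search(dotted, call, re.DOTALL) is not None

print(posted_plain)       # True
print(dotted_plain)       # False
print(dotted_multiline)   # False
print(dotted_dotall)      # True
```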