Hello,
We got a request to rename a lot of articles on huwiki. See http://hu.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:Botgazd%C3%A1k_%C3%... All the "0" characters in the titles mean "any digit". As far as I can see, move.py does not handle regexps the way replace.py does, and that is exactly the feature I need.
How would you solve this problem? For an earlier task I generated "move X Y" lines with Excel and put them in a batch file, calling move.py once for each article to rename. That is not very elegant, and this time I don't know the original titles at all, only patterns.
2010/8/10 Bináris wikiposta@gmail.com
-- Bináris
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Use movepages.py with the -pairs parameter. It reads a text file in which old names and new names are paired like this: [[old name1]] [[new name1]] [[old name2]] [[new name2]] ....
http://meta.wikimedia.org/wiki/Pywikipediabot/movepages.py
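If the old and new titles are already known, the pairs file described above could be generated with a few lines of Python rather than by hand. This is only a sketch: the helper name format_pairs is hypothetical, and putting one pair per line is an assumption about layout, not something the documentation is quoted as requiring.

```python
def format_pairs(pairs):
    """Render (old, new) title pairs in the '[[old]] [[new]]' form."""
    # One pair per line is an assumed layout; only the [[...]] [[...]]
    # pairing itself comes from the description above.
    return "\n".join("[[%s]] [[%s]]" % (old, new) for old, new in pairs)


pairs = [("old name1", "new name1"), ("old name2", "new name2")]
text = format_pairs(pairs)
print(text)
```

The resulting text would be saved to a file and passed to movepages.py with -pairs.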
Alex
Thank you, I know this syntax, but the problem is that I don't know the old names, because I only have a pattern for them that could be handled by regexps. I don't even know the approximate number of old names. The real solution would be to make movepages.py recognize regexps.
2010/8/10 Alex Brollo alex.brollo@gmail.com
On 10 August 2010 09:24, Bináris wikiposta@gmail.com wrote:
All the "0" chars in titles mean "any digit". As far as I see, move.py does not handle regexps in the way replace.py does, although I would just need this feature.
How would you solve this problem?
Use the framework instead of relying on ready-to-use bots. Use the PrefixingPageGenerator combined with the RegexFilterPageGenerator (from pagegenerators.py) to yield the correct pages to move. Something like:
>>> gen = RegexFilterPageGenerator(PrefixingPageGenerator("kkStB"), r"kkStB [0-9][0-9]?[0-9]?")
>>> for page in gen:
...     print page
...
[[KkStB 1]]
[[KkStB 10]]
[[KkStB 105]]
[[KkStB 106]]
[[KkStB 108]]
[[KkStB 110]]
[[KkStB 110.500]]
[[KkStB 129]]
[[KkStB 14]]
[[KkStB 151]]
[[KkStB 166]]
[[KkStB 17]]
[[KkStB 179]]
[[KkStB 180]]
[[KkStB 180.500]]
[[KkStB 205]]
[[KkStB 206]]
[[KkStB 207]]
[[KkStB 210]]
[[KkStB 211]]
[[KkStB 229]]
[[KkStB 231]]
[[KkStB 27]]
[[KkStB 270]]
[[KkStB 306]]
[[KkStB 31.01–11]]
[[KkStB 310]]
[[KkStB 310.300]]
[[KkStB 32]]
[[KkStB 329]]
[[KkStB 378]]
[[KkStB 393]]
[[KkStB 4]]
[[KkStB 406]]
[[KkStB 46]]
[[KkStB 5]]
[[KkStB 506]]
[[KkStB 571]]
[[KkStB 6]]
[[KkStB 60]]
[[KkStB 7]]
[[KkStB 76 sorozatú szerkocsi]]
Of course, adapt the regexp to create a slightly better filter. These pages can then be moved using the move(newtitle) method of the page object.
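As one way of adapting the regexp, the request's convention that "0" means "any digit" could be translated into a regular expression mechanically. The sketch below uses only the standard re module and works on plain title strings; the helper names pattern_to_regex and select_titles and the example pattern "KkStB 000" are hypothetical, and the actual move is left out because the target titles are not given here.

```python
import re


def pattern_to_regex(pattern):
    """Compile a title pattern in which "0" stands for any single digit."""
    # Every "0" becomes a digit class; all other characters are matched
    # literally. The trailing "$" rejects titles with extra text after
    # the pattern, such as "KkStB 110.500".
    parts = ["[0-9]" if ch == "0" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts) + "$")


def select_titles(titles, pattern):
    """Return the titles that match the whole pattern."""
    rx = pattern_to_regex(pattern)
    return [t for t in titles if rx.match(t)]


titles = ["KkStB 1", "KkStB 10", "KkStB 105", "KkStB 110.500",
          "KkStB 76 sorozatú szerkocsi"]
matched = select_titles(titles, "KkStB 000")
print(matched)
```

The string of such a regexp could be handed to the filtering generator, and each page that passes could then be moved with its move(newtitle) method once the new title has been worked out.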
Best regards, Merlijn
2010/8/10 Merlijn van Deen valhallasw@arctus.nl
Thank you, this is really useful, I will try this! Is your name from Merlin the wizard? :-)
I have read movepages.py and found something interesting. It can handle regexes in some way, but this is not documented at all. Search the source for "regex" to see it. I guess "r" is an option that can be chosen while the program is running, not a command-line parameter.
I tried it, and found something odd. I was wondering why this generator yields articles with more than 3 digits in the title, so I first changed your expression to
    gen = RegexFilterPageGenerator(PrefixingPageGenerator("kkStB"), r"kkStB [0-9]")
and then to
    gen = RegexFilterPageGenerator(PrefixingPageGenerator("kkStB"), r"kk")
and in every case the same list was generated as yours. So the outer generator seems useless.
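One possible explanation, which I have not checked against the pagegenerators.py source: if the filter matches case-insensitively from the start of the title and does not require the match to reach the end, then every title with the common prefix passes all three expressions. The sketch below reproduces that behaviour with the plain re module on a few of the titles from the list; the keep helper is hypothetical and only imitates the assumed filter.

```python
import re

titles = ["KkStB 1", "KkStB 105", "KkStB 110.500",
          "KkStB 76 sorozatú szerkocsi"]


def keep(regex, titles):
    """Keep titles whose beginning matches regex, ignoring case."""
    # Assumption: this imitates the filter; re.match anchors only at the
    # start of the string, so anything may follow the matched part.
    rx = re.compile(regex, re.I)
    return [t for t in titles if rx.match(t)]


loose = keep(r"kkStB [0-9][0-9]?[0-9]?", titles)    # every title passes
looser = keep(r"kk", titles)                        # every title passes
strict = keep(r"kkStB [0-9][0-9]?[0-9]?$", titles)  # end anchored with $
print(strict)
```

Under that assumption the filter is not useless; the expression simply needs a trailing "$" so that titles such as "KkStB 110.500" are rejected.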