Hi guys, I just installed the pywikipedia bot on my wiki yesterday. I'm new to Python but I can try to learn it since I'm familiar with PHP. It would take me a while though to make this first bot since I'm new to the language. The tasks are pretty straightforward. I would like the bot to run without any user input and do all of this by itself:
1. For every page on the wiki, check if its title has any of these three characters: ( , ) , : . Any page whose title contains any of these characters (parentheses and colon) will be moved to a new title. The original title is var_1.
2. For the new title, brackets are simply deleted, and the : (colon) is replaced with a " - " (a dash with a space on each side). The new title generated is var_2.
3. Insert this text at the top of this page: {{page_rename|var_1}}, and save page.
4. Find any existing links on the site to this page which would be in the format of [[var_1]], and change them to [[var_2|var_1]].
I don't need any menus or other functionality. Is this something pretty straightforward to make? I would appreciate any tips/help, and if it's something that can be made pretty easily, I would be really thankful if someone could do this for me or give me a good start. I've looked at some of the existing pywikipedia bot scripts (basic.py, movepages.py) but none of them would work for me, and being new to Python, it would take me a long time to do what I need, but in any case I will learn a lot in this first attempt.
thanks Eric
2012/1/15 Eric K ek79501@yahoo.com
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
This is how I would do it. It is probably a hacky solution, and there may be better/more efficient ways of doing it, but it should work.
*Step 1: Getting list of pages to change*
Run this line:
python replace.py -regex -requiretitle:"\(|\)|:" "[A-Za-z0-9]" "test" -save:Pagestoberenamed.txt -start:!
Press "a" when it prompts.
This will not change anything, only save a list of all pages that need to be renamed. The command assumes that every page that needs to be changed contains at least one letter or number.
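To see which titles that pattern selects before running anything against the wiki, the check can be tried offline. This is only a minimal sketch: it assumes the title pattern is meant with the parentheses escaped, and the titles in the list are made up for illustration.

```python
import re

# Same idea as the -requiretitle argument: an opening parenthesis,
# a closing parenthesis, or a colon anywhere in the title.
NEEDS_RENAME = re.compile(r"\(|\)|:")


def needs_rename(title):
    """Return True if the title contains '(', ')' or ':'."""
    return NEEDS_RENAME.search(title) is not None


# Hypothetical titles, for illustration only.
titles = ["Plain title", "Foo (bar)", "Help:Contents", "Dash - ok"]
to_rename = [title for title in titles if needs_rename(title)]
print(to_rename)
```

Titles with only letters, spaces and dashes are left out of the list, which is what Step 1 is after.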
*Step 2: Put that template on top of the pages*
Run this line:
python add_text.py -up -text:"{{page_rename|{{subst:PAGENAME}}}}" -file:Pagestoberenamed.txt
*Step 3: Creating list for renaming files*
Open the file "Pagestoberenamed.txt" in a regex-supporting text editor and use the following regex replacements:
Replace: #\[\[([^:]*):([^\]]*)\]\] with [[\1:\2]] [[\1 - \2]]
and replace #\[\[([^(]*)\(([^)]*)\)([^\]]*)\]\] with [[\1(\2)\3]] [[\1\2\3]]
I don't actually have a text editor that supports regex, so instead I copy-pasted the contents of that file into a sandbox page, and ran the following line:
python replace.py -page:SANDBOX -regex "#\[\[([^:]*):([^\]]*)\]\]" "[[\1:\2]] [[\1 - \2]]" "#\[\[([^(]*)\(([^)]*)\)([^\]]*)\]\]" "[[\1(\2)\3]] [[\1\2\3]]"
Save the text as Pagerenaming.txt
Hacky solution, but it should work.
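If no regex-capable editor is at hand, the pair list could also be produced by a few lines of standalone Python instead of a sandbox page. The sketch below assumes that each line of Pagestoberenamed.txt looks like #[[Title]] and that the files are UTF-8; the helper names (new_title, to_pair, convert, convert_file) are made up for this example and are not part of pywikipedia.

```python
import re

# One saved line is assumed to look like: #[[Some title]]
LINE_RE = re.compile(r"#\[\[([^\]]*)\]\]")


def new_title(old):
    """Drop parentheses and turn ':' into ' - ', as in the original request."""
    return old.replace("(", "").replace(")", "").replace(":", " - ")


def to_pair(line):
    """Turn '#[[Old]]' into '[[Old]] [[New]]', or None if nothing to do."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    old = match.group(1)
    new = new_title(old)
    if new == old:
        return None  # title needs no change, so no move pair
    return "[[%s]] [[%s]]" % (old, new)


def convert(lines):
    """Convert an iterable of saved lines into a list of move pairs."""
    pairs = (to_pair(line) for line in lines)
    return [pair for pair in pairs if pair is not None]


def convert_file(src="Pagestoberenamed.txt", dst="Pagerenaming.txt"):
    """Read the saved title list and write the pair file for movepages.py."""
    with open(src, encoding="utf-8") as infile:
        pairs = convert(infile)
    with open(dst, "w", encoding="utf-8") as outfile:
        outfile.write("\n".join(pairs) + "\n")


# Hypothetical sample lines, for illustration only.
sample = ["#[[Foo:Bar]]", "#[[Baz (qux)]]", "not a link line"]
pairs = convert(sample)
print("\n".join(pairs))
```

Calling convert_file() in the pywikipedia directory would then write Pagerenaming.txt in one go. Unlike the two separate replacements, this applies both rules to the same title in one pass, so a title that has both a colon and parentheses also gets a pair.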
*Step 4: Moving the pages*
Run this line:
python movepages.py -pairs:Pagerenaming.txt
It will not prompt you; it will move the pages as specified in Pagerenaming.txt. If you do not want to have redirects from the old page names, use -noredirect as an additional argument. This may depend on how your wiki is set up; I know Wikipedias didn't have this option until relatively recently (and maybe it is only for administrators now).
*Step 5: Fixing links*
Links can be fixed using this line:
python replace.py -regex "\[\[([^:]*):([^\]]*)\]\]" "[[\1 - \2|\1: \2]]" "\[\[([^(|^\[]*)\(([^)]*)\)([^\]]*)\]\]" "[[\1\2\3|\1(\2)\3]]" -start:!
If you think it is too slow, you can append -pt:1 to that.
With this last one you should be careful, and approve quite a few changes manually first (pressing "y" and not "a"), in case something is fishy with the regex.
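One way to see what "fishy" can look like is to try the colon pattern on a line of sample wikitext before letting it loose on the wiki. The sketch below assumes the pattern is meant with the brackets escaped. The stricter variant is not from this thread; it is only a hypothetical alternative that keeps a match inside a single link.

```python
import re

# The colon pattern from Step 5, with the brackets escaped.
LOOSE = re.compile(r"\[\[([^:]*):([^\]]*)\]\]")

# Hypothetical stricter variant (an assumption, not from the thread):
# neither part of the title may contain brackets or a pipe.
STRICT = re.compile(r"\[\[([^:\[\]|]*):([^\[\]|]*)\]\]")

REPLACEMENT = r"[[\1 - \2|\1: \2]]"

# Made-up wikitext with one plain link and one link that needs fixing.
text = "See [[Plain]] and [[Foo:Bar]]."

loose_result = LOOSE.sub(REPLACEMENT, text)
strict_result = STRICT.sub(REPLACEMENT, text)
print(loose_result)
print(strict_result)
```

With the loose pattern the first group can run across the end of one link into the next, so the plain link gets swallowed into the replacement; the stricter one only touches [[Foo:Bar]]. Two more things to watch for while approving changes by hand: either pattern would also match links such as [[Category:Something]], and the replacement as posted writes the link label with a space after the colon.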
Hope this helps.
Hi guys,
Thanks again to Jon for his directions. I'm working through them. I have two additional steps, numbered 1 and 2 below, that I need to do, and I've tried to do them myself.
1. Delete all redirects that satisfy a certain regex. (regex is: ".*[^\w\s-].*" )
I'm OK with doing it in steps. I've tried something like this: first generate a list of the titles that need to be deleted, and then have the "delete" bot work on the generated list:
python pagegenerators.py -titleregex:".*[^\w\s-].*" -redirectonly
This task is to delete redirects that won't be of any use anymore. When I run that command, the bot tries to process all redirects, instead of only the ones matching the regex. It may be that it cannot work on two different criteria at the same time (regex + redirectonly)? Maybe there's another way.
After that I'll move some pages around as per Jon's directions, generate some new redirects, and then I need to do this for those new redirects:
2. A category needs to be added to all redirects whose titles match a certain regex (it's the same as above: ".*[^\w\s-].*" )
The category is so that after a month or two I can set the delete bot to delete those additional redirects.
The problem I'm facing with the add_text.py bot is that it doesn't add text to redirects (hence, it cannot add a category to a redirect). When it gets to a redirect, it automatically skips it. I might be wrong in my findings but this is what I ran into.
Tasks 1 and 2 are independent of each other and that's all I need to do. The rest of the text in this email is explanatory. If anyone knows how I can do them, I would be grateful again for the help.
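The two criteria (title regex plus redirect-only) can at least be checked together offline. This is a rough sketch, not a pywikipedia script: the (title, is_redirect) pairs are invented for illustration, and how the bot itself combines the two options is not something this shows.

```python
import re

# The regex from tasks 1 and 2: the title contains at least one character
# that is not a word character, whitespace or a hyphen.
TITLE_RE = re.compile(r".*[^\w\s-].*")

# Hypothetical (title, is_redirect) pairs, for illustration only.
pages = [
    ("Foo (bar)", True),
    ("Foo bar", True),
    ("Help:Contents", False),
    ("Foo - bar", True),
]

# Keep a title only if BOTH conditions hold.
selected = [
    title
    for title, is_redirect in pages
    if is_redirect and TITLE_RE.match(title)
]
print(selected)
```

With a real bot the redirect flag would have to come from the page object (for example something like page.isRedirectPage()), and the selected titles could be written to a file for the delete bot to work through.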
I attempted to learn Python a little bit and then wondered if I could make my own bot. It's still too advanced for me, and it might just be easier using the bots if I can learn how to do so.
Possible bots that could be used:
- add text: http://www.mediawiki.org/wiki/Manual:Pywikipediabot/add_text.py <- does not add text to redirects
- category (add): http://www.mediawiki.org/wiki/Manual:Pywikipediabot/category.py <- It has a regex feature, but again I'm not sure if it will work on redirects
- page generator (generates a list of pages for a certain regex, etc.): http://www.mediawiki.org/wiki/Manual:Pywikipediabot/pagegenerators.py
- delete bot: http://www.mediawiki.org/wiki/Manual:Pywikipediabot/delete.py (should work, once I have a list of redirects that need to be deleted, or edited to add text)
thanks! Erik
________________________________ From: Jon Harald Søby jhsoby@gmail.com To: Eric K ek79501@yahoo.com; Pywikipedia discussion list pywikipedia-l@lists.wikimedia.org Sent: Saturday, January 14, 2012 8:47 PM Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update links
2012/1/24 Eric K ek79501@yahoo.com
- A category needs to be added for all redirects that have a certain regex (it's the same as above: ".*[^\w\s-].*" )
The problem I'm facing with the add_text.py bot is that it doesn't add text to redirects (hence, it cannot add a category to a redirect). When it gets to a redirect, it automatically skips it. I might be wrong in my findings but this is what I ran into.
Eric, feel free to modify scripts to your own needs. As you wrote, you are familiar with programming, and Python is an easy-to-understand language. Open add_text.py for editing, and search for "redir". You will find it in one place:
    except pywikibot.IsRedirectPage:
        pywikibot.output(u"%s is a redirect, skip!" % page.title())
        return (False, False, always) # continue
Delete these three lines, and try to run it again. I am not 100% sure this is enough (not tested) but I hope so. Modifying the code is often easier, and takes a fraction of the time, compared with running numerous experiments with the "legal" parameters. You may want to make a backup, but the original is always available from the net. Note that Pywiki scripts use Unix-style line breaks, so if you have Windows, you will have to use a proper editor instead of Notepad (e.g. Notepad++) or change the newlines to CRLF.
Thanks! Yes I did see that code but didn't know what to do.
I'll try removing it and see how it goes.
________________________________ From: Bináris wikiposta@gmail.com To: Eric K ek79501@yahoo.com; Pywikipedia discussion list pywikipedia-l@lists.wikimedia.org Sent: Tuesday, January 24, 2012 1:52 AM Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update links
2012/1/24 Eric K ek79501@yahoo.com
Thanks! Yes I did see that code but didn't know what to do. I'll try removing it and see how it goes.
Won't be enough. A few lines above it there is a text = page.get(). The default for get is get_redirect=False, so you have to modify this line, too:
    text = page.get(get_redirect=True)
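Why the get_redirect argument matters can be tried out without touching the wiki. In the sketch below, IsRedirectPage and FakePage are stand-ins written for this example. They only mimic the behaviour described above and are not the real pywikibot classes. The category name is made up, too.

```python
class IsRedirectPage(Exception):
    """Stand-in for pywikibot.IsRedirectPage."""


class FakePage:
    """Stand-in for a wiki page; mimics only the get() behaviour above."""

    def __init__(self, title, text):
        self._title = title
        self._text = text

    def title(self):
        return self._title

    def get(self, get_redirect=False):
        # Like the real get(), refuse to return a redirect's text
        # unless the caller explicitly asks for it.
        if self._text.upper().startswith("#REDIRECT") and not get_redirect:
            raise IsRedirectPage(self._title)
        return self._text


def append_text(page, addition, get_redirect=False):
    """Return the page text with the addition appended on a new line."""
    text = page.get(get_redirect=get_redirect)
    return text + "\n" + addition


redirect = FakePage("Foo (bar)", "#REDIRECT [[Foo bar]]")
category = "[[Category:Redirects to delete]]"  # hypothetical category

try:
    append_text(redirect, category)
    skipped = False
except IsRedirectPage:
    skipped = True

new_text = append_text(redirect, category, get_redirect=True)
print(skipped)
print(new_text)
```

With the default call the redirect still raises, so removing only the except block would turn the skip into an error; with get_redirect=True the text comes back and the category can be appended below the redirect line.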