[For others: trying to list all page titles that contain any characters *other than* alphanumerics, spaces, underscores and dashes]
I'm seeing the same. I went to this regex chat and asked them for a regex, which worked on the regex tester http://regexpal.com/ but not on the bot :-(.
python pagegenerators.py -titleregex:.*[^\w\s-].* ... - [1]
I also tried these:
(?=.*[^\w\s-])(.*[\w\s-].*)
(?=.*[^\w\s-]).*
#1 hits all pages, including ones that only contain that set of characters; for example, it also shows "Apple".
I want the bot to show me pages with titles like the following:
My apple is sweet (song)
He said "hello" to me
But it should not show:
My apple is sweet - song
He said hello to me
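For reference, the intended behaviour can be checked outside the bot with Python's own re module. This is only a sketch: the helper name has_disallowed_char is made up for illustration, and it says nothing about how pagegenerators.py applies the pattern internally.

```python
import re

# One character that is not a word character (letter, digit, underscore),
# not whitespace and not a dash.
DISALLOWED = re.compile(r"[^\w\s-]")


def has_disallowed_char(title):
    """Return True if the title contains at least one disallowed character."""
    return DISALLOWED.search(title) is not None


titles = [
    'My apple is sweet (song)',
    'He said "hello" to me',
    'My apple is sweet - song',
    'He said hello to me',
    'Apple',
]

for title in titles:
    print(title, "->", has_disallowed_char(title))
```

Only the first two titles come out as True, which is the list the bot is supposed to produce.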
________________________________ From: Jon Harald Søby jhsoby@gmail.com To: Eric K ek79501@yahoo.com Sent: Sunday, January 15, 2012 1:50 PM Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update links
I'm trying to figure this one out too... I've been using mostly the same regex: [^A-Za-z0-9-\s]*, but mine hits _every_ page no matter what. Something's fishy. Will look more into it.
On 15 January 2012 at 19:49, Eric K ek79501@yahoo.com wrote the following:
I'm trying this for example:
python pagegenerators.py -titleregex:[^A-Za-z0-9-\s]+
(trying to make it say: don't match any letters, numbers, spaces or dashes)
It only brings up titles which begin with a " (quotation mark), but it misses titles that have the " somewhere in the middle.
Then I removed the ^ from the regex to see what that does, and it gets all titles, including those which have " and so on.
python pagegenerators.py -titleregex:[A-Za-z0-9-\s]+
It's probably my regex that is flawed :).
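Both symptoms in this part of the thread (the * version hitting every page, the + version only hitting titles that start with a special character) are what Python's re module gives when a pattern is anchored at the start of the title. Whether the script really uses match-at-start semantics is an assumption here, not something the thread confirms. A small sketch:

```python
import re

# Same character set as in the thread, with the dash moved to the end of the class.
star = re.compile(r"[^A-Za-z0-9\s-]*")  # zero or more: the empty string also matches
plus = re.compile(r"[^A-Za-z0-9\s-]+")  # one or more

titles = ['Apple', '"Hello" world', 'He said "hello" to me']

for title in titles:
    print(
        title,
        "| * at start:", bool(star.match(title)),
        "| + at start:", bool(plus.match(title)),
        "| + anywhere:", bool(plus.search(title)),
    )
```

Under that assumption, wrapping the character class in .* on both sides (as in .*[^\w\s-].*) is what lets a start-anchored match reach a character in the middle of the title.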
From: Eric K ek79501@yahoo.com To: Jon Harald Søby jhsoby@gmail.com Sent: Sunday, January 15, 2012 1:40 AM
Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update links
Thanks, I'm learning! I tried step one for generating the title list. It was saying "incomplete XML data". It was doing a text replace, and I was able to make it work by adding "-debug", but it was slow. Then I found this command:
python pagegenerators.py -titleregex:apple
http://www.mediawiki.org/wiki/Manual:Pywikipediabot/pagegenerators.py
(There is no -save option for this script, but if I add "> filelist.txt" at the end, it writes the screen output to that text file, which works.)
This one only works on the page titles (so it's quicker) and it does work; e.g. for the above, it will list any pages beginning with the word "apple". I actually found there are other characters in the database, so the best way would be to do this kind of search:
- If a page contains any character which is not:
--- alphanumeric
--- underscore
--- dash
--- space
Then include that page in the list. So "Hello 123" would be excluded but "Hello 123$" would be included. pagegenerators.py does not have an "exclude" option like replace.py has.
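One possible workaround for the missing "exclude" option would be to dump all titles to a file and filter that file afterwards. The sketch below assumes one title per line, which may not be the exact layout pagegenerators.py prints; the function name titles_to_include and the in-memory sample standing in for filelist.txt are made up for illustration.

```python
import io
import re

# Any character outside ASCII letters, digits, underscore, space and dash.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_ -]")


def titles_to_include(lines):
    """Yield each non-empty title that contains at least one disallowed character."""
    for line in lines:
        title = line.strip()
        if title and NOT_ALLOWED.search(title):
            yield title


# Stand-in for a file such as filelist.txt (assumed: one title per line).
sample = io.StringIO("Hello 123\nHello 123$\nMy_page-2\n")
print(list(titles_to_include(sample)))
```

With a real file, the io.StringIO object would be replaced by an open file handle.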
Do you know of a regex that will work?
From: Jon Harald Søby jhsoby@gmail.com To: Eric K ek79501@yahoo.com Sent: Saturday, January 14, 2012 10:09 PM Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update links
Good luck! :-) My regex skills are quite rudimentary, so be cautious when doing the replacement step 5 -- the regexes may catch something they shouldn't. Please let me know how it goes! :-)
On 15 January 2012 at 04:43, Eric K ek79501@yahoo.com wrote the following:
Hi John, wow, that really is awesome, that you were able to do this with the provided scripts. I could never have come up with that. I'll try this right away and let you know how it goes.
From: Jon Harald Søby jhsoby@gmail.com To: Eric K ek79501@yahoo.com; Pywikipedia discussion list pywikipedia-l@lists.wikimedia.org Sent: Saturday, January 14, 2012 8:47 PM Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update links
2012/1/15 Eric K ek79501@yahoo.com
Hi guys,
I just installed the pywikipedia bot on my wiki yesterday. I'm new to Python but I can try to learn it since I'm familiar with PHP. It would take me a while though to make this first bot since I'm new to the language. The tasks are pretty straightforward. I would like the bot to run without any user input and do all of this by itself:
- For every page on the wiki, check if its title has any of these three characters: ( , ) , : . Any page containing any of these characters (parentheses and colon) will be moved to a new title. The original title is var_1.
- For the new title, brackets are simply deleted, and the : (colon) is replaced with a " - " (a dash with a space on each side). The new title generated is var_2.
- Insert this text at the top of this page: {{page_rename|var_1}}, and save page.
- Find any existing links on the site to this page which would be in the format of [[var_1]], and change them to [[var_2|var_1]].
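The four steps above boil down to a few string operations. The sketch below shows only that string logic in plain Python; it does not touch the wiki or use any pywikipedia function, and all function names are made up for illustration.

```python
def needs_rename(title):
    """True if the title contains '(', ')' or ':'."""
    return any(ch in title for ch in "():")


def new_title(old_title):
    """Build var_2 from var_1: drop parentheses, turn each ':' into ' - '."""
    without_parens = old_title.replace("(", "").replace(")", "")
    return without_parens.replace(":", " - ")


def with_rename_template(page_text, old_title):
    """Put {{page_rename|var_1}} on its own line at the top of the page text."""
    return "{{page_rename|%s}}\n%s" % (old_title, page_text)


def fix_links(text, old_title):
    """Rewrite [[var_1]] as [[var_2|var_1]] so the visible link text is unchanged."""
    old_link = "[[%s]]" % old_title
    new_link = "[[%s|%s]]" % (new_title(old_title), old_title)
    return text.replace(old_link, new_link)


old = "Apple:Pie (song)"
print(needs_rename(old))
print(new_title(old))
print(with_rename_template("Page body", old))
print(fix_links("See [[Apple:Pie (song)]] for details.", old))
```

Note that a colon which is already followed by a space ("Apple: Pie") would end up with two spaces after the dash under this literal reading of the rule.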
I don't need any menus or other functionality. Is this something pretty straightforward to make? I would appreciate any tips/help, and if it's something that can be made pretty easily, I would be really thankful if someone could do this for me or give me a good start. I've looked at some of the existing pywikipedia bot scripts (basic.py, movepages.py) but none of them would work for me, and being new to Python, it would take me a long time to do what I need, but in any case I will learn a lot in this first attempt.
thanks
Eric
_______________________________________________
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
This is how I would do it. It is probably a hacky solution, and there may be better/more efficient ways of doing it, but it should work.
Step 1: Getting list of pages to change
Run this line:
python replace.py -regex -requiretitle:"\(|\)|:" "[A-Za-z0-9]" "test" -save:Pagestoberenamed.txt -start:!
Press "a" when it prompts.
This will not change anything, only save a list of all pages that need to be renamed. The script assumes there is either a letter or a number in all the pages that need to be changed.
Step 2: Put that template on top of the pages
Run this line:
python add_text.py -up -text:"{{page_rename|{{subst:PAGENAME}}}}" -file:Pagestoberenamed.txt
Step 3: Creating list for renaming files
Open the file "Pagestoberenamed.txt" in a regex-supporting text editor and use the following regex replacements:
Replace:
#\[\[([^:]*):([^]]*)\]\]
with
[[\1:\2]] [[\1 - \2]]
and replace
#\[\[([^(]*)\(([^)]*)\)([^]]*)\]\]
with
[[\1(\2)\3]] [[\1\2\3]]
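The two replacements in step 3 can also be tried out in Python before touching the real list. This sketch assumes the literal square brackets and parentheses in the patterns are meant to be backslash-escaped, and that each line of the saved file looks like #[[Title]]; the function name to_move_pair is made up for illustration.

```python
import re

# "#[[A:B]]" -> old title, then the new title with " - " in place of the colon.
COLON = re.compile(r"#\[\[([^:]*):([^]]*)\]\]")
# "#[[A (b) c]]" -> old title, then the new title with the parentheses dropped.
PARENS = re.compile(r"#\[\[([^(]*)\(([^)]*)\)([^]]*)\]\]")


def to_move_pair(line):
    """Turn one '#[[Old]]' line into an '[[Old]] [[New]]' pair."""
    line = COLON.sub(r"[[\1:\2]] [[\1 - \2]]", line)
    line = PARENS.sub(r"[[\1(\2)\3]] [[\1\2\3]]", line)
    return line


print(to_move_pair("#[[Apple:Pie]]"))
print(to_move_pair("#[[My apple is sweet (song)]]"))
print(to_move_pair("#[[Apple:Pie (song)]]"))
```

The last line is worth a look: once the colon rule has fired, the leading # is gone, so the parentheses rule no longer applies and the new title keeps its parentheses. Titles containing both a colon and parentheses would need separate handling.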
I don't actually have a text editor that supports regex, so instead I copypasted the contents of that file into a sandbox page, and ran the following line:
python replace.py -page:SANDBOX -regex "#\[\[([^:]*):([^]]*)\]\]" "[[\1:\2]] [[\1 - \2]]" "#\[\[([^(]*)\(([^)]*)\)([^]]*)\]\]" "[[\1(\2)\3]] [[\1\2\3]]"
Save the text as Pagerenaming.txt
Hacky solution, but it should work.
Step 4: Moving the pages
Run this line:
python movepages.py -pairs:Pagerenaming.txt
It will not prompt you; it will move the pages as specified in Pagerenaming.txt. If you do not want to have redirects from the old page names, use -noredirect as an additional argument. This may depend on how your wiki is set up; I know Wikipedias didn't have this option until relatively recently (and maybe it is only for administrators now).
Step 5: Fixing links
Links can be fixed using this line:
python replace.py -regex "\[\[([^:]*):([^]]*)\]\]" "[[\1 - \2|\1: \2]]" "\[\[([^(|^[]*)\(([^)]*)\)([^]]*)\]\]" "[[\1\2\3|\1(\2)\3]]" -start:!
If you think it is too slow, you can append -pt:1 to that.
With this last one you should be careful, and approve quite a few changes manually first (pressing "y" and not "a"), in case something is fishy with the regex.
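To illustrate why manual approval is a good idea here, the colon pattern from step 5 can be run against a few sample links in Python. This sketch assumes the literal square brackets are meant to be backslash-escaped; the sample texts are invented.

```python
import re

COLON_LINK = re.compile(r"\[\[([^:]*):([^]]*)\]\]")
REPLACEMENT = r"[[\1 - \2|\1: \2]]"

# An ordinary link to a renamed page, and a namespace link that should be left alone.
for text in ["[[Apple:Pie]]", "[[Category:Fruit]]"]:
    print(text, "->", COLON_LINK.sub(REPLACEMENT, text))

# The first group can also run across an earlier link on the same line.
match = COLON_LINK.search("See [[Apple]] and [[Foo:Bar]]")
print(repr(match.group(1)))
```

The pattern rewrites the Category link as well, and in the last sample the first group swallows everything from the first [[ up to the colon. Both are cases to watch for when pressing "y".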
Hope this helps.
--
Regards, Jon Harald Søby
________________________________ From: Merlijn van Deen valhallasw@arctus.nl To: Eric K ek79501@yahoo.com; Pywikipedia discussion list pywikipedia-l@lists.wikimedia.org Cc: Jon Harald Søby jhsoby@gmail.com Sent: Sunday, January 15, 2012 3:09 PM Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update links
2012/1/15 Eric K ek79501@yahoo.com
I'm seeing the same. I went to this regex chat and asked them for a regex, which worked on the regex tester http://regexpal.com/ but not on the bot :-(.
Note that several characters need to be escaped (or quoted) for the pattern to work correctly from the shell. This means that
-titleregex:.*[^\w\s-].*
should rather be something like
-titleregex:".*[^\w\s-].*"
You can test this using echo, to make sure the value passed to pywikipedia is the correct one.
valhallasw@dorthonion:~/src/pywikipedia/trunk$ echo ".*[^\w\s-].*"
.*[^\w\s-].*
valhallasw@dorthonion:~/src/pywikipedia/trunk$ echo .*[^\w\s-].*
.*[^ws-].*
Best, Merlijn
Thank you!! It worked.
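The same shell effect can be reproduced from Python with the standard shlex module, which follows POSIX-style quoting rules when splitting a command line. This is only an approximation of what a real shell does (shlex does no filename globbing, for instance), but it shows the backslashes being lost in the unquoted form:

```python
import shlex

# Unquoted: the shell treats each backslash as an escape and drops it.
unquoted = shlex.split(r'python pagegenerators.py -titleregex:.*[^\w\s-].*')

# Double-quoted: the backslashes before w and s are kept as they are.
quoted = shlex.split(r'python pagegenerators.py -titleregex:".*[^\w\s-].*"')

print(unquoted[-1])
print(quoted[-1])
```

The last argument of the unquoted command comes out as -titleregex:.*[^ws-].*, matching the echo output above, while the quoted one keeps \w and \s.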
2012/1/15 Eric K ek79501@yahoo.com
[For others: trying to list all page titles that contain any characters *other than* alphanumerics, spaces, underscores and dashes]
I'm seeing the same. I went to this regex chat and asked them for a regex, which worked on the regex tester http://regexpal.com/ but not on the bot :-(.
python pagegenerators.py -titleregex:.*[^\w\s-].* ... - [1]
#1 hits all pages, including ones that only contain that set of characters; for example, it also shows "Apple".
This works for me, adding the quotation marks:
-titleregex:".*[^\w\s-].*"
But it will also display titles that contain non-English letters, because \w will not match them.
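The point about non-English letters comes from \w being limited to ASCII letters, digits and underscore unless Unicode matching is switched on, which was the default behaviour in the Python 2 of that time. In Python 3 the same behaviour can be reproduced with the re.ASCII flag. The sketch below only compares the two modes; whether pagegenerators.py passes any such flag is not stated in the thread.

```python
import re

ascii_only = re.compile(r"[^\w\s-]", re.ASCII)  # \w covers only [A-Za-z0-9_]
unicode_aware = re.compile(r"[^\w\s-]")         # \w also covers letters such as "ø"

title = "Jon Harald Søby"

print(bool(ascii_only.search(title)))     # "ø" counts as a disallowed character
print(bool(unicode_aware.search(title)))  # "ø" counts as a letter
```

With ASCII-only matching the title is listed; with Unicode-aware matching it is not.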