*Context: * Thanks to the helpful people on this list, I've now got a replace.py bot which successfully adds wikilinks to key terms. E.g. to wikilink "sustainability", the command I'm using is (from the CLI, within the Pywikipediabot directory):
python replace.py -regex "(?s)sustainability(.*$)" "[[sustainability]]\1" -xml:currentdump.xml -exceptinsidetag:link -exceptinsidetag:hyperlink -exceptinsidetag:header -namespace:0 -namespace:4 -namespace:102
This code finds the first occurrence of the term sustainability that is not wikilinked, and replaces it with [[sustainability]]. (I don't understand the regex stuff, but I can copy and paste.)
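For reference, the regex works roughly like this: `(?s)` lets `.` match newlines, so `(.*$)` captures everything after the first match up to the end of the page, and `\1` puts that text back unchanged. Since the first match swallows the rest of the page, nothing after it can match again. A minimal sketch with Python's plain `re` module (the sample text is made up, and this leaves out the `-exceptinsidetag` handling that replace.py adds on top):

```python
import re

# Same pattern and replacement as in the replace.py command above.
PATTERN = r"(?s)sustainability(.*$)"
REPLACEMENT = r"[[sustainability]]\1"

# Made-up page text with two occurrences on different lines.
text = "We value sustainability.\nMore on sustainability later."

# The first match consumes the rest of the text, so only one link is added.
linked = re.sub(PATTERN, REPLACEMENT, text)
print(linked)
```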
*Question: * If the first occurrence of the term sustainability is already wikilinked, it goes on to wikilink the second occurrence. I actually only want the first term linked, so I would prefer that it skips the page in this case.
Any ideas?
Thanks!
bump.
Is it possible to do it this way?:
1. Recognize the pattern where the term is already wikilinked, and then skip that page; and then
2. If the wikilinked term doesn't already exist, *then* perform the search and replace operation.
If so, how could I do that? Do I need a different bot from replace.py?
Thanks, Chris
On Fri, Aug 15, 2008 at 20:12, Chris Watkins chriswaterguy@appropedia.org wrote:
[...]
-- Chris Watkins (a.k.a. Chriswaterguy)
Hi Chris,
On Friday, 22 August 2008 at 08:25:09, Chris Watkins wrote:
Is it possible to do it this way?:
1. Recognize the pattern where the term is already wikilinked, and then skip that page; and then
2. If the wikilinked term doesn't already exist, *then* perform the search and replace operation.
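Try the -excepttext parameter:
replace.py ... -regex -excepttext:"\[\[sustainability[]|]"
This will skip all pages that contain "[[sustainability]" or "[[sustainability|". The two opening brackets are escaped because, with -regex, an unescaped [ starts a character class; the quotes keep the shell from interpreting the |. You might also want to handle the capital letter S:
replace.py ... -regex -excepttext:"\[\[[sS]ustainability[]|]"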
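A pattern of this shape can be tried out with Python's `re` module before running the bot. The sketch below assumes the opening brackets are escaped as `\[\[`; the sample strings are made up for illustration.

```python
import re

# Assumed pattern: literal "[[" (escaped), the term with either case of the
# first letter, then "]" or "|" via the character class "[]|]".
ALREADY_LINKED = re.compile(r"\[\[[sS]ustainability[]|]")

samples = [
    "See [[sustainability]] for details.",         # plain link
    "See [[Sustainability|sustainable]] design.",  # piped link
    "A page about sustainability, unlinked.",      # no link at all
    "See [[sustainability indicators]].",          # link to a different page
]

# True means the page would be skipped by -excepttext.
results = [bool(ALREADY_LINKED.search(s)) for s in samples]
for skipped, s in zip(results, samples):
    print(skipped, s)
```

Only the first two samples match, so a page that merely mentions the term, or links to a longer title that starts with it, would still be processed.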
I haven't tested all this, but it should be a good start for you to experiment with. If you run into trouble, just write to this mailing list again.
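The two-step logic (skip if already linked, otherwise link the first occurrence) can be sketched in plain Python to experiment with page text offline. `link_first_occurrence` is a hypothetical helper, not part of replace.py; it assumes the term starts with a letter and, unlike the bot, it does not honour `-exceptinsidetag`, so it would also link a term inside a header or an external link.

```python
import re

def link_first_occurrence(text, term="sustainability"):
    """Return (new_text, changed). Hypothetical helper, not part of replace.py."""
    # Step 1: skip the page if the term is already linked, either as
    # [[term]] or [[term|label]], with either case of the first letter.
    first = "[%s%s]" % (term[0].lower(), term[0].upper())
    already_linked = re.compile(r"\[\[" + first + re.escape(term[1:]) + r"[]|]")
    if already_linked.search(text):
        return text, False

    # Step 2: no link yet, so wrap only the first occurrence in [[...]].
    new_text, count = re.subn(
        re.escape(term),
        lambda m: "[[" + m.group(0) + "]]",
        text,
        count=1,
    )
    return new_text, count > 0

print(link_first_occurrence("About sustainability and sustainability."))
print(link_first_occurrence("See [[Sustainability|green]] and sustainability."))
```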
Cheers
Daniel
pywikipedia-l@lists.wikimedia.org