I want to do a search and replace on FOO1, except where it occurs within the text of an external link, e.g. [https://www.blah.bar/ blah FOO1 more text].
I was using -exceptinsidetag:hyperlink but eventually realized it only excludes hits within the URL itself, not within the link text that follows it.
I'm guessing I need to use -exceptinside, but I'm not sure how. Can someone give me some guidance?
I'm already using -regex. The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)" "[[\1]]\2" -exceptinsidetag:link -exceptinsidetag:hyperlink -exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref -excepttext:"(?si)\[\[((?:FOO1|FOO2)[|\]])" -namespace:0 -namespace:102 -namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square brackets to: FOO1|FOO2." -log -xml:currentdump.xml
(Not that I understand regex much - I had lots of help getting there.)
Many thanks for any help!
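Try this: 'inside': [ r'\[http.*?\]', ],
I tested it in fixes.py, but it should also work on the command line as -exceptinside:"\[http.*?\]". The square brackets have to be escaped with backslashes; otherwise the regex treats [http.*?] as a character class that matches a single character. This excludes the URL _and_ the text of external links, as well as internal links within [[ ]] which begin with http. :-)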
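To see what that exception pattern covers, here is a minimal standalone sketch using only Python's re module. It is not pywikibot's own code: the helper name wikilink_outside_external_links and the simplified FOO1|FOO2 target regex are made up for illustration, and it only mimics the idea of skipping hits that fall inside a protected span.

```python
import re

# Pattern from the answer: "[" followed by "http", up to the first "]".
EXTERNAL_LINK = re.compile(r'\[http.*?\]')
# Simplified stand-in for the search terms in the question.
TARGET = re.compile(r'\b(FOO1|FOO2)\b')


def wikilink_outside_external_links(text):
    """Wrap FOO1/FOO2 in [[ ]] unless the hit lies inside an external link.

    Hypothetical helper for illustration only; replace.py does its own
    exception handling internally.
    """
    # Character ranges covered by external links (URL plus link text).
    protected = [m.span() for m in EXTERNAL_LINK.finditer(text)]

    def replace(match):
        inside = any(start <= match.start() < end for start, end in protected)
        # Leave protected hits untouched, wikilink the rest.
        return match.group(0) if inside else '[[' + match.group(1) + ']]'

    return TARGET.sub(replace, text)


sample = 'FOO1 is described at [https://www.blah.bar/ blah FOO1 more text].'
result = wikilink_outside_external_links(sample)
print(result)
# -> [[FOO1]] is described at [https://www.blah.bar/ blah FOO1 more text].
```

The first FOO1 gets wikilinked, while the one in the link text is left alone because the whole span from "[" to "]" counts as protected. Note that the non-greedy .*? stops at the first "]", so link text that itself contains a "]" would only be partly covered.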