That is wonderful, thank you!

On Mon, Jun 30, 2008 at 3:11 PM, Daniel Herding <DHerding@gmx.de> wrote:
On Sunday 29 June 2008 17:51:18, Chris Watkins wrote:
> I am running replace.py and have several questions.
> * Can I replace only the first result per page?

That is possible with some regular expression magic:

python replace.py -regex "(?s)foo(.*$)" "bar\\1" -page:Fubar

On Windows you will probably have to type \1 instead of \\1, because cmd.exe
does not treat the backslash as an escape character.
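
To see why this touches only the first hit, here is a minimal sketch using
Python's re module directly. It is not replace.py itself, and the function name
replace_first_foo is made up for illustration. With (?s) the dot also matches
newlines, so the group swallows everything after the first "foo" up to the end
of the page, and no text is left for a second match.

```python
import re


def replace_first_foo(text):
    """Replace only the first "foo" in text with "bar".

    (?s) makes "." match newlines as well, so (.*$) captures the whole
    rest of the text after the first "foo". The replacement writes "bar"
    followed by that captured remainder, unchanged.
    """
    return re.sub(r"(?s)foo(.*$)", r"bar\1", text)


if __name__ == "__main__":
    page = "foo one\nfoo two\nfoo three\n"
    print(replace_first_foo(page))
```

Only the "foo" on the first line is replaced; the later ones sit inside the
captured group and are copied back as they were.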

> * Can I exclude hits within a wikilink, url or header?  E.g. can I look for
> appropriate technology but ensure it's not inside a wikilink, e.g.
>     * [[Peter's appropriate technology lamp]] or
>     * [[Wikipedia:Appropriate technology|appropriate technology stuff at
> Wikipedia]]) or
>     * http://forum.permaculture.org (if I'm looking for "permaculture").
>     * == Permaculture program ==

You can run this (all on one line):

python replace.py foo bar -page:Fubar -exceptinsidetag:link -exceptinsidetag:hyperlink

This will exclude wikilinks and URLs. More things can be excluded; see the
source code of the replaceExcept() method in wikipedia.py (look at the
exceptionRegexes dictionary). I have just added a regular expression for
section headers for you, so if you're running the SVN version, you can use
this parameter:

-exceptinsidetag:header
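
The general idea behind these exceptions can be sketched roughly as follows.
This is a simplified illustration, not the actual replaceExcept() code: the
three patterns, the EXCEPTION_REGEXES name and the replace_outside() function
are all assumptions made up for this example, and the real regular expressions
in wikipedia.py may differ.

```python
import re

# Hypothetical, simplified patterns for illustration only.
EXCEPTION_REGEXES = {
    "link": re.compile(r"\[\[.*?\]\]"),
    "hyperlink": re.compile(r'https?://[^\s\]<>"]+'),
    "header": re.compile(r"^=+.+?=+ *$", re.MULTILINE),
}


def replace_outside(text, old, new, exceptions):
    """Replace regex `old` with `new`, skipping hits inside excepted spans."""
    # Collect the (start, end) spans of every protected region.
    protected = [
        m.span()
        for name in exceptions
        for m in EXCEPTION_REGEXES[name].finditer(text)
    ]

    def substitute(match):
        # Leave the hit untouched if it overlaps any protected span.
        for start, end in protected:
            if match.start() < end and match.end() > start:
                return match.group(0)
        return match.expand(new)

    return re.sub(old, substitute, text)


if __name__ == "__main__":
    page = (
        "== Permaculture program ==\n"
        "See http://forum.permaculture.org and [[Permaculture design]].\n"
        "We teach permaculture here.\n"
    )
    print(replace_outside(page, "(?i)permaculture", "PC",
                          ["link", "hyperlink", "header"]))
```

In this sketch only the "permaculture" in the last line is replaced; the hits
in the header, the URL and the wikilink are left alone.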


Cheers

Daniel

_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l



--
Chris Watkins (a.k.a. Chriswaterguy)

Appropedia.org - Sharing knowledge to build rich, sustainable lives.

Blog: chriswaterguy.livejournal.com/

Buying at Amazon, eBay etc? Start at http://appropedia.maatiam.com and a percentage of your purchase supports Appropedia - at no extra cost.

Where men are the most sure and arrogant, they are commonly the most mistaken, and have there given reins to passion, without that proper deliberation and suspense, which can alone secure them from the grossest absurdities. -- David Hume