Next question: If I have a set of synonyms in the wiki that redirect to the same page (say, cow and cattle), is it possible to find the first occurrence of any of these terms on a page and link only that one? So cow would become [[cow]], or cattle would become [[cattle]], but not both on the same page if both terms occur.<br>
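<br>To show what I mean, here is a rough sketch in plain Python rather than replace.py. It reuses the first-match regex trick with an alternation of the synonyms. The function name link_first_synonym is made up for illustration, and the sketch assumes the terms are not already inside a link:<br>

```python
import re

def link_first_synonym(text, synonyms):
    """Wrap only the first occurrence of any synonym in [[...]].

    Hypothetical helper for illustration; not part of replace.py.
    """
    # Try longer terms first so a short term cannot shadow a longer one.
    ordered = sorted(synonyms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    # (?s) lets "." match newlines, so (.*$) swallows the rest of the page
    # after the first hit and no second match is possible.
    pattern = re.compile(r"(?s)\b(" + alternatives + r")\b(.*$)")
    return pattern.sub(r"[[\1]]\2", text)

page = "Our cattle graze here.\nEach cow has a name; the cattle are happy."
linked = link_first_synonym(page, ["cow", "cattle"])
print(linked)
```

Only the first "cattle" should end up linked; the later "cow" and "cattle" stay as plain text.<br>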
<br><br>Many thanks Daniel, this has been a great help, and adding
-exceptinsidetag:header at my request was very good of you.<br><br>I have been doing test runs (without saving, as my bot has a problem at the moment), and I can successfully use all the parameters you gave. This is wonderful.<br>
<br>Chris<br><br><br><div class="gmail_quote">On Mon, Jun 30, 2008 at 3:11 PM, Daniel Herding &lt;<a href="mailto:DHerding@gmx.de">DHerding@gmx.de</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Sunday, 29 June 2008 17:51:18, Chris Watkins wrote:<br>
<div class="Ih2E3d">> I am running replace.py and have several questions.<br>
</div><div class="Ih2E3d">> * Can I replace only the first result per page?<br>
<br>
</div>That is possible with some regular expression magic:<br>
<br>
python replace.py -regex "(?s)foo(.*$)" "bar\\1" -page:Fubar<br>
<br>
On Windows you may need to type \1 instead of \\1, because the command prompt there does not treat the backslash as an escape character.<br>
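<br>A rough sketch of why this works, in plain Python rather than through replace.py (the helper name replace_first is made up for illustration): the greedy (.*$) captures everything after the first foo, so the regex can match only once per page, and \1 puts the captured remainder back unchanged.<br>

```python
import re

def replace_first(text, old, new):
    """Replace only the first match of `old`, using the same regex trick.

    Hypothetical helper for illustration; `old` is treated as a regex.
    """
    # (?s) makes "." match newlines too, so (.*$) runs to the end of the page.
    pattern = re.compile(r"(?s)" + old + r"(.*$)")
    # The whole remainder is consumed by the first match, then restored via \1.
    return pattern.sub(new + r"\1", text)

page = "foo one\nfoo two\nfoo three"
result = replace_first(page, "foo", "bar")
print(result)
```

Only the first line should change from foo to bar; the later occurrences are inside the captured group and are written back as they were.<br>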
<div class="Ih2E3d"><br>
> * Can I exclude hits within a wikilink, url or header? E.g. can I look for<br>
> appropriate technology but ensure it's not inside a wikilink, e.g.<br>
> * [[Peter's appropriate technology lamp]] or<br>
> * [[Wikipedia:Appropriate technology|appropriate technology stuff at<br>
> Wikipedia]]) or<br>
> * <a href="http://forum.permaculture.org" target="_blank">http://forum.permaculture.org</a> (if I'm looking for "permaculture").<br>
> * == Permaculture program ==<br>
<br>
</div>You can run this:<br>
python replace.py foo bar -page:Fubar -exceptinsidetag:link -exceptinsidetag:hyperlink<br>
<br>
This excludes wikilinks and URLs. More things can be excluded;<br>
see the source code of the method replaceExcept() in wikipedia.py<br>
(look at the exceptionRegexes dictionary). I have just added a regular<br>
expression for section headers for you, so if you're running the SVN version,<br>
you can use this parameter:<br>
<br>
-exceptinsidetag:header<br>
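<br>To illustrate the idea behind these parameters, here is a simplified sketch in plain Python. It is not the actual replaceExcept() code: the three patterns below are rough stand-ins I am assuming for illustration, not the entries in the exceptionRegexes dictionary, and the function name replace_except is made up.<br>

```python
import re

# Simplified stand-in patterns (assumptions, not pywikipedia's own regexes).
EXCEPTIONS = {
    "link": r"\[\[.*?\]\]",            # [[wikilinks]]
    "hyperlink": r"https?://[^\s\]]+",  # bare URLs
    "header": r"^=+.+?=+ *$",          # == Section headers ==
}

def replace_except(text, old, new, except_inside):
    """Replace regex `old` with the literal `new`, skipping protected regions."""
    if not except_inside:
        return re.sub(old, lambda m: new, text)
    protected = "|".join("(?:%s)" % EXCEPTIONS[name] for name in except_inside)
    # At each position a protected region is tried first; it is copied through
    # untouched, so hits inside it are never seen by the `old` alternative.
    combined = re.compile(
        "(?P<skip>%s)|(?P<hit>%s)" % (protected, old), re.MULTILINE
    )

    def fix(match):
        if match.group("skip") is not None:
            return match.group(0)
        return new

    return combined.sub(fix, text)

text = ("== foo header ==\n"
        "foo is at http://example.org/foo and in [[foo page|a link]]. More foo.")
result = replace_except(text, "foo", "bar", ["link", "hyperlink", "header"])
print(result)
```

The header, the URL and the wikilink should come through unchanged, while the two plain occurrences of foo become bar.<br>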
<br>
<br>
Cheers<br>
<br>
Daniel<br>
<br>
</blockquote></div><br><br clear="all"><br>-- <br>Chris Watkins (a.k.a. Chriswaterguy)<br><br>Appropedia.org - Sharing knowledge to build rich, sustainable lives.<br><br>Blog: <a href="http://chriswaterguy.livejournal.com/">chriswaterguy.livejournal.com/</a><br>