I just worked it out, mostly... instead of:
-exceptinsidetag:header
I used:
-exceptinside:'=[^\n\r]*=[ \t]*'
And it worked!
There might be a small risk of false positives, so I tried various
tweaks, e.g.:
-exceptinside:'^=[^\n\r]*=[ \t]*$'
-exceptinside:'[\n\r]=[^\n\r]*=[ \t]*[\n\r]'
-exceptinside:'[\n\r]=[^\n\r]*='
But none worked... any suggestions?
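
One possible explanation for the '^...$' variant, sketched in plain Python
below: in Python's re module, '^' and '$' only match at line boundaries
when the pattern is compiled with re.MULTILINE, or carries the inline
(?m) flag. This is only a guess at the cause. It assumes replace.py
compiles -exceptinside patterns without that flag, which I have not
checked in the source, and the sample wikitext is made up for
illustration. It does not explain the two [\n\r] variants.

```python
import re

# Made-up sample page: one real header, and one body line that happens
# to contain two "=" signs.
text = "Intro line\n== History ==\nBody text with = sign = here\n"

# The pattern that worked: no anchors, so it can also hit body text.
loose = re.compile(r'=[^\n\r]*=[ \t]*')

# The anchored tweak: without re.MULTILINE, "^" only matches at the very
# start of the text, so a header further down the page is never found.
anchored = re.compile(r'^=[^\n\r]*=[ \t]*$')

# Same pattern with the inline multiline flag, so "^" and "$" match at
# the start and end of every line. Note "$" matches before "\n" only,
# so pages with "\r\n" line endings would still need extra handling.
anchored_multiline = re.compile(r'(?m)^=[^\n\r]*=[ \t]*$')

loose_hits = [m.group() for m in loose.finditer(text)]
anchored_hit = anchored.search(text)
multiline_hits = [m.group() for m in anchored_multiline.finditer(text)]

print(loose_hits)      # ['== History ==', '= sign = ']  (false positive)
print(anchored_hit)    # None
print(multiline_hits)  # ['== History ==']
```

If that assumption holds, passing the pattern as
'(?m)^=[^\n\r]*=[ \t]*$' might be worth a try.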
On Thu, Dec 22, 2011 at 18:21, Chris Watkins
<chriswaterguy(a)appropedia.org> wrote:
I have been using "-exceptinsidetag:header" with replace.py. This was
added by Daniel Herding in response to a request by me:
On Mon, Jun 30, 2008 at 23:11, Daniel Herding <DHerding(a)gmx.de> wrote:
This will exclude wikilinks and URLs. There are some more things that can
be excluded, see the source code of the method replaceExcept() in
wikipedia.py (look at the exceptionRegexes dictionary). I have just added
a regular expression for section headers for you, so if you're running
the SVN version, you can use this parameter:
-exceptinsidetag:header
I seem to recall this working in a nightly version a couple of years ago,
but it's not working now - I'm not sure when it stopped. Is it possible to
put it back in?
Thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.