I just worked it out, mostly. Instead of:
-exceptinsidetag:header
I used:
-exceptinside:'=[^\n\r]*=[ \t]*'
and it worked!
That pattern matches any stretch between two "=" signs on one line,
not only section headers, so there is a small risk of false
positives. To tighten it I tried various tweaks, e.g.:
-exceptinside:'^=[^\n\r]*=[ \t]*$'
-exceptinside:'[\n\r]=[^\n\r]*=[ \t]*[\n\r]'
-exceptinside:'[\n\r]=[^\n\r]*='
but none of them worked. Any suggestions?
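For anyone who wants to poke at these patterns outside the bot, here is a minimal sketch using only Python's standard re module, not replace.py itself. The sample wikitext and the helper name spans() are made up for illustration. It shows the loose pattern catching a non-header line, and the anchored pattern matching only once re.MULTILINE is set. Whether replace.py compiles -exceptinside patterns with that flag is an assumption to check against the source.

```python
import re

# Made-up sample wikitext: one line with stray "=" signs, then a real header.
text = "Intro with a=b and c=d on one line.\n== Heading ==\nBody text.\n"

loose = r'=[^\n\r]*=[ \t]*'                       # the pattern that "worked"
anchored = r'^=[^\n\r]*=[ \t]*$'                  # first tweak
newline_wrapped = r'[\n\r]=[^\n\r]*=[ \t]*[\n\r]'  # second tweak


def spans(pattern, flags=0):
    """Return every substring of `text` matched by `pattern` (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


# The loose pattern also matches the "a=b and c=d" line: a false positive.
loose_hits = spans(loose)

# Without re.MULTILINE, "^" only matches at the very start of the string,
# so the anchored pattern finds nothing here.
anchored_plain = spans(anchored)

# With re.MULTILINE, "^" and "$" match at each line boundary.
anchored_multiline = spans(anchored, re.MULTILINE)

# The newline-wrapped variant needs no flag, but it consumes the newlines
# and cannot match a header on the very first line of the text.
wrapped_hits = spans(newline_wrapped)

print(loose_hits)
print(anchored_plain)
print(anchored_multiline)
print(wrapped_hits)
```

In plain re the newline-wrapped variant does match the header, so if it fails under replace.py the cause may lie elsewhere, for example in how the shell passes the quoted backslashes through.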
On Thu, Dec 22, 2011 at 18:21, Chris Watkins
<chriswaterguy(a)appropedia.org> wrote:
I have been using "-exceptinsidetag:header" with replace.py. This
was added by Daniel Herding in response to a request from me:
On Mon, Jun 30, 2008 at 23:11, Daniel Herding <DHerding(a)gmx.de>
wrote:
This will exclude wikilinks and URLs. There are some more things
that can be excluded, see the source code of the method
replaceExcept() in wikipedia.py (look at the exceptionRegexes
dictionary). I have just added a regular expression for section
headers for you, so if you're running the SVN version, you can use
this parameter:
-exceptinsidetag:header
I seem to recall this working in a nightly version a couple of
years ago, but it's not working now - I'm not sure when it stopped.
Is it possible to put it back in?
Thanks!
-- Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable
lives.
_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l