Thought I'd point out a couple of useful things I've come across when doing regex work (in Python, but also in other languages):

1: The re.VERBOSE flag.  It lets you spread a regular expression over a multiline string and add comments to it.  Whitespace in the pattern is ignored in this mode, so to match a literal space you have to escape it or use \s (which matches any whitespace character).  Makes it a lot easier to understand what you were thinking when you come back to your code two months later to change it.

2: Using functions instead of strings as the replacement in sub().  The function is called with the match object for each match and returns the replacement string.  If your replacement needs a fair amount of conditional logic, it is usually easier to write that logic in a function than to try to do it all within the regex.
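
Here's a rough sketch of both tips together, in case it helps.  The link-simplifying task, the names LINK_RE and simplify_link, and the sample text are all made up for illustration; this isn't taken from the pywikipedia code.

```python
import re

# Hypothetical pattern: matches piped wiki links such as [[Target|label]].
# With re.VERBOSE the layout whitespace and the "#" comments are ignored.
LINK_RE = re.compile(r"""
    \[\[                    # opening brackets
    (?P<target>[^\]|]+?)    # link target, up to the pipe
    \s*\|\s*                # the pipe; whitespace has to be spelled out as \s
    (?P<label>[^\]|]+?)     # link label
    \s*\]\]                 # closing brackets
    """, re.VERBOSE)


def simplify_link(match):
    """Replacement function: sub() calls it once per match."""
    target = match.group("target")
    label = match.group("label")
    if target == label:
        # [[Foo|Foo]] is redundant, so collapse it to [[Foo]].
        return "[[%s]]" % target
    # Otherwise leave the link exactly as it was.
    return match.group(0)


text = "See [[Python|Python]] and [[Regular expression|regex]]."
print(LINK_RE.sub(simplify_link, text))
```

The conditional part (only touch the link when target and label are identical) would be awkward to express in a replacement string, but it's a plain if statement in the function.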

My $.02.


Cheers,
Morten

On Tue, Jun 28, 2011 at 7:23 AM, Bináris <wikiposta@gmail.com> wrote:
OK, then I'll make separate lines. The only issue is that any enhancement/correction will be more complicated this way (that is another reason to put as many features in one line as possible).


2011/6/28 Marcin Cieslak <saper@saper.info>

Given the speed of fetching/storing pages, I don't think the speed of the
regular expression makes any difference. Running two compiled REs
one after the other on the page text should be very fast.



--
Bináris

_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l