2009/6/18 Hannes Röst hannesroest@gmx.ch:
the problem is here: (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
It seems to be the case that \b does not work with the German eszett, whereas < does work in my case. Should this be changed in all cases where \b is used? Do you have other suggestions?
Hello!
import re
t = u'Großdeutschen Reich sdfsfasff deutschen Reich'
re.findall(r'(\bdeutsche[rn]? Reich\b)', t)
[u'deutschen Reich', u'deutschen Reich']
re.findall(r'(?u)\bdeutsche[rn]? Reich\b', t)
[u'deutschen Reich']
re.findall(r'\bdeutsche[rn]? Reich\b', t, re.U)
[u'deutschen Reich']
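In other words, you just have to tell the regex engine to use Unicode character properties. Without that, ß does not count as a word character, so \b matches between the ß and the d of "Großdeutschen".
Either put (?u) in the regex (the very start is the safest place), or pass the re.U flag :)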
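Applied to the replacement pair from the original question, the fix might look roughly like the sketch below. The names text, pattern and fixed are only illustrative, and ß is written as \xdf so that the snippet does not depend on the source-file encoding. On Python 3, str patterns are Unicode-aware by default, so re.U is harmless but redundant there.

```python
import re

# \xdf is the eszett; same test string as in the session above.
text = u'Gro\xdfdeutschen Reich sdfsfasff deutschen Reich'

# The pattern/replacement pair from the question, compiled with re.U
# so that \b treats the eszett as a word character.
pattern = re.compile(r'\bdeutsche(r|n|) Reich\b', re.U)

# Only the standalone "deutschen Reich" should be capitalised;
# "Großdeutschen Reich" is left alone.
fixed = pattern.sub(r'Deutsche\1 Reich', text)
print(fixed)
```

If many patterns are affected, adding the flag where they are compiled is probably less work than editing (?u) into each one.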
Regards,