[Pywikipedia-l] problem with fixes.py

Nicolas Dumazet nicdumz at gmail.com
Sat Jun 27 23:41:59 UTC 2009


2009/6/18 Hannes Röst <hannesroest at gmx.ch>:
> the problem is here:
> (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
>
> It seems to be the case that \b does not work with the German eszett,
> whereas \< does work in my case. Should this be changed in all cases
> where \b is used? Do you have other suggestions?

Hello!

>>> import re
>>> t = u'Großdeutschen Reich sdfsfasff deutschen Reich'
>>> re.findall(r'(\bdeutsche[rn]? Reich\b)', t)
[u'deutschen Reich', u'deutschen Reich']
>>> re.findall(r'(?u)\bdeutsche[rn]? Reich\b', t)
[u'deutschen Reich']
>>> re.findall(r'\bdeutsche[rn]? Reich\b', t, re.U)
[u'deutschen Reich']

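In other words, you just have to tell the regex engine to use Unicode
character properties. Without that, ß is not treated as a word
character, so \b matches between "ß" and "d" inside "Großdeutschen".

Put (?u) at the start of the regex, or compile with the re.U flag :)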
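
For what it's worth, here is a rough sketch of the same effect on the
substitution itself, using the pattern and replacement from your
message. I am assuming Python 3 here, where str patterns are
Unicode-aware by default, so I use re.ASCII to reproduce the old
non-Unicode behaviour. The variable names are mine, and this is not
how fixes.py actually applies its replacements:

```python
import re

text = 'Großdeutschen Reich sdfsfasff deutschen Reich'

# Pattern and replacement taken from the quoted fix.
pattern = r'\bdeutsche(r|n|) Reich\b'
replacement = r'Deutsche\1 Reich'

# re.ASCII: only [a-zA-Z0-9_] count as word characters, so 'ß' does not,
# and \b matches inside "Großdeutschen" (the behaviour reported above).
ascii_result = re.sub(pattern, replacement, text, flags=re.ASCII)

# re.UNICODE: 'ß' is a word character, so only the standalone
# "deutschen Reich" is rewritten.
unicode_result = re.sub(pattern, replacement, text, flags=re.UNICODE)

print(ascii_result)
print(unicode_result)
```

With the Unicode flag only the second, standalone occurrence should be
capitalised; with re.ASCII the one inside "Großdeutschen" gets
rewritten as well.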

Regards,
-- 
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]


