Re: [Pywikipedia-l] problem with fixes.py

28 Jun 2009

2009/6/18 Hannes Röst <hannesroest(a)gmx.ch>:
> ...
> the problem is here:
> (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
>
> It seems to be the case that \b does not work with the German eszett,
> whereas \< does work in my case. Should this be changed in all cases
> where \b is used? Do you have other suggestions?
Hello!

...
>>> import re
>>> t = u'Großdeutschen Reich sdfsfasff deutschen Reich'
>>> re.findall(r'(\bdeutsche[rn]? Reich\b)', t)
[u'deutschen Reich', u'deutschen Reich']
...
>>> re.findall(r'(?u)\bdeutsche[rn]? Reich\b', t)
[u'deutschen Reich']
...
>>> re.findall(r'\bdeutsche[rn]? Reich\b', t, re.U)
[u'deutschen Reich']
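For readers on Python 3 (the thread above is Python 2): str patterns already use Unicode semantics by default, so re.U is implied there, and the old ASCII-only behaviour of \b can be reproduced with re.ASCII. A small sketch contrasting the two on the same test string:

```python
import re

t = 'Großdeutschen Reich sdfsfasff deutschen Reich'
pattern = r'\bdeutsche[rn]? Reich\b'

# ASCII semantics (what the Python 2 session above got without re.U):
# 'ß' is not a word character, so \b matches between 'ß' and 'd'
# inside 'Großdeutschen'.
ascii_hits = re.findall(pattern, t, re.ASCII)

# Unicode semantics (the Python 3 default for str patterns): 'ß' is a
# word character, so only the standalone 'deutschen Reich' matches.
unicode_hits = re.findall(pattern, t)

print(ascii_hits)    # ['deutschen Reich', 'deutschen Reich']
print(unicode_hits)  # ['deutschen Reich']
```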

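In other words, you just have to tell the regex engine to use Unicode
character properties, so that ß counts as a word character and \b no
longer matches inside "Großdeutschen"...

Either put (?u) in the regex (the very start is the safest place), or
pass the re.U flag when compiling :)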
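A minimal sketch of the original fixes.py rule with the inline flag added, applied with plain re.sub rather than pywikipedia's own replacement machinery. The list name `replacements` and the helper `apply_fixes` are hypothetical, for illustration only:

```python
import re

# Hypothetical fixes.py-style list of (pattern, replacement) pairs.
# (?u) sits at the very start of the pattern: recent Python 3 versions
# reject global inline flags placed anywhere else.
replacements = [
    (r'(?u)\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
]


def apply_fixes(text, rules):
    """Apply each (pattern, replacement) pair in order using re.sub."""
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text)
    return text


text = 'Großdeutschen Reich und deutschen Reich'
print(apply_fixes(text, replacements))
# Großdeutschen Reich und Deutschen Reich
```

With Unicode semantics the "deutschen Reich" inside "Großdeutschen Reich" is left alone, and only the standalone phrase is capitalized.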

Regards,
-- 
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]
