[Pywikipedia-l] problem with fixes.py
Francesco Cosoleto
cosoleto at gmail.com
Fri Jun 19 09:01:34 UTC 2009
Hannes Röst wrote:
> Hello
>
> I am writing for the first time and I don't quite know where the
> appropriate place is to write this. I am working on the German
Originally this mailing list was named "pywikipediabot-users"; nowadays
it looks more like a development mailing list.
> wikipedia and I ran into some problems using fixes.py, specifically I
> had this edit: http://de.wikipedia.org/w/index.php?title=Deutsches_Reich_1933_bis_1945&diff=prev&oldid=61255346
>
> the problem is here:
> (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
>
> It seems to be the case that \b does not work with the German eszett,
> whereas \< does work in my case. Should this be changed in all cases
> where \b is used? Do you have other suggestions?
I am surprised to see that. I guess it is because the German eszett may be
used in a different context. I am not sure it is worth a bug report to
Python; other software (like grep) doesn't work with this regexp either.
A possible workaround would be this:
ur'(?<!\xdf)\bdeutsche(r|n|) Reich\b'
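
To make the idea concrete, here is a minimal sketch of the lookbehind workaround. It is written for Python 3 and is only an illustration under stated assumptions: the sample sentence is made up (it is not the text of the edit linked above), and `re.ASCII` is used to imitate an engine whose `\b` only knows ASCII word characters. The sketch keeps the capturing group `(r|n|)` from the original fix so that `\1` in the replacement stays valid. The last pattern shows a possible alternative, Unicode-aware word boundaries, which is the default for `str` patterns in Python 3 and needs the `re.UNICODE` flag in Python 2.

```python
import re

# Hypothetical sample text, not taken from the edit linked in the thread.
SAMPLE = "Das Großdeutsche Reich und das deutsche Reich"
REPLACEMENT = r'Deutsche\1 Reich'

# re.ASCII limits \w (and therefore \b) to ASCII characters, so 'ß' is not
# a word character and \b matches between 'ß' and 'd'.
naive = re.compile(r'\bdeutsche(r|n|) Reich\b', re.ASCII)

# Workaround: the negative lookbehind rejects a match that is directly
# preceded by 'ß' (\xdf).
guarded = re.compile(r'(?<!\xdf)\bdeutsche(r|n|) Reich\b', re.ASCII)

# Alternative: Unicode-aware matching treats 'ß' as a word character,
# so there is no word boundary after it in the first place.
unicode_aware = re.compile(r'\bdeutsche(r|n|) Reich\b')

print(naive.sub(REPLACEMENT, SAMPLE))          # wrongly capitalizes inside "Großdeutsche"
print(guarded.sub(REPLACEMENT, SAMPLE))        # leaves "Großdeutsche" untouched
print(unicode_aware.sub(REPLACEMENT, SAMPLE))  # same result as the guarded pattern
```

The lookbehind only guards against the eszett; other non-ASCII letters (such as umlauts) in front of "deutsche" would need the same treatment, which is why the Unicode-aware variant may be the more general choice.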
--
Francesco Cosoleto
"So let no one turn back
toward the ships, once he has heard the call,
but go forward, urge one another on,
in case Olympian Zeus, who hurls the lightning, may grant us
to beat back the assault and drive the enemy to the city". (Homer)