Hi
I created a patch for this problem here: https://sourceforge.net/tracker/?func=detail&aid=2813298&group_id=93... I don't know the best way to post code changes, so I am copying the diff here as well. How should I proceed in future when I have a change to suggest?
I just changed \b to <, which works. The problem only arose with the patterns starting with "deutsch", so I replaced it only there:
--- ../pywikipedia/fixes.py 2009-06-27 19:00:52.000000000 +0200
+++ fixes.py 2009-06-27 19:46:27.000000000 +0200
@@ -293,10 +295,10 @@
         },
         'replacements': [
             (r'\batlantische(r|n|) Ozean', r'Atlantische\1 Ozean'),
-            (r'\bdeutsche(r|n|) Bundestag\b', r'Deutsche\1 Bundestag'),
-            (r'\bdeutschen Bundestags\b', r'Deutschen Bundestags'),  # Aufpassen, z. B. 'deutsche Bundestagswahl'
-            (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
-            (r'\bdeutschen Reichs\b', r'Deutschen Reichs'),  # Aufpassen, z. B. 'deutsche Reichsgrenzen'
+            (r'<deutsche(r|n|) Bundestag\b', r'Deutsche\1 Bundestag'),
+            (r'<deutschen Bundestags\b', r'Deutschen Bundestags'),  # Aufpassen, z. B. 'deutsche Bundestagswahl'
+            (r'<deutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),  #Aufpassen z. B. 'Großdeutsches Reich'
+            (r'<deutschen Reichs\b', r'Deutschen Reichs'),  # Aufpassen, z. B. 'deutsche Reichsgrenzen'
             (r'\bdritte(n|) Welt(?!krieg)', r'Dritte\1 Welt'),
             (r'\bdreißigjährige(r|n|) Krieg', r'Dreißigjährige\1 Krieg'),
             (r'\beuropäische(n|) Gemeinschaft', r'Europäische\1 Gemeinschaft'),
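One thing that may be worth checking before relying on this patch: assuming the patterns are compiled with Python's re module, a bare < has no special meaning there and simply matches a literal "<" character (the word-start anchor known from GNU tools is written \< and is, as far as I know, not supported by re). The sketch below is only an illustration under those assumptions. It uses Python 3 with re.ASCII to approximate matching in which ß is not treated as a word character, and the sample text is made up, not taken from the wiki page.

```python
import re

# Hypothetical sample text, not taken from the article in question.
sample = "Das Großdeutsche Reich und der deutsche Bundestag"

# re.ASCII approximates matching in which ß does not count as a word
# character, so \b sees a boundary between "ß" and "d".
with_b = re.compile(r'\bdeutsche(r|n|) Reich\b', re.ASCII)

# Same rule with "<" in place of \b: "<" is an ordinary literal here.
with_lt = re.compile(r'<deutsche(r|n|) Reich\b', re.ASCII)

# \b finds a match inside "Großdeutsche Reich" (the unwanted hit).
hits_b = [m.group(0) for m in with_b.finditer(sample)]

# The "<" variant finds nothing in ordinary prose ...
hits_lt = [m.group(0) for m in with_lt.finditer(sample)]

# ... and only matches when a literal "<" precedes the word.
hits_lt_literal = [m.group(0) for m in with_lt.finditer("<deutsche Reich")]

print(hits_b, hits_lt, hits_lt_literal)
```

If this holds for the bot as well, the unwanted edit disappears mainly because the rule no longer fires on normal text, so it would also stop correcting a plain "deutsche Reich".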
Greetings
Hannes
2009/6/19 Francesco Cosoleto cosoleto@gmail.com:
Hannes Röst wrote:
Hello
I am writing for the first time and I don't quite know where the appropriate place is to write this.
Originally this mailing list was named "pywikipediabot-users"; nowadays it looks more like a development mailing list.
I am working on the German Wikipedia and I ran into some problems using fixes.py. Specifically, I had this edit: http://de.wikipedia.org/w/index.php?title=Deutsches_Reich_1933_bis_1945&...
the problem is here: (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
It seems to be the case that \b does not work with the German eszett, whereas < does work in my case. Should this be changed in all cases where \b is used? Do you have other suggestions?
I am surprised to see that. I guess it is because the German eszett may be used in a different context. I am not sure it is worth a bug report to Python; other software (like grep) doesn't work with this regexp either.
A possible workaround would be this:
ur'(?<!\xdf)\bdeutsche[rn] Reich\b'
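The lookbehind workaround can be tried out roughly as follows. This is a minimal sketch under a few assumptions: it targets Python 3 (where the ur'' prefix is no longer valid and plain str patterns are Unicode-aware by default), it uses re.ASCII to stand in for the older behaviour in which ß is not a word character, and it keeps the (r|n|) group from fixes.py so that \1 in the replacement still works. The sample sentence is made up.

```python
import re

# Hypothetical sample text with one wanted and one unwanted match.
text = "das Großdeutsche Reich und das deutsche Reich"
replacement = r'Deutsche\1 Reich'

# Original rule; re.ASCII makes \b treat "ß" as a non-word character.
naive = re.compile(r'\bdeutsche(r|n|) Reich\b', re.ASCII)

# Workaround: refuse to match when the preceding character is ß (\xdf).
guarded = re.compile(r'(?<!\xdf)\bdeutsche(r|n|) Reich\b', re.ASCII)

# Alternative: Unicode-aware matching, the default for str in Python 3
# (in Python 2 this would need the re.UNICODE flag).
unicode_aware = re.compile(r'\bdeutsche(r|n|) Reich\b')

naive_result = naive.sub(replacement, text)
guarded_result = guarded.sub(replacement, text)
unicode_result = unicode_aware.sub(replacement, text)

print(naive_result)    # the compound "Großdeutsche" gets damaged
print(guarded_result)  # only the standalone "deutsche Reich" is changed
print(unicode_result)  # same outcome as the guarded pattern
```

The lookbehind only guards against ß; a Unicode-aware \b would also cover other non-ASCII letters, such as umlauts, directly before the word.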
-- Francesco Cosoleto
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l