Hi
I created a patch for this problem here:
https://sourceforge.net/tracker/?func=detail&aid=2813298&group_id=9…
I don't know how best to post code changes, so I will copy the patch
here as well. How should I proceed in the future if I have a change to
suggest?
I just changed \b to \<, which works. The problem only arose for words
starting with "deutsch", so that is the only place where I replaced it:
--- ../pywikipedia/fixes.py 2009-06-27 19:00:52.000000000 +0200
+++ fixes.py 2009-06-27 19:46:27.000000000 +0200
@@ -293,10 +295,10 @@
},
'replacements': [
(r'\batlantische(r|n|) Ozean', r'Atlantische\1 Ozean'),
- (r'\bdeutsche(r|n|) Bundestag\b', r'Deutsche\1 Bundestag'),
- (r'\bdeutschen Bundestags\b', r'Deutschen Bundestags'), # Aufpassen, z. B. 'deutsche Bundestagswahl'
- (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
- (r'\bdeutschen Reichs\b', r'Deutschen Reichs'), # Aufpassen, z. B. 'deutsche Reichsgrenzen'
+ (r'\<deutsche(r|n|) Bundestag\b', r'Deutsche\1 Bundestag'),
+ (r'\<deutschen Bundestags\b', r'Deutschen Bundestags'), # Aufpassen, z. B. 'deutsche Bundestagswahl'
+ (r'\<deutsche(r|n|) Reich\b', r'Deutsche\1 Reich'), #Aufpassen z. B. 'Großdeutsches Reich'
+ (r'\<deutschen Reichs\b', r'Deutschen Reichs'), # Aufpassen, z. B. 'deutsche Reichsgrenzen'
(r'\bdritte(n|) Welt(?!krieg)', r'Dritte\1 Welt'),
(r'\bdreißigjährige(r|n|) Krieg', r'Dreißigjährige\1 Krieg'),
(r'\beuropäische(n|) Gemeinschaft', r'Europäische\1 Gemeinschaft'),
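One thing that may be worth checking before the patch is applied: as far as I can tell, Python's re module does not treat \< as a start-of-word anchor the way grep does. It reads it as an escaped literal "<". If that is right, the patched patterns stop producing the wrong edit, but they also stop matching ordinary text. A minimal sketch with made-up sample strings (not taken from fixes.py or from the article):

```python
import re

# Pattern as written in the patch; in Python's re, '\<' is a literal '<'.
pattern = re.compile(r'\<deutsche(r|n|) Reich\b')

# Ordinary prose: nothing matches, because there is no '<' in front of the word.
plain_hit = pattern.search("das deutsche Reich")

# Only text that really contains '<' directly before the word matches.
bracket_hit = pattern.search("das <deutsche Reich")

print(plain_hit)
print(bracket_hit.group(0) if bracket_hit else None)
```

If the first search finds nothing on your setup as well, the change effectively disables these four replacements rather than fixing the word-boundary test.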
Greetings
Hannes
2009/6/19 Francesco Cosoleto <cosoleto(a)gmail.com>:
Hannes Röst wrote:
> Hello
> I am writing for the first time and I don't quite know where the
> appropriate place is to write this.

Originally this mailing list was named "pywikipediabot-users"; nowadays
it looks more like a devel mailing list.

> I am working on the German Wikipedia and I ran into some problems using
> fixes.py. Specifically, I had this edit:
> http://de.wikipedia.org/w/index.php?title=Deutsches_Reich_1933_bis_1945&…
> The problem is here:
> (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
> It seems to be the case that \b does not work with the German eszett,
> whereas \< does work in my case. Should this be changed in all cases
> where \b is used? Do you have other suggestions?
I am surprised to see that. I guess it is because the German eszett may be
used in a different context. I am not sure it is worth a bug report to
Python; other software (like grep) doesn't work with this regexp either.
A possible workaround would be this:
ur'(?<!\xdf)\bdeutsche[rn] Reich\b'
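To see what the lookbehind buys, here is a small sketch under a few assumptions. It is written for Python 3, where re.ASCII is used to imitate a \b that does not count "ß" as a word character. The sample sentence is invented. The optional ending is written as (r|n|), as in fixes.py, instead of [rn], so that \1 in the replacement still refers to a group.

```python
import re

text = "Das Großdeutsche Reich und das deutsche Reich"

# re.ASCII makes \b treat 'ß' as a non-word character, which puts a
# word boundary between 'ß' and 'd' (the behaviour reported above).
naive = re.compile(r'\bdeutsche(r|n|) Reich\b', re.ASCII)

# Same pattern, but refuse to match when the preceding character is 'ß' (U+00DF).
guarded = re.compile(r'(?<!\xdf)\bdeutsche(r|n|) Reich\b', re.ASCII)

naive_result = naive.sub(r'Deutsche\1 Reich', text)
guarded_result = guarded.sub(r'Deutsche\1 Reich', text)

print(naive_result)    # Das GroßDeutsche Reich und das Deutsche Reich
print(guarded_result)  # Das Großdeutsche Reich und das Deutsche Reich
```

The lookbehind only covers "ß". Other non-ASCII letters in front of "deutsche" would need the same treatment, so a Unicode-aware \b is probably the more general fix where it is available.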
--
Francesco Cosoleto
"So let no one turn back
toward the ships, now that he has heard the call,
but go forward, urge one another on,
in the hope that Olympian Zeus, who hurls the lightning, may grant us
to repel the assault and drive the enemy back to the city." (Homer)
_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l