Hello
I am writing for the first time and I don't quite know where the appropriate place to post this is. I am working on the German Wikipedia and I ran into a problem using fixes.py, specifically with this edit: http://de.wikipedia.org/w/index.php?title=Deutsches_Reich_1933_bis_1945&...
the problem is here: (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
It seems to be the case that \b does not work with the German eszett, whereas < does work in my case. Should this be changed in all cases where \b is used? Do you have other suggestions?
Greetings
Hannes
2009/6/17 russblau@svn.wikimedia.org:
Revision: 6968
Author: russblau
Date: 2009-06-17 11:22:20 +0000 (Wed, 17 Jun 2009)
Log Message:
Fix bugs affecting page deletion
Modified Paths:
branches/rewrite/pywikibot/site.py
Modified: branches/rewrite/pywikibot/site.py
--- branches/rewrite/pywikibot/site.py 2009-06-16 21:18:20 UTC (rev 6967)
+++ branches/rewrite/pywikibot/site.py 2009-06-17 11:22:20 UTC (rev 6968)
@@ -1115,7 +1115,8 @@
 """
 query = api.PropertyGenerator("info|revisions",
     titles=page.title(withSection=False),
-    intoken=tokentype)
+    intoken=tokentype,
+    site=self)
 for item in query:
     if item['title'] != page.title(withSection=False):
         raise Error(
@@ -2452,7 +2453,8 @@
     % e.__class__.__name__)
 if not self.logged_in(sysop=True):
     raise NoUsername("delete: Unable to login as sysop")
-token = self.token("delete")
+token = self.token(page, "delete")
+self.lock_page(page)
 req = api.Request(site=self, action="delete", token=token,
     title=page.title(withSection=False),
     reason=summary)
Pywikipedia-svn mailing list Pywikipedia-svn@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-svn
Hannes Röst wrote:
> Hello
> I am writing for the first time and I don't quite know where the appropriate place is to write this. I am working on the German
Originally this mailing list was named "pywikipediabot-users"; nowadays it looks more like a devel mailing list.
> wikipedia and I ran into some problems using fixes.py, specifically I had this edit: http://de.wikipedia.org/w/index.php?title=Deutsches_Reich_1933_bis_1945&...
> the problem is here: (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
> It seems to be the case that \b does not work with the German eszett, whereas < does work in my case. Should this be changed in all cases where \b is used? Do you have other suggestions?
I am surprised to see that. I guess that is because the German eszett may be used in a different context. I am not sure it is worth a bug report to Python; other software (like grep) doesn't handle this regexp either.
A possible workaround would be this:
ur'(?<!\xdf)\bdeutsche[rn] Reich\b'
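For readers who want to try the lookbehind workaround, here is a minimal sketch. It is written for Python 3, which is an assumption (the thread itself uses Python 2 syntax such as ur'...'). The re.ASCII flag is used only to emulate non-Unicode matching, where ß does not count as a word character. The sample text and variable names are made up for illustration, and the pattern uses the (r|n|) group from fixes.py so that the \1 backreference works.

```python
import re

# Hypothetical sample text, not taken from the article in question.
text = 'Großdeutschen Reich und deutschen Reich'
replacement = r'Deutsche\1 Reich'

# re.ASCII makes \b treat 'ß' as a non-word character, which
# reproduces the unwanted boundary inside 'Großdeutschen'.
plain = re.compile(r'\bdeutsche(r|n|) Reich\b', re.ASCII)

# Same pattern, guarded by a negative lookbehind for 'ß'.
guarded = re.compile(r'(?<!ß)\bdeutsche(r|n|) Reich\b', re.ASCII)

plain_result = plain.sub(replacement, text)
guarded_result = guarded.sub(replacement, text)

print(plain_result)    # GroßDeutschen Reich und Deutschen Reich
print(guarded_result)  # Großdeutschen Reich und Deutschen Reich
```

Note that the lookbehind only guards against ß; other non-ASCII letters directly before "deutsche" would need the same treatment.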
Hi
I created a patch for this problem here: https://sourceforge.net/tracker/?func=detail&aid=2813298&group_id=93... I don't know how best to post the changes to the code, so I will copy them here as well. How should I proceed in the future if I have a change to suggest?
I just changed \b to <, which works. The problem only arose for the patterns starting with "deutsch", so those are the ones I replaced:
--- ../pywikipedia/fixes.py 2009-06-27 19:00:52.000000000 +0200
+++ fixes.py 2009-06-27 19:46:27.000000000 +0200
@@ -293,10 +295,10 @@
 },
 'replacements': [
 (r'\batlantische(r|n|) Ozean', r'Atlantische\1 Ozean'),
- (r'\bdeutsche(r|n|) Bundestag\b', r'Deutsche\1 Bundestag'),
- (r'\bdeutschen Bundestags\b', r'Deutschen Bundestags'), # Aufpassen, z. B. 'deutsche Bundestagswahl'
- (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
- (r'\bdeutschen Reichs\b', r'Deutschen Reichs'), # Aufpassen, z. B. 'deutsche Reichsgrenzen'
+ (r'<deutsche(r|n|) Bundestag\b', r'Deutsche\1 Bundestag'),
+ (r'<deutschen Bundestags\b', r'Deutschen Bundestags'), # Aufpassen, z. B. 'deutsche Bundestagswahl'
+ (r'<deutsche(r|n|) Reich\b', r'Deutsche\1 Reich'), #Aufpassen z. B. 'Großdeutsches Reich'
+ (r'<deutschen Reichs\b', r'Deutschen Reichs'), # Aufpassen, z. B. 'deutsche Reichsgrenzen'
 (r'\bdritte(n|) Welt(?!krieg)', r'Dritte\1 Welt'),
 (r'\bdreißigjährige(r|n|) Krieg', r'Dreißigjährige\1 Krieg'),
 (r'\beuropäische(n|) Gemeinschaft', r'Europäische\1 Gemeinschaft'),
Greetings
Hannes
2009/6/19 Francesco Cosoleto cosoleto@gmail.com:
[...]
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Hannes Röst wrote:
> Hi
> I created a patch for this problem here:
> [...]
...I think it's a conspiracy. ^^
Stop with the jokes and start with reading my texts, please. LOL.
2009/6/19 Francesco Cosoleto cosoleto@gmail.com:
> [...]
> A possible workaround should be this:
> ur'(?<!\xdf)\bdeutsche[rn] Reich\b'
ur'(?<!\xdf)\bdeutsche(r|n|) Reich\b'
[...]
Please let me explain how this slipped past me and why I was talking about a bug... :-/ Unlike the findall(), search() and match() functions, sub() has no flags parameter, but I was passing the re.UNICODE and re.LOCALE flags to it anyway... Besides, replace.py is supposed to compile all the regexps with the re.UNICODE flag by default.
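One way to avoid this kind of mix-up is to attach the flag when compiling the pattern and then call sub() on the compiled object. The sketch below assumes Python 3, where re.sub() also accepts a flags keyword argument and its fourth positional argument is count; the sample text and variable names are hypothetical. In Python 3, re.UNICODE is already the default for str patterns, so passing it here is redundant and only mirrors the flag discussed in this thread.

```python
import re

# Hypothetical sample text for illustration.
text = 'Großdeutschen Reich und deutschen Reich'
regex = r'\bdeutsche(r|n|) Reich\b'
replacement = r'Deutsche\1 Reich'

# Option 1: attach the flag at compile time, then use the
# compiled pattern's own sub() method.
compiled = re.compile(regex, re.UNICODE)
via_compiled = compiled.sub(replacement, text)

# Option 2: pass the flag by keyword so it cannot be mistaken
# for the positional count argument.
via_keyword = re.sub(regex, replacement, text, flags=re.UNICODE)

print(via_compiled)  # Großdeutschen Reich und Deutschen Reich
print(via_keyword)   # Großdeutschen Reich und Deutschen Reich
```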
Mmm... This is probably my last email to the mailing list. I am going to hide myself in shame.
Hello
Thanks a lot, everybody, for your time. I also tried re.findall(r'\bdeutsche[rn]? Reich\b', t, re.LOCALE) and that did not work, but with re.U it works. So no changes are needed after all.
Greetings
Hannes
2009/6/28 Francesco Cosoleto cosoleto@gmail.com:
[...]
2009/6/18 Hannes Röst hannesroest@gmx.ch:
> the problem is here: (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
> It seems to be the case that \b does not work with the German eszett, whereas < does work in my case. Should this be changed in all cases where \b is used? Do you have other suggestions?
Hello!
>>> import re
>>> t = u'Großdeutschen Reich sdfsfasff deutschen Reich'
>>> re.findall(r'(\bdeutsche[rn]? Reich\b)', t)
[u'deutschen Reich', u'deutschen Reich']
>>> re.findall(r'(?u)\bdeutsche[rn]? Reich\b', t)
[u'deutschen Reich']
>>> re.findall(r'\bdeutsche[rn]? Reich\b', t, re.U)
[u'deutschen Reich']
In other words, you just have to tell the regex engine to use Unicode character properties when deciding what counts as a word character: put (?u) in the regex, or compile with the re.U flag :)
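The same comparison can be sketched for Python 3, which is an assumption since the session above is Python 2. There, str patterns are Unicode-aware by default, so re.ASCII is used to reproduce the non-Unicode behaviour. Recent Python 3 versions also require an inline flag such as (?u) to sit at the very start of the pattern.

```python
import re

text = 'Großdeutschen Reich sdfsfasff deutschen Reich'
pattern = r'\bdeutsche[rn]? Reich\b'

# re.ASCII restricts \w and \b to ASCII, so 'ß' is a non-word
# character and a word boundary appears inside 'Großdeutschen'.
ascii_matches = re.findall(pattern, text, re.ASCII)

# Default Python 3 behaviour for str patterns: 'ß' is a word
# character, so only the standalone occurrence matches.
unicode_matches = re.findall(pattern, text)

# Inline flag form; redundant in Python 3 but accepted when it
# is placed at the start of the pattern.
inline_matches = re.findall(r'(?u)' + pattern, text)

print(ascii_matches)    # ['deutschen Reich', 'deutschen Reich']
print(unicode_matches)  # ['deutschen Reich']
print(inline_matches)   # ['deutschen Reich']
```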
Regards,