Bugs item #2079760, was opened at 2008-08-27 23:30
Message generated for change (Comment added) made by jeremybaron
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2079760...

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Mikko Silvonen (silvonen)
Assigned to: Nobody/Anonymous (nobody)
Summary: Periods converted to percent signs in section links
Initial Comment:
Why did my interwiki.py edit http://en.wikipedia.org/w/index.php?title=1st_Belorussian_Front&diff=234... convert the link [[de:Zentralfront#1._Wei.C3.9Frussische_Front]] to [[de:Zentralfront#1% Weißrussische Front]]?
The correct decoded link would be: [[de:Zentralfront#1. Weißrussische Front]]
C:\svn\pywikipedia>python version.py
Pywikipedia [http] trunk/pywikipedia (r5854, Aug 27 2008, 21:32:58)
Python 2.5.1 (r251:54863, May 1 2007, 17:47:05) [MSC v.1310 32 bit (Intel)]
----------------------------------------------------------------------
Comment By: Jeremy Baron (jeremybaron)
Date: 2008-08-28 00:35

Message:
Logged In: YES
user_id=1669658
Originator: NO
I don't know MediaWiki's anchor encoding very well, but I think this fixes it. (Patch is inline below because I see no obvious way to attach it. I know there is a way; maybe I don't have sufficient privs.)
Rudimentary tests (before and after patch application):
In [7]: import wikipedia
Checked for running processes. 1 processes currently running, including the current process.

In [8]: sectionlinktests = ('de:Zentralfront#1._Wei.C3.9Frussische_Front','a#.41.29');sectionlinktester = lambda x: wikipedia.Page(wikipedia.getSite(),x).aslink()

In [9]: [(x,sectionlinktester(x)) for x in sectionlinktests]
Out[9]:
[('de:Zentralfront#1._Wei.C3.9Frussische_Front',
  u'[[de:Zentralfront#1% Wei\xdfrussische Front]]'),
 ('a#.41.29', u'[[A#A)]]')]

In [10]: reload(wikipedia)
Checked for running processes. 2 processes currently running, including the current process.
Out[10]: <module 'wikipedia' from '/Users/jeremy/sandbox/mediawiki/pywikipediabot/pywikipedia/wikipedia.py'>

In [11]: [(x,sectionlinktester(x)) for x in sectionlinktests]
Out[11]:
[('de:Zentralfront#1._Wei.C3.9Frussische_Front',
  u'[[de:Zentralfront#1. Wei\xdfrussische Front]]'),
 ('a#.41.29', u'[[A#A)]]')]
patch:
Index: pywikipedia/wikipedia.py
===================================================================
--- pywikipedia/wikipedia.py	(revision 5855)
+++ pywikipedia/wikipedia.py	(working copy)
@@ -228,6 +228,7 @@
 Rwatchlist = re.compile(r"<input tabindex='[\d]+' type='checkbox' "
                         r"name='wpWatchthis' checked='checked'")
 Rlink = re.compile(r'\[\[(?P<title>[^\]\|\[]*)(\|[^\]]*)?\]\]')
+resectiondecode = re.compile(r"\.(?=[0-9a-f]{2})",re.I)
 
 
 class Page(object):
@@ -526,7 +527,7 @@
         """
         section = self._section
         if section and decode:
-            section = section.replace('.', '%')
+            section = resectiondecode.sub('%',section)
             section = url2unicode(section, self._site)
             if not underscore:
                 section = section.replace('_', ' ')
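
For anyone who wants to try the idea outside the bot, here is a minimal standalone sketch of the before/after behaviour. It is not the pywikipedia code: the function names are made up for illustration, it targets Python 3, and `urllib.parse.unquote` stands in for `url2unicode` on the assumption that the anchor bytes are UTF-8.

```python
import re
from urllib.parse import unquote

# Same idea as the patch: a period is treated as an escape marker only
# when it is directly followed by two hex digits.
SECTION_ESCAPE = re.compile(r"\.(?=[0-9a-f]{2})", re.I)


def decode_section_naive(section):
    """Old behaviour (hypothetical helper): every period becomes '%'."""
    # unquote() leaves an invalid sequence such as '%_W' untouched,
    # which is how the stray percent sign survives into the link.
    return unquote(section.replace(".", "%")).replace("_", " ")


def decode_section(section):
    """Patched behaviour (hypothetical helper): only '.XX' becomes '%XX'."""
    return unquote(SECTION_ESCAPE.sub("%", section)).replace("_", " ")


if __name__ == "__main__":
    for anchor in ("1._Wei.C3.9Frussische_Front", ".41.29"):
        print(repr(anchor), "naive:", repr(decode_section_naive(anchor)),
              "patched:", repr(decode_section(anchor)))
```

The lookahead is still a heuristic: a literal period that happens to be followed by two hex digits (for example `2.10` in a heading) would still be read as an escape, since the encoded anchor alone does not distinguish the two cases.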
btw, sourceforge strips all kinds of things out of bugspam, not just german chars :-/
----------------------------------------------------------------------
Comment By: Mikko Silvonen (silvonen)
Date: 2008-08-27 23:38

Message:
Logged In: YES
user_id=127947
Originator: YES
Ouch, the SourceForge email system removes the German sharp s character from the messages. See this issue on the web for the correct links.
----------------------------------------------------------------------