http://www.mediawiki.org/wiki/Special:Code/pywikipedia/9793
Revision: 9793
Author:   xqt
Date:     2011-12-09 18:25:00 +0000 (Fri, 09 Dec 2011)

Log Message:
-----------
Some iw links are encoded with html entity. Decode &-entity first. See
http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion%3AXqt&acti...

Modified Paths:
--------------
    trunk/pywikipedia/wikipedia.py

Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py	2011-12-09 14:31:19 UTC (rev 9792)
+++ trunk/pywikipedia/wikipedia.py	2011-12-09 18:25:00 UTC (rev 9793)
@@ -4643,7 +4643,7 @@
     # This regular expression will match any decimal and hexadecimal entity and
     # also entities that might be named entities.
     entityR = re.compile(
-        r'&(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')
+        r'&(?:amp;)?(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')
     # These characters are Html-illegal, but sadly you *can* find some of
     # these and converting them to unichr(decimal) is unsuitable
     convertIllegalHtmlEntities = {
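
To illustrate the effect of the pattern change, here is a minimal standalone sketch that compares the two expressions from the hunk above on a doubly encoded entity. It is not part of the commit: the names old_entityR, new_entityR and sample are hypothetical, the sample string is made up for illustration, and the snippet assumes Python 3 (so chr() stands in for the unichr() mentioned in wikipedia.py).

```python
import re

# Pattern before rev 9793 (copied from the "-" line of the diff).
old_entityR = re.compile(
    r'&(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')

# Pattern after rev 9793 (copied from the "+" line): an optional "amp;"
# right after the ampersand is skipped before the entity body is read.
new_entityR = re.compile(
    r'&(?:amp;)?(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')

# Hypothetical input: "&#252;" whose ampersand was itself encoded as "&amp;".
sample = 'Z&amp;#252;rich'

old_match = old_entityR.search(sample)
new_match = new_entityR.search(sample)

# The old pattern stops at the named entity "amp" and leaves "#252;" behind.
print(old_match.group(0), old_match.group('name'))

# The new pattern consumes the whole "&amp;#252;" and exposes the code point.
print(new_match.group(0), new_match.group('decimal'))

# Decoding the captured decimal value gives the intended character.
decoded = chr(int(new_match.group('decimal')))
print(decoded)
```

A plain '&amp;' with nothing entity-like after it still matches the new pattern as the named entity "amp", because the optional group backtracks to empty when no entity body follows.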
pywikipedia-svn@lists.wikimedia.org