jenkins-bot has submitted this change and it was merged.
Change subject: Start with a fresh list in html2unicode every time
......................................................................
Start with a fresh list in html2unicode every time
    def x(..., something=[]):
        something.extend([1,2,3])
means something becomes [1,2,3] on the first call, but [1,2,3,1,2,3] on the *second* call, because the default list is created once and shared between calls. This meant the ignore list in html2unicode grew longer every time it was called.
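The behaviour described above can be reproduced in isolation. This is a minimal sketch rather than code from wikipedia.py: the function name x is borrowed from the commit message, and the `...` placeholder for the other parameters is dropped so the snippet runs.

```python
def x(something=[]):
    # The default list is built once, when the def statement is executed,
    # and that same object is reused by every call that omits the argument.
    something.extend([1, 2, 3])
    return something


# Copy each result so that later calls cannot change what was recorded.
first = list(x())
second = list(x())

print(first)   # [1, 2, 3]
print(second)  # [1, 2, 3, 1, 2, 3]
```

The second call sees the three items left behind by the first one, which is the growth the commit message describes.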
This commit changes it to the standard
    def x(..., something=None):
        if something is None:
            something = []
which means the default is always a fresh empty list, instead of what's left from the last call.
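For comparison, here is the same toy function written with the None sentinel. Again a sketch under the same assumptions (the name x comes from the commit message, the other parameters are omitted), not the actual html2unicode body.

```python
def x(something=None):
    # None marks "argument not supplied"; a new list is created on each
    # such call, so nothing carries over from earlier calls.
    if something is None:
        something = []
    something.extend([1, 2, 3])
    return something


first = x()
second = x()

print(first)            # [1, 2, 3]
print(second)           # [1, 2, 3]
print(first is second)  # False
```

A list passed in explicitly is still extended in place, so callers that want to keep their own list unchanged would need to pass a copy.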
Change-Id: Ie490b575a8a0cc4b5d45bbb97c0606e0fd64d4f9
---
M wikipedia.py
1 file changed, 5 insertions(+), 2 deletions(-)
Approvals:
  Ladsgroup: Looks good to me, approved
  Malafaya: Checked; Looks good to me, but someone else must approve
  jenkins-bot: Verified
diff --git a/wikipedia.py b/wikipedia.py
index 976a310..f304932 100644
--- a/wikipedia.py
+++ b/wikipedia.py
@@ -5657,13 +5657,16 @@
 # Utility functions for parsing page titles
 
-def html2unicode(text, ignore = []):
+def html2unicode(text, ignore = None):
     """Return text, replacing HTML entities by equivalent unicode characters."""
+
+    if ignore is None:
+        ignore = []
     # This regular expression will match any decimal and hexadecimal entity and
     # also entities that might be named entities.
     entityR = re.compile(
         r'&(?:amp;)?(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')
-
+
     ignore.extend((38,     # Ampersand (&)
                    39,     # Bugzilla 24093
                    60,     # Less than (<)