jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/323184 )
Change subject: Remove non-breaking spaces when tidying up a link
......................................................................
Remove non-breaking spaces when tidying up a link
The relevant code comes from:
https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/t…
Bug: T130818
Change-Id: I45d843824eae4fa68ab4001b68dd7bf05c2e6439
---
M pywikibot/page.py
M tests/link_tests.py
2 files changed, 6 insertions(+), 4 deletions(-)
Approvals:
Merlijn van Deen: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/page.py b/pywikibot/page.py
index 78c2f79..49f282e 100644
--- a/pywikibot/page.py
+++ b/pywikibot/page.py
@@ -5292,10 +5292,10 @@
raise pywikibot.Error(
"Title contains illegal char (\\uFFFD 'REPLACEMENT CHARACTER')")
- # Replace underscores by spaces
- t = t.replace(u"_", u" ")
- # replace multiple spaces with a single space
- t = re.sub(' {2,}', ' ', t)
+ # Cleanup whitespace
+ t = re.sub(
+ '[_ \xa0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+',
+ ' ', t)
# Strip spaces at both ends
t = t.strip()
# Remove left-to-right and right-to-left markers.
diff --git a/tests/link_tests.py b/tests/link_tests.py
index 9df58b0..8bacdc1 100644
--- a/tests/link_tests.py
+++ b/tests/link_tests.py
@@ -90,6 +90,8 @@
self.assertEqual(Link('A é B', self.get_site()).title, u'A é B')
self.assertEqual(Link('A é B', self.get_site()).title, u'A é B')
self.assertEqual(Link('A é B', self.get_site()).title, u'A é B')
+ self.assertEqual(Link('A B', self.get_site()).title, 'A B')
+ self.assertEqual(Link('A   B', self.get_site()).title, 'A B')
l = Link('A | B', self.get_site())
self.assertEqual(l.title, 'A')
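
For readers who want to try the new whitespace handling outside pywikibot, here is a minimal standalone sketch. It assumes Python 3 (where plain string literals are Unicode) and copies the character class from the patch above; the helper name tidy_title_whitespace is hypothetical and is not part of pywikibot. It also folds in the strip() call that follows the substitution in page.py.

```python
import re

# Character class copied from the patch: underscore, ASCII space, and the
# Unicode space/separator characters listed in the new re.sub() call.
_TITLE_WHITESPACE = re.compile(
    '[_ \xa0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+')


def tidy_title_whitespace(title):
    """Hypothetical helper: collapse each run of matched characters into
    one ASCII space, then strip whitespace at both ends."""
    return _TITLE_WHITESPACE.sub(' ', title).strip()


if __name__ == '__main__':
    # repr() makes any remaining non-breaking space visible as '\xa0'.
    print(repr(tidy_title_whitespace('A\xa0B')))
    print(repr(tidy_title_whitespace('A_ \xa0 _B')))
```

Because the class ends in +, a mixed run such as underscore, space, non-breaking space is replaced by a single space in one pass, which is why the separate "replace underscores" and "collapse multiple spaces" steps could be dropped.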
--
To view, visit https://gerrit.wikimedia.org/r/323184
Gerrit-MessageType: merged
Gerrit-Change-Id: I45d843824eae4fa68ab4001b68dd7bf05c2e6439
Gerrit-PatchSet: 7
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Matěj Suchánek <matejsuchanek97(a)gmail.com>
Gerrit-Reviewer: Dalba <dalba.wiki(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Magul <tomasz.magulski(a)gmail.com>
Gerrit-Reviewer: Matěj Suchánek <matejsuchanek97(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: XXN <dan15i(a)yahoo.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot <>