jenkins-bot submitted this change.

View Change

Approvals: Rubin: Looks good to me, but someone else must approve Xqt: Looks good to me, approved jenkins-bot: Verified
[bugfix] Change regex to detect meta information

- enable meta with charset but without content-type
- enable quotes with charset information

Bug: T298006
Change-Id: Ie1c56848d5485ee91a32c6c0bc75264c018ba05b
---
M scripts/reflinks.py
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/reflinks.py b/scripts/reflinks.py
index c37296e..b3e425e 100755
--- a/scripts/reflinks.py
+++ b/scripts/reflinks.py
@@ -481,9 +481,11 @@
.format(self.stop_page.title(as_link=True)))

# Regex to grasp content-type meta HTML tag in HTML source
- self.META_CONTENT = re.compile(br'(?i)<meta[^>]*content\-type[^>]*>')
+ self.META_CONTENT = re.compile(
+ br'(?i)<meta[^>]*(?:content\-type|charset)[^>]*>')
# Extract the encoding from a charset property (from content-type !)
- self.CHARSET = re.compile(r'(?i)charset\s*=\s*(?P<enc>[^\'",;>/]*)')
+ self.CHARSET = re.compile(
+ r'(?i)charset\s*=\s*(?P<enc>(?P<q>[\'"]?)[^\'",;>/]*(?P=q))')
# Extract html title from page
self.TITLE = re.compile(r'(?is)(?<=<title>).*?(?=</title>)')
# Matches content inside <script>/<style>/HTML comments

To view, visit change 748371. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: Ie1c56848d5485ee91a32c6c0bc75264c018ba05b
Gerrit-Change-Number: 748371
Gerrit-PatchSet: 1
Gerrit-Owner: Xqt <info@gno.de>
Gerrit-Reviewer: D3r1ck01 <xsavitar.wiki@aol.com>
Gerrit-Reviewer: Rubin <rubin.happy@gmail.com>
Gerrit-Reviewer: Xqt <info@gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged