jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/748371 )
Change subject: [bugfix] Change regex to detect meta information
......................................................................
[bugfix] Change regex to detect meta information
- enable meta with charset but without content-type
- enable quotes with charset information
Bug: T298006
Change-Id: Ie1c56848d5485ee91a32c6c0bc75264c018ba05b
---
M scripts/reflinks.py
1 file changed, 4 insertions(+), 2 deletions(-)
Approvals:
Rubin: Looks good to me, but someone else must approve
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/scripts/reflinks.py b/scripts/reflinks.py
index c37296e..b3e425e 100755
--- a/scripts/reflinks.py
+++ b/scripts/reflinks.py
@@ -481,9 +481,11 @@
.format(self.stop_page.title(as_link=True)))
# Regex to grasp content-type meta HTML tag in HTML source
- self.META_CONTENT = re.compile(br'(?i)<meta[^>]*content\-type[^>]*>')
+ self.META_CONTENT = re.compile(
+ br'(?i)<meta[^>]*(?:content\-type|charset)[^>]*>')
# Extract the encoding from a charset property (from content-type !)
- self.CHARSET = re.compile(r'(?i)charset\s*=\s*(?P<enc>[^\'",;>/]*)')
+ self.CHARSET = re.compile(
+ r'(?i)charset\s*=\s*(?P<enc>(?P<q>[\'"]?)[^\'",;>/]*(?P=q))')
# Extract html title from page
self.TITLE = re.compile(r'(?is)(?<=<title>).*?(?=</title>)')
# Matches content inside <script>/<style>/HTML comments
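
For reference, a minimal sketch of what the two regex changes do, run against two made-up meta tags. The patterns are copied from the diff above; the sample tags, the extract() helper and the final quote stripping are assumptions made for illustration and are not taken from reflinks.py.

```python
import re

# Patterns copied from the diff: OLD_* are the removed lines, NEW_* the added ones.
OLD_META_CONTENT = re.compile(br'(?i)<meta[^>]*content\-type[^>]*>')
NEW_META_CONTENT = re.compile(
    br'(?i)<meta[^>]*(?:content\-type|charset)[^>]*>')
OLD_CHARSET = re.compile(r'(?i)charset\s*=\s*(?P<enc>[^\'",;>/]*)')
NEW_CHARSET = re.compile(
    r'(?i)charset\s*=\s*(?P<enc>(?P<q>[\'"]?)[^\'",;>/]*(?P=q))')

# Hypothetical sample inputs (not from the change itself).
html5_tag = b'<meta charset="utf-8">'
legacy_tag = (b'<meta http-equiv="Content-Type" '
              b'content="text/html; charset=iso-8859-1">')


def extract(meta_re, charset_re, page):
    """Hypothetical helper: find the meta tag, then the raw 'enc' group."""
    meta = meta_re.search(page)
    if meta is None:
        return None
    tag = meta.group().decode('ascii', 'replace')
    charset = charset_re.search(tag)
    return charset.group('enc') if charset else None


# Old META_CONTENT needs "content-type", so the short form is not found.
print(extract(OLD_META_CONTENT, OLD_CHARSET, html5_tag))

# Old CHARSET stops at the opening quote and captures an empty string.
print(repr(OLD_CHARSET.search('<meta charset="utf-8">').group('enc')))

# New patterns find the tag; the 'enc' group still includes the quotes,
# so a caller has to strip them (assumed step, shown here explicitly).
raw = extract(NEW_META_CONTENT, NEW_CHARSET, html5_tag)
print(raw, raw.strip('\'" ').lower())

# The unquoted legacy form is matched by both old and new patterns.
print(extract(NEW_META_CONTENT, NEW_CHARSET, legacy_tag))
```

Note that the new 'enc' group captures the surrounding quotes together with the encoding name, so any consumer of that group is expected to strip them before using the value.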
--
To view, visit
https://gerrit.wikimedia.org/r/c/pywikibot/core/+/748371
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: Ie1c56848d5485ee91a32c6c0bc75264c018ba05b
Gerrit-Change-Number: 748371
Gerrit-PatchSet: 1
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: D3r1ck01 <xsavitar.wiki(a)aol.com>
Gerrit-Reviewer: Rubin <rubin.happy(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged