jenkins-bot has submitted this change and it was merged.
Change subject: archivebot.py: make timestripper more robust
......................................................................
archivebot.py: make timestripper more robust
Make TimeStripper more robust by first removing from the text the parts
that are not supposed to contain timestamps (disabled parts and external
links). This should reduce false-positive matches.
Addresses point 1) of the archivebot problems observed on cswiki in bug 72047.
Bug: 72047
Change-Id: I715e074dedb7677bce18d20e2004c9614546e6b9
---
M pywikibot/textlib.py
1 file changed, 7 insertions(+), 0 deletions(-)
Approvals:
John Vandenberg: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 4c74cef..4306799 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -1252,6 +1252,8 @@
             self.pdayR,
         ]
 
+        self.linkP = compileLinkR()
+
     def findmarker(self, text, base=u'@@', delta='@'):
         """Find a string which is not part of text."""
         while base in text:
@@ -1303,6 +1305,11 @@
         """
         # match date fields
         dateDict = dict()
+        # Remove parts that are not supposed to contain the timestamp, in order
+        # to reduce false positives.
+        line = removeDisabledParts(line)
+        line = self.linkP.sub('', line)  # remove external links
+
         line = self.fix_digits(line)
         for pat in self.patterns:
             line, matchDict = self.last_match_and_replace(line, pat)
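
The pre-cleaning step added in the second hunk can be illustrated with a minimal, standalone sketch. It does not use pywikibot: DISABLED_PARTS, EXTERNAL_LINK and TIMESTAMP are simplified, hypothetical stand-ins for removeDisabledParts(), compileLinkR() and the TimeStripper patterns, and the real expressions are considerably more thorough.

```python
import re

# Simplified stand-ins (assumptions for illustration, not pywikibot's patterns).
# Roughly what removeDisabledParts() targets: comments, nowiki and pre blocks.
DISABLED_PARTS = re.compile(
    r'<!--.*?-->|<nowiki>.*?</nowiki>|<pre>.*?</pre>',
    re.IGNORECASE | re.DOTALL,
)
# Rough approximation of an external-link matcher such as compileLinkR().
EXTERNAL_LINK = re.compile(r'https?://[^\s\]<>"]+', re.IGNORECASE)
# Toy signature timestamp, e.g. "12:34, 5 October 2014 (UTC)".
TIMESTAMP = re.compile(r'\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\)')


def strip_non_timestamp_parts(line):
    """Drop disabled parts and external links before timestamp matching."""
    line = DISABLED_PARTS.sub('', line)
    return EXTERNAL_LINK.sub('', line)


def find_timestamps(line):
    """Return toy timestamp matches found after the pre-cleaning step."""
    return TIMESTAMP.findall(strip_non_timestamp_parts(line))
```

With this sketch, a timestamp hidden inside an HTML comment is no longer reported, while the visible signature timestamp on the same line still is.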
--
To view, visit
https://gerrit.wikimedia.org/r/167406
Gerrit-MessageType: merged
Gerrit-Change-Id: I715e074dedb7677bce18d20e2004c9614546e6b9
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: jenkins-bot <>