jenkins-bot has submitted this change and it was merged.
Change subject: archivebot.py: make timestripper more robust
archivebot.py: make timestripper more robust
Make the timestripper more robust by stripping, before matching, those parts of the text that are not supposed to contain timestamps (disabled parts and external links). This should reduce false positive matches.
Addresses point 1) of the archivebot problems observed on cswiki in bug 72047.
Bug: 72047
Change-Id: I715e074dedb7677bce18d20e2004c9614546e6b9
---
M pywikibot/textlib.py
1 file changed, 7 insertions(+), 0 deletions(-)
Approvals:
  John Vandenberg: Looks good to me, approved
  jenkins-bot: Verified
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 4c74cef..4306799 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -1252,6 +1252,8 @@
             self.pdayR,
         ]
 
+        self.linkP = compileLinkR()
+
     def findmarker(self, text, base=u'@@', delta='@'):
         """Find a string which is not part of text."""
         while base in text:
@@ -1303,6 +1305,11 @@
         """
         # match date fields
         dateDict = dict()
+        # Remove parts that are not supposed to contain the timestamp, in order
+        # to reduce false positives.
+        line = removeDisabledParts(line)
+        line = self.linkP.sub('', line)  # remove external links
+
         line = self.fix_digits(line)
         for pat in self.patterns:
             line, matchDict = self.last_match_and_replace(line, pat)
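The effect of the two cleanup steps in this change can be illustrated with a minimal standalone sketch. It does not use pywikibot: `DISABLED_PARTS`, `EXTERNAL_LINK`, `YEAR`, `strip_noise` and `last_year` are hypothetical, simplified stand-ins (assumptions for illustration only) for `removeDisabledParts`, the pattern returned by `compileLinkR()`, and one of the timestripper's date-field patterns. The real patterns in textlib.py are more thorough.

```python
import re

# Hypothetical, simplified stand-in for removeDisabledParts():
# drops HTML comments and <nowiki>/<pre> blocks.
DISABLED_PARTS = re.compile(
    r'<!--.*?-->|<nowiki>.*?</nowiki>|<pre>.*?</pre>',
    re.DOTALL | re.IGNORECASE,
)

# Hypothetical, simplified stand-in for the compileLinkR() pattern:
# matches a bare http(s) URL up to whitespace or a closing bracket.
EXTERNAL_LINK = re.compile(r'https?://[^\s\]<>"]+', re.IGNORECASE)

# Simplified year field: any standalone 19xx or 20xx number.
YEAR = re.compile(r'\b(?:19|20)\d{2}\b')


def strip_noise(line):
    """Remove parts that are not supposed to contain a timestamp."""
    line = DISABLED_PARTS.sub('', line)
    return EXTERNAL_LINK.sub('', line)


def last_year(line, clean=True):
    """Return the last year-like match in line, or None if there is none."""
    if clean:
        line = strip_noise(line)
    matches = YEAR.findall(line)
    return matches[-1] if matches else None


comment_line = ('Done. 12:00, 5 October 2014 (UTC) '
                '<!-- copied from http://example.org/2011/talk in 2009 -->')
link_line = ('Done. 12:00, 5 October 2014 (UTC) '
             'see [http://example.org/2011/talk old thread]')

for text in (comment_line, link_line):
    print(last_year(text, clean=False), last_year(text))
```

Without the cleanup, the last year-like match comes from the comment or the URL rather than from the signature; with it, only the signature's year remains as a candidate.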
pywikibot-commits@lists.wikimedia.org