jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/816703 )
Change subject: [BUGFIX] Add localized "archive" variables to archivebot.py
......................................................................
[BUGFIX] Add localized "archive" variables to archivebot.py
- Non-latin digit support, introduced with
  https://gerrit.wikimedia.org/r/c/pywikibot/core/+/163213, never worked
  because variable replacements like %(counter)d expect an int instead
  of a str. This did not fail as long as textlib.to_local_digits
  returned the value unchanged when no local digits are given for a
  language, but it could fail for languages that have them. With 7.5
  textlib.to_local_digits always returns a str and the archivebot
  failed. This was fixed recently with 7.5.1.
- Users should be able to decide whether to use latin or non-latin
  digits. Therefore a lot of new fields were introduced, like
  'localcounter', which use the localized number instead of the latin
  one. This does not break existing usage of the %d replacement, except
  in rare cases where the user had already replaced it by %s.
- Restore old values for non-local fields
- Remove the 7.5.1 changes
- Make a sanity check in the analyze_page() method for the case that the
  local fields are used with %d and show a warning in this case.
- Update some related documentation
Bug: T71551
Bug: T313682
Bug: T313692
Change-Id: I05c165109aa49cfea40339f7fbdaff0150a62928
---
M pywikibot/page/_pages.py
M pywikibot/userinterfaces/transliteration.py
M scripts/archivebot.py
3 files changed, 71 insertions(+), 46 deletions(-)
Approvals:
  Matěj Suchánek: Looks good to me, but someone else must approve
  Xqt: Looks good to me, approved
  jenkins-bot: Verified
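For illustration, a minimal sketch of the formatting behaviour the commit message describes. It does not use pywikibot: `to_local_digits_sketch` is a hypothetical stand-in for `textlib.to_local_digits` that only knows the Bengali digits from `NON_LATIN_DIGITS`, and the 'Archive' patterns are made up for the example.

```python
# Hypothetical stand-in for pywikibot.textlib.to_local_digits; it only
# handles the Bengali ('bn') digits and always returns a str.
BENGALI_DIGITS = '০১২৩৪৫৬৭৮৯'


def to_local_digits_sketch(number, digits=BENGALI_DIGITS):
    """Return number as a str with latin digits replaced by local ones."""
    return str(number).translate(str.maketrans('0123456789', digits))


# Latin field stays an int, the 'local' field is a str.
params = {'counter': 12}
params['localcounter'] = to_local_digits_sketch(params['counter'])

latin = 'Archive %(counter)d' % params        # int with %d works
local = 'Archive %(localcounter)s' % params   # str with %s works

# A str combined with %d is the failure case the commit message refers to.
try:
    'Archive %(localcounter)d' % params
    failure = None
except TypeError as error:
    failure = str(error)
```

The exact wording of the TypeError message depends on the Python version, so the sketch only records it instead of comparing against a fixed string.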
diff --git a/pywikibot/page/_pages.py b/pywikibot/page/_pages.py
index 0d4f2be..d130a89 100644
--- a/pywikibot/page/_pages.py
+++ b/pywikibot/page/_pages.py
@@ -2145,9 +2145,10 @@
     @property
     @cached
     def raw_extracted_templates(self):
-        """
-        Extract templates using :py:obj:`textlib.extract_templates_and_params`.
+        """Extract templates and parameters.
+        This method is using
+        :func:`pywikibot.textlib.extract_templates_and_params`.
         Disabled parts and whitespace are stripped, except for whitespace
         in anonymous positional arguments.
@@ -2156,13 +2157,11 @@
         return textlib.extract_templates_and_params(self.text, True, True)
     def templatesWithParams(self):
-        """
-        Return templates used on this Page.
+        """Return templates used on this Page.
-        The templates are extracted by
-        :py:obj:`textlib.extract_templates_and_params`, with positional
-        arguments placed first in order, and each named argument
-        appearing as 'name=value'.
+        The templates are extracted by :meth:`raw_extracted_templates`,
+        with positional arguments placed first in order, and each named
+        argument appearing as 'name=value'.
         All parameter keys and values for each template are stripped of
         whitespace.
diff --git a/pywikibot/userinterfaces/transliteration.py b/pywikibot/userinterfaces/transliteration.py
index 6377b64..c0e65ca 100644
--- a/pywikibot/userinterfaces/transliteration.py
+++ b/pywikibot/userinterfaces/transliteration.py
@@ -4,6 +4,7 @@
 #
 # Distributed under the terms of the MIT license.
 #
+#: Non latin digits used by the framework
 NON_LATIN_DIGITS = {
     'bn': '০১২৩৪৫৬৭৮৯',
     'ckb': '٠١٢٣٤٥٦٧٨٩',
@@ -19,6 +20,7 @@
     'te': '౦౧౨౩౪౫౬౭౮౯',
 }
+
 _trans = {
     'À': 'A', 'Á': 'A', 'Â': 'A', 'Ầ': 'A', 'Ấ': 'A', 'Ẫ': 'A', 'Ẩ': 'A',
     'Ậ': 'A', 'Ã': 'A', 'Ā': 'A', 'Ă': 'A', 'Ằ': 'A', 'Ắ': 'A', 'Ẵ': 'A',
diff --git a/scripts/archivebot.py b/scripts/archivebot.py
index ecc33d6..e92448c 100755
--- a/scripts/archivebot.py
+++ b/scripts/archivebot.py
@@ -52,18 +52,33 @@
 key                  A secret key that (if valid) allows archives not to be
                      subpages of the page being archived.
-Variables below can be used in the value for "archive" in the template above:
+Variables below can be used in the value for "archive" in the template
+above; numbers are latin digits:
-%(counter)s          the current value of the counter
-%(year)s             year of the thread being archived
-%(isoyear)s          ISO year of the thread being archived
-%(isoweek)s          ISO week number of the thread being archived
-%(semester)s         semester term of the year of the thread being archived
-%(quarter)s          quarter of the year of the thread being archived
-%(month)s            month (as a number 1-12) of the thread being archived
+%(counter)d          the current value of the counter
+%(year)d             year of the thread being archived
+%(isoyear)d          ISO year of the thread being archived
+%(isoweek)d          ISO week number of the thread being archived
+%(semester)d         semester term of the year of the thread being archived
+%(quarter)d          quarter of the year of the thread being archived
+%(month)d            month (as a number 1-12) of the thread being archived
 %(monthname)s        localized name of the month above
 %(monthnameshort)s   first three letters of the name above
-%(week)s             week number of the thread being archived
+%(week)d             week number of the thread being archived
+
+Alternatively you may use localized digits. This is only available for a
+few site languages. Refer :attr:`NON_LATIN_DIGITS
+<pywikibot.userinterfaces.transliteration.NON_LATIN_DIGITS>` whether
+there is a localized one:
+
+%(localcounter)s     the current value of the counter
+%(localyear)s        year of the thread being archived
+%(localisoyear)s     ISO year of the thread being archived
+%(localisoweek)s     ISO week number of the thread being archived
+%(localsemester)s    semester term of the year of the thread being archived
+%(localquarter)s     quarter of the year of the thread being archived
+%(localmonth)s       month (as a number 1-12) of the thread being archived
+%(localweek)s        week number of the thread being archived
 The ISO calendar starts with the Monday of the week which has at least
 four days in the new Gregorian calendar. If January 1st is between Monday and
@@ -87,9 +102,8 @@
  -page:PAGE          archive a single PAGE, default ns is a user talk page
  -salt:SALT          specify salt
-.. versionchanged:: 7.5.1
-   string presentation type should be used for "archive" variable in the
-   template to support non latin values
+.. versionchanged:: 7.6
+   Localized variables for "archive" template parameter are supported
 """
 #
 # (C) Pywikibot team, 2006-2022
@@ -104,6 +118,7 @@
 from collections import OrderedDict, defaultdict
 from hashlib import md5
 from math import ceil
+from textwrap import fill
 from typing import Any, Optional, Pattern
 from warnings import warn
@@ -484,16 +499,10 @@
         return self.get_attr('key') == hexdigest
     def load_config(self) -> None:
-        """Load and validate archiver template.
-
-        .. versionchanged:: 7.5.1
-           replace archive pattern fields to string conversion
-        """
+        """Load and validate archiver template."""
         pywikibot.info('Looking for: {{{{{}}}}} in {}'
                        .format(self.tpl.title(), self.page))
-        fields = self.get_params(self.now, 0).keys()  # dummy parameters
-        pattern = re.compile(r'%(\((?:{})\))d'.format('|'.join(fields)))
         for tpl, params in self.page.raw_extracted_templates:
             try:  # Check tpl name before comparing; it might be invalid.
                 tpl_page = pywikibot.Page(self.site, tpl, ns=10)
@@ -503,11 +512,7 @@
             if tpl_page == self.tpl:
                 for item, value in params.items():
-                    # convert archive pattern fields to string
-                    # to support non latin digits
-                    if item == 'archive':
-                        value = pattern.sub(r'%\1s', value)
-                    self.set_attr(item.strip(), value.strip())
+                    self.set_attr(item, value)
                 break
         else:
             raise MissingConfigError('Missing or malformed template')
@@ -562,20 +567,22 @@
     def get_params(self, timestamp, counter: int) -> dict:
         """Make params for archiving template."""
         lang = self.site.lang
-        return {
-            'counter': to_local_digits(counter, lang),
-            'year': to_local_digits(timestamp.year, lang),
-            'isoyear': to_local_digits(timestamp.isocalendar()[0], lang),
-            'isoweek': to_local_digits(timestamp.isocalendar()[1], lang),
-            'semester': to_local_digits(int(ceil(timestamp.month / 6)), lang),
-            'quarter': to_local_digits(int(ceil(timestamp.month / 3)), lang),
-            'month': to_local_digits(timestamp.month, lang),
-            'monthname': self.month_num2orig_names[timestamp.month]['long'],
-            'monthnameshort': self.month_num2orig_names[
-                timestamp.month]['short'],
-            'week': to_local_digits(
-                int(time.strftime('%W', timestamp.timetuple())), lang),
+        params = {
+            'counter': counter,
+            'year': timestamp.year,
+            'isoyear': timestamp.isocalendar()[0],
+            'isoweek': timestamp.isocalendar()[1],
+            'semester': int(ceil(timestamp.month / 6)),
+            'quarter': int(ceil(timestamp.month / 3)),
+            'month': timestamp.month,
+            'week': int(time.strftime('%W', timestamp.timetuple())),
         }
+        params.update({'local' + key: to_local_digits(value, lang)
+                       for key, value in params.items()})
+        monthnames = self.month_num2orig_names[timestamp.month]
+        params['monthname'] = monthnames['long']
+        params['monthnameshort'] = monthnames['short']
+        return params
     def analyze_page(self) -> Set[ShouldArchive]:
         """Analyze DiscussionPage."""
@@ -588,6 +595,9 @@
         whys = set()
         pywikibot.output('Processing {} threads'
                          .format(len(self.page.threads)))
+        fields = self.get_params(self.now, 0).keys()  # dummy parameters
+        regex = re.compile(r'%(\((?:{})\))d'.format('|'.join(fields)))
+        stringpattern = regex.sub(r'%\1s', pattern)
         for i, thread in enumerate(self.page.threads):
             # TODO: Make an option so that unstamped (unsigned) posts get
             # archived.
@@ -598,7 +608,21 @@
             params = self.get_params(thread.timestamp, counter)
             # this is actually just a dummy key to group the threads by
             # "era" regardless of the counter and deal with it later
-            key = pattern % params
+            try:
+                key = pattern % params
+            except TypeError as e:
+                if 'a real number is required' in str(e):
+                    pywikibot.error(e)
+                    pywikibot.info(
+                        fill('<<lightblue>>Use string format field like '
+                             '%(localfield)s instead of %(localfield)d. '
+                             'Trying to solve it...'))
+                    pywikibot.info()
+                    pattern = stringpattern
+                    key = pattern % params
+                else:
+                    raise MalformedConfigError(e)
+
             threads_per_archive[key].append((i, thread))
             whys.add(why)  # xxx: we don't know if we ever archive anything
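The interplay of the new get_params() fields and the %d to %s fallback in analyze_page() can be tried outside the bot. The following is only a sketch under assumptions: `local_digits_sketch` and `get_params_sketch` are hypothetical helpers that mimic `to_local_digits` (Bengali digits only) and a reduced `get_params()`, the archive pattern is made up, and the regex is written with escaped parentheses so that it matches the literal `%(field)d` form.

```python
import re
from datetime import datetime
from math import ceil

BENGALI_DIGITS = '০১২৩৪৫৬৭৮৯'  # same value as the 'bn' entry shown above


def local_digits_sketch(number):
    """Hypothetical stand-in for textlib.to_local_digits (Bengali only)."""
    return str(number).translate(str.maketrans('0123456789', BENGALI_DIGITS))


def get_params_sketch(timestamp, counter):
    """Build latin int fields, then add a 'local'-prefixed str per field."""
    params = {
        'counter': counter,
        'year': timestamp.year,
        'quarter': int(ceil(timestamp.month / 3)),
        'month': timestamp.month,
    }
    # The comprehension is evaluated completely before update() runs,
    # so the dict is not modified while it is being iterated.
    params.update({'local' + key: local_digits_sketch(value)
                   for key, value in params.items()})
    return params


params = get_params_sketch(datetime(2022, 7, 24), 3)

# A pattern that wrongly combines local fields with %d (made-up example).
pattern = 'Talk/Archive %(localyear)d/%(localcounter)d'

# Rewrite every known %(field)d into %(field)s as a fallback pattern.
regex = re.compile(r'%(\((?:{})\))d'.format('|'.join(params.keys())))
stringpattern = regex.sub(r'%\1s', pattern)

try:
    key = pattern % params
except TypeError:
    # str value with %d: fall back to the string variant of the pattern.
    key = stringpattern % params
```

The sketch catches every TypeError, whereas the change above only falls back when the message contains 'a real number is required' and raises MalformedConfigError otherwise.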
pywikibot-commits@lists.wikimedia.org