jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/789552 )
Change subject: [IMPR] Add a -nopreload option to replace.py
......................................................................
[IMPR] Add a -nopreload option to replace.py
Wikis might have preloading pages disabled. To reflect this, a new
-nopreload option was added to disable preloading pages within the generator.
See:
https://lotro-wiki.com/index.php/User:Magill-bot#Usage_Explanation
Change-Id: Icfd0442bebb1b39ff8b72533deadbc85e172fa76
---
M scripts/replace.py
1 file changed, 6 insertions(+), 1 deletion(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/scripts/replace.py b/scripts/replace.py
index 6d807db..0ba7a01 100755
--- a/scripts/replace.py
+++ b/scripts/replace.py
@@ -93,6 +93,8 @@
-quiet Don't prompt a message if a page keeps unchanged
+-nopreload Do not preload pages. Useful if disabled on a wiki.
+
-recursive Recurse replacement as long as possible. Be careful, this
might lead to an infinite loop.
@@ -892,6 +894,7 @@
edit_summary = ''
# Array which will collect commandline parameters.
# First element is original text, second element is replacement text.
+ preload = False # preload pages
commandline_replacements = []
file_replacements = []
# A list of 2-tuples of original text and replacement text.
@@ -952,6 +955,8 @@
manual_input = True
elif opt == '-pairsfile':
file_replacements = handle_pairsfile(value)
+ elif opt == '-nopreload':
+ preload = False
else:
commandline_replacements.append(arg)
@@ -1087,7 +1092,7 @@
# exceptions are taken into account by the ReplaceRobot
gen = handle_sql(sql_query, replacements, exceptions['text-contains'])
- gen = genFactory.getCombinedGenerator(gen, preload=True)
+ gen = genFactory.getCombinedGenerator(gen, preload=preload)
if pywikibot.bot.suggest_help(missing_generator=not gen):
return
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/789552
To unsubscribe, or for help writing mail filters, visit https://gerrit.wikimedia.org/r/settings
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: Icfd0442bebb1b39ff8b72533deadbc85e172fa76
Gerrit-Change-Number: 789552
Gerrit-PatchSet: 2
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: D3r1ck01 <xsavitar.wiki(a)aol.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
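
Note on change 789552 (-nopreload): the option handling can be sketched in isolation as follows. This is a minimal sketch, not the code from replace.py. The helper name `parse_preload` is hypothetical, and the default of `True` is an assumption based on the pre-change call, which passed `preload=True` to `getCombinedGenerator`.

```python
def parse_preload(args):
    """Split a hypothetical argument list into (preload, remaining_args)."""
    # Assumption: pages are preloaded unless -nopreload is given,
    # mirroring the former hard-coded preload=True.
    preload = True
    remaining = []
    for arg in args:
        if arg == '-nopreload':
            preload = False
        else:
            remaining.append(arg)
    return preload, remaining


if __name__ == '__main__':
    print(parse_preload(['-nopreload', 'foo', 'bar']))
```

The resulting flag would then be handed to the generator factory in place of a literal, as in the last hunk of the diff above.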
Xqt has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/789897 )
Change subject: [IMPR] Print counter statistic for all counters
......................................................................
[IMPR] Print counter statistic for all counters
BaseBot provides a counter which holds 'read', 'write' and 'skip'
by default but other counters can be added easily.
- rewrite exit() method so that all counter values are printed
- increase 'read' counter before processing the page to add the 'read'
counter first. BaseBot.counter keeps the order of insertion (with
Python 3.6+; the order of Python 3.5 is not deterministic)
- add tests accordingly
- add counters in touch.py when a page is touched or purged
- use 'upload' counter in specialbots/_upload.py
- Update BaseBot documentation
Bug: T307834
Change-Id: I567bae073e49eb3bde083b82a30e9f2a76044950
---
M docs/api_ref/pywikibot.rst
M pywikibot/bot.py
M pywikibot/specialbots/_upload.py
M scripts/touch.py
M tests/bot_tests.py
5 files changed, 80 insertions(+), 59 deletions(-)
Approvals:
Xqt: Verified; Looks good to me, approved
diff --git a/docs/api_ref/pywikibot.rst b/docs/api_ref/pywikibot.rst
index 02bd83a..cd8f47c 100644
--- a/docs/api_ref/pywikibot.rst
+++ b/docs/api_ref/pywikibot.rst
@@ -29,6 +29,7 @@
--------------------
.. automodule:: pywikibot.bot
+ :member-order: bysource
pywikibot.bot\_choice module
----------------------------
diff --git a/pywikibot/bot.py b/pywikibot/bot.py
index 274e4bf..e343643 100644
--- a/pywikibot/bot.py
+++ b/pywikibot/bot.py
@@ -1174,16 +1174,14 @@
'Option opt.bar is 4711'
"""
- # Handler configuration.
- # Only the keys of the dict can be passed as init options
- # The values are the default values
- # Overwrite this in subclasses!
-
available_options = {} # type: Dict[str, Any]
+ """ Handler configuration attribute.
+ Only the keys of the dict can be passed as `__init__` options.
+ The values are the default values. Overwrite this in subclasses!
+ """
def __init__(self, **kwargs: Any) -> None:
- """
- Only accept options defined in available_options.
+ """Only accept options defined in available_options.
:param kwargs: bot options
"""
@@ -1206,8 +1204,10 @@
class BaseBot(OptionHandler):
- """
- Generic Bot to be subclassed.
+ """Generic Bot to be subclassed.
+
+ Only accepts `generator` and options defined in
+ :attr:`available_options`.
This class provides a :meth:`run` method for basic processing of a
generator one page at a time.
@@ -1225,13 +1225,12 @@
properties.
If the subclass does not set a generator, or does not override
- :meth:`treat` or :meth:`run`, NotImplementedError is raised.
+ :meth:`treat` or :meth:`run`, `NotImplementedError` is raised.
For bot options handling refer :class:`OptionHandler` class above.
.. versionchanged:: 7.0
- A counter attribute is provided which is a `collections.Counter`;
- The default counters are 'read', 'write' and 'skip'.
+ A :attr:`counter` instance variable is provided.
"""
use_disambigs = None # type: Optional[bool]
@@ -1250,18 +1249,14 @@
.. versionadded:: 7.2
"""
- # Handler configuration.
- # The values are the default values
- # Extend this in subclasses!
-
available_options = {
'always': False, # By default ask for confirmation when putting a page
}
update_options = {} # type: Dict[str, Any]
- """update_options can be used to update available_options;
+ """`update_options` can be used to update :attr:`available_options`;
do not use it if the bot class is to be derived but use
- self.available_options.update(<dict>) initializer in such case.
+ `self.available_options.update(<dict>)` initializer in such case.
.. versionadded:: 6.4
"""
@@ -1269,7 +1264,7 @@
_current_page = None # type: Optional[pywikibot.page.BasePage]
def __init__(self, **kwargs: Any) -> None:
- """Only accept 'generator' and options defined in available_options.
+ """Initializer.
:param kwargs: bot options
:keyword generator: a :attr:`generator` processed by :meth:`run` method
@@ -1279,14 +1274,26 @@
pywikibot.warn('{} has a generator already. Ignoring argument.'
.format(self.__class__.__name__))
else:
- #: generator processed by :meth:`run` method
+ #: instance variable to hold the generator processed by
+ #: :meth:`run` method
self.generator = kwargs.pop('generator')
self.available_options.update(self.update_options)
super().__init__(**kwargs)
- self.counter = Counter()
+ self.counter = Counter() # type: Counter
+ """Instance variable which holds counters. The default counters
+ are 'read', 'write' and 'skip'. You can use your own counters like::
+
+ self.counter['delete'] += 1
+
+ .. versionadded:: 7.0
+ .. versionchanged:: 7.3
+ Your additional counters are also printed during :meth:`exit`
+ """
+
self._generator_completed = False
+
#: instance variable to hold the default page type
self.treat_page_type = pywikibot.page.BasePage # type: Any
@@ -1378,21 +1385,18 @@
"""
Save a new revision of a page, with user confirmation as required.
- Print differences, ask user for confirmation,
- and puts the page if needed.
+ Print differences, ask user for confirmation, and puts the page
+ if needed.
Option used:
* 'always'
- Keyword args used:
-
- * 'asynchronous' - passed to page.save
- * 'summary' - passed to page.save
- * 'show_diff' - show changes between oldtext and newtext (enabled)
- * 'ignore_save_related_errors' - report and ignore (disabled)
- * 'ignore_server_errors' - report and ignore (disabled)
-
+ :keyword asynchronous: passed to page.save
+ :keyword summary: passed to page.save
+ :keyword show_diff: show changes between oldtext and newtext (enabled)
+ :keyword ignore_save_related_errors: report and ignore (disabled)
+ :keyword ignore_server_errors: report and ignore (disabled)
:return: whether the page was saved successfully
"""
if oldtext.rstrip() == newtext.rstrip():
@@ -1403,7 +1407,6 @@
self.current_page = page
show_diff = kwargs.pop('show_diff', True)
-
if show_diff:
pywikibot.showDiff(oldtext, newtext)
@@ -1419,6 +1422,8 @@
"""
Helper function to handle page save-related option error handling.
+ .. note:: Do not use it directly. Use :meth:`userPut` instead.
+
:param page: currently edited page
:param func: the function to call
:param args: passed to the function
@@ -1430,6 +1435,8 @@
page save will be reported and ignored (default: False)
:kwtype ignore_save_related_errors: bool
:return: whether the page was saved successfully
+
+ :meta public:
"""
if not self.user_confirm('Do you want to accept these changes?'):
return False
@@ -1473,13 +1480,17 @@
raise QuitKeyboardInterrupt
def exit(self) -> None:
- """
- Cleanup and exit processing.
+ """Cleanup and exit processing.
- Invoked when Bot.run() is finished.
- Prints treat and save counters and informs whether the script
+ Invoked when :meth:`run` is finished. Waits for pending threads,
+ prints counter statistics and informs whether the script
terminated gracefully or was halted by exception.
- May be overridden by subclasses.
+
+ .. note:: Do not overwrite it by subclasses; :meth:`teardown`
+ should be used instead.
+
+ .. versionchanged:: 7.3
+ Statistics are printed for all entries in :attr:`counter`
"""
self.teardown()
if hasattr(self, '_start_ts'):
@@ -1489,10 +1500,10 @@
# wait until pending threads finished but don't close the queue
pywikibot.stopme()
- pywikibot.output('\n{read} pages read'
- '\n{write} pages written'
- '\n{skip} pages skipped'
- .format_map(self.counter))
+ pywikibot.info()
+ for op, count in self.counter.items():
+ pywikibot.info('{} {} operation{}'
+ .format(count, op, 's' if count > 1 else ''))
if hasattr(self, '_start_ts'):
write_delta = pywikibot.Timestamp.now() - self._start_ts
@@ -1508,10 +1519,12 @@
if self.counter['read']:
pywikibot.output('Read operation time: {:.1f} seconds'
.format(read_seconds / self.counter['read']))
- if self.counter['write']:
- pywikibot.output(
- 'Write operation time: {:.1f} seconds'
- .format(write_seconds / self.counter['write']))
+
+ for op, count in self.counter.items():
+ if not count or op == 'read':
+ continue
+ pywikibot.info('{} operation time: {:.1f} seconds'
+ .format(op.capitalize(), write_seconds / count))
# exc_info contains exception from self.run() while terminating
exc_info = sys.exc_info()
@@ -1525,13 +1538,15 @@
def init_page(self, item: Any) -> 'pywikibot.page.BasePage':
"""Initialize a generator item before treating.
- Ensure that the result of init_page is always a pywikibot.Page object
- even when the generator returns something else.
+ Ensure that the result of `init_page` is always a
+ pywikibot.Page object or any other type given by the
+ :attr:`treat_page_type` even when the generator returns
+ something else.
- Also used to set the arrange the current site. This is called before
- skip_page and treat.
+ Also used to set the arrange the current site. This is called
+ before :meth:`skip_page` and :meth:`treat`.
- :param item: any item from self.generator
+ :param item: any item from :attr:`generator`
:return: return the page object to be processed further
"""
return item
@@ -1576,17 +1591,18 @@
.format(self.__class__.__name__))
def setup(self) -> None:
- """Some initial setup before run operation starts.
+ """Some initial setup before :meth:`run` operation starts.
This can be used for reading huge parts from life wiki or file
operation which is more than just initialize the instance.
- Invoked by run() before running through generator loop.
+ Invoked by :meth:`run` before running through :attr:`generator`
+ loop.
.. versionadded:: 3.0
"""
def teardown(self) -> None:
- """Some cleanups after run operation. Invoked by exit().
+ """Some cleanups after :meth:`run` operation. Invoked by :meth:`exit`.
.. versionadded:: 3.0
"""
@@ -1621,8 +1637,8 @@
continue
# Process the page
- self.treat(page)
self.counter['read'] += 1
+ self.treat(page)
self._generator_completed = True
except QuitKeyboardInterrupt:
diff --git a/pywikibot/specialbots/_upload.py b/pywikibot/specialbots/_upload.py
index 52409eb..743f38e 100644
--- a/pywikibot/specialbots/_upload.py
+++ b/pywikibot/specialbots/_upload.py
@@ -431,7 +431,7 @@
# No warning, upload complete.
pywikibot.output('Upload of {} successful.'
.format(filename))
- self.counter['write'] += 1
+ self.counter['upload'] += 1
return filename # data['filename']
pywikibot.output('Upload aborted.')
break
diff --git a/scripts/touch.py b/scripts/touch.py
index 292f8ad..48386dd 100755
--- a/scripts/touch.py
+++ b/scripts/touch.py
@@ -61,6 +61,8 @@
.format(page.title(as_link=True)))
except PageSaveRelatedError as e:
pywikibot.error('Page {} not saved:\n{}'.format(page, e.args))
+ else:
+ self.counter['touch'] += 1
class PurgeBot(MultipleSitesBot):
@@ -76,9 +78,11 @@
def treat(self, page) -> None:
"""Purge the given page."""
+ done = page.purge(**self.opt)
+ if done:
+ self.counter['purge'] += 1
pywikibot.output('Page {}{} purged'
- .format(page,
- '' if page.purge(**self.opt) else ' not'))
+ .format(page, '' if done else ' not'))
def main(*args: str) -> None:
diff --git a/tests/bot_tests.py b/tests/bot_tests.py
index 12c89de..8b51092 100755
--- a/tests/bot_tests.py
+++ b/tests/bot_tests.py
@@ -195,7 +195,7 @@
pywikibot.Page(self.en, 'Page 2'),
pywikibot.Page(self.de, 'Page 3')],
post_treat)
- self.bot.exit = self._exit(2, exception=ValueError)
+ self.bot.exit = self._exit(3, exception=ValueError)
with self.assertRaisesRegex(ValueError, 'Whatever'):
self.bot.run()
@@ -212,7 +212,7 @@
pywikibot.Page(self.de, 'Page 3')],
post_treat)
- self.bot.exit = self._exit(2, exception=None)
+ self.bot.exit = self._exit(3, exception=None)
self.bot.run()
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/789897
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: I567bae073e49eb3bde083b82a30e9f2a76044950
Gerrit-Change-Number: 789897
Gerrit-PatchSet: 6
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: D3r1ck01 <xsavitar.wiki(a)aol.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
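
Note on change 789897 (counter statistics): the per-counter output loop added to `exit()` can be tried outside the bot framework. The sketch below assumes nothing from pywikibot; `counter_lines` is a hypothetical helper that returns the lines instead of printing them, and the format string is copied from the diff.

```python
from collections import Counter


def counter_lines(counter):
    """Build one statistics line per counter entry, in insertion order."""
    lines = []
    for op, count in counter.items():
        # Same format as in the diff: plural 's' only when count > 1.
        lines.append('{} {} operation{}'
                     .format(count, op, 's' if count > 1 else ''))
    return lines


counter = Counter()
counter['read'] += 3      # default counter
counter['write'] += 1     # default counter
counter['upload'] += 2    # additional counter, as used in _upload.py

if __name__ == '__main__':
    for line in counter_lines(counter):
        print(line)
```

Because `Counter` is a `dict` subclass, entries come out in the order they were first incremented, which is why the change bumps 'read' before calling `treat()`.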
jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/788789 )
Change subject: [IMPR]: Use proofreadpagesinindex query module
......................................................................
[IMPR]: Use proofreadpagesinindex query module
Use proofreadpagesinindex query module to get pages in IndexPage.
See:
- https://www.mediawiki.org/wiki/Extension:ProofreadPage/Index_pagination_API
Now self._all_page_links is a dict that contains all pages included in
IndexPage. When pages are fetched via self.page_gen() no new Page objects are
created; they are retrieved from self._all_page_links instead.
This also makes IndexPage more robust when getting links in Page ns, see
bug T307280.
Change-Id: I1d36dbde0ff12078c45c3e80c69912bbe4436039
---
M pywikibot/proofreadpage.py
1 file changed, 40 insertions(+), 24 deletions(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/proofreadpage.py b/pywikibot/proofreadpage.py
index 9b0650b..758ed41 100644
--- a/pywikibot/proofreadpage.py
+++ b/pywikibot/proofreadpage.py
@@ -47,7 +47,7 @@
Tuple,
)
from pywikibot.comms import http
-from pywikibot.data.api import Request
+from pywikibot.data.api import ListGenerator, Request
from pywikibot.exceptions import Error, OtherPageSaveError
from pywikibot.page import PageSourceType
from pywikibot.tools import cached
@@ -824,14 +824,30 @@
raise ValueError('Page {} must belong to {} namespace'
.format(self.title(), site.proofread_index_ns))
- self._all_page_links = set(
- self.site.pagelinks(self, namespaces=site.proofread_page_ns))
- # bug T307280
- self._all_page_links |= set(
- self.site.pagetemplates(self, namespaces=site.proofread_page_ns))
+ self._all_page_links = {}
+
+ for page in self._get_prp_index_pagelist():
+ self._all_page_links[page.title()] = page
self._cached = False
+ def _get_prp_index_pagelist(self):
+ """Get all pages in an IndexPage page list."""
+ site = self.site
+ ppi_args = {}
+ if hasattr(self, '_pageid'):
+ ppi_args['prppiipageid'] = str(self._pageid)
+ else:
+ ppi_args['prppiititle'] = self.title().encode(site.encoding())
+
+ ppi_gen = site._generator(ListGenerator, 'proofreadpagesinindex',
+ **ppi_args)
+ for item in ppi_gen:
+ page = ProofreadPage(site, item['title'])
+ page.page_offset = item['pageoffset']
+ page.index = self
+ yield page
+
@staticmethod
def _parse_redlink(href: str) -> Optional[str]:
"""Parse page title when link in Index is a redlink."""
@@ -839,7 +855,7 @@
r'/w/index\.php\?title=(.+?)&action=edit&redlink=1')
title = p_href.search(href)
if title:
- return title.group(1)
+ return title.group(1).replace('_', ' ')
return None
def save(self, *args: Any, **kwargs: Any) -> None: # See Page.save().
@@ -907,23 +923,27 @@
self._soup = _bs4_soup(self.get_parsed_page(True)) # type: ignore
# Do not search for "new" here, to avoid to skip purging if links
# to non-existing pages are present.
- attrs = {'class': re.compile('prp-pagequality')}
+ attrs = {'class': re.compile('prp-pagequality-[0-4]')}
# Search for attribute "prp-pagequality" in tags:
# Existing pages:
# <a href="/wiki/Page:xxx.djvu/n"
+ # class="prp-pagequality-0 quality0" or
+ # class="prp-index-pagelist-page prp-pagequality-0 quality0"
# title="Page:xxx.djvu/n">m
- # class="quality1 prp-pagequality-1"
# </a>
# Non-existing pages:
# <a href="/w/index.php?title=xxx&action=edit&redlink=1"
- # class="new"
+ # class="new prp-index-pagelist-page"
# title="Page:xxx.djvu/n (page does not exist)">m
# </a>
# Try to purge or raise ValueError.
found = self._soup.find_all('a', attrs=attrs)
- attrs = {'class': re.compile('prp-pagequality|new')}
+ attrs = {'class': re.compile('prp-pagequality-[0-4]|'
+ 'new prp-index-pagelist-page|'
+ 'prp-index-pagelist-page')
+ }
if not found:
self.purge()
self._soup = _bs4_soup(self.get_parsed_page(True)) # type: ignore
@@ -932,7 +952,6 @@
'Missing class="qualityN prp-pagequality-N" or '
'class="new" in: {}.'.format(self))
- # Search for attribute "prp-pagequality" or "new" in tags:
page_cnt = 0
for a_tag in self._soup.find_all('a', attrs=attrs):
label = a_tag.text.lstrip('0') # Label is not converted to int.
@@ -947,16 +966,12 @@
title = a_tag.get('title') # existing page
assert title is not None
- try:
- page = ProofreadPage(self.site, title)
- page.index = self # set index property for page
- page_cnt += 1
- except ValueError:
- # title is not in site.proofread_page_ns; do not consider it
- continue
- if page not in self._all_page_links:
- raise Error('Page {} not recognised.'.format(page))
+ try:
+ page = self._all_page_links[title]
+ page_cnt += 1
+ except KeyError:
+ continue
# In order to avoid to fetch other Page:title links outside
# the Pages section of the Index page; these should hopefully be
@@ -982,7 +997,8 @@
self._pages_from_label.setdefault(label, set()).add(page)
# Sanity check: all links to Page: ns must have been considered.
- assert set(self._labels_from_page) == set(self._all_page_links)
+ assert (set(self._labels_from_page)
+ == set(self._all_page_links.values()))
# Info cached.
self._cached = True
@@ -1036,8 +1052,8 @@
# Decorate and sort by page number because preloadpages does not
# guarantee order.
# TODO: remove if preloadpages will guarantee order.
- gen = ((p, self.get_number(p)) for p in gen)
- gen = (p[0] for p in sorted(gen, key=lambda x: x[1]))
+ gen = ((self.get_number(p), p) for p in gen)
+ gen = (p for n, p in sorted(gen))
return gen
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/788789
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: I1d36dbde0ff12078c45c3e80c69912bbe4436039
Gerrit-Change-Number: 788789
Gerrit-PatchSet: 5
Gerrit-Owner: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
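
Note on change 788789 (proofreadpagesinindex): the commit message describes keeping all index pages in a dict keyed by title and reusing those objects instead of creating new ones. A minimal sketch of that lookup pattern, with a hypothetical `FakePage` stand-in rather than `ProofreadPage`, and hypothetical helper names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakePage:
    """Stand-in for a page object; not a pywikibot class."""
    title: str
    offset: int


def build_title_index(pages):
    """Map each page title to its already-created page object."""
    return {page.title: page for page in pages}


def pages_for_titles(index, titles):
    """Return the cached objects for known titles, skipping unknown ones."""
    found = []
    for title in titles:
        try:
            found.append(index[title])
        except KeyError:
            # Title is not part of the index page list; ignore it.
            continue
    return found


index = build_title_index([FakePage('Page:A.djvu/1', 1),
                           FakePage('Page:A.djvu/2', 2)])

if __name__ == '__main__':
    print(pages_for_titles(index, ['Page:A.djvu/2', 'Help:Contents']))
```

Links that are not in the dict are skipped silently, which corresponds to the `except KeyError: continue` branch in the diff.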
jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/790720 )
Change subject: [IMPR] Prioritize -namespaces options in pg.handle_args
......................................................................
[IMPR] Prioritize -namespaces options in pg.handle_args
Prioritize -namespaces options to solve problems with several
generators like -newpages/-random/-randomredirect/-linter.
pagegenerators.handle_arg should be deprecated and replaced by
pagegenerators.handle_args to solve these issues completely.
Bug: T222519
Change-Id: Ifdb222f4725d2e1c74f9e97373cc5dfdf2416670
---
M pywikibot/pagegenerators.py
1 file changed, 10 insertions(+), 3 deletions(-)
Approvals:
Matěj Suchánek: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/pagegenerators.py b/pywikibot/pagegenerators.py
index 796cecf..002ca17 100644
--- a/pywikibot/pagegenerators.py
+++ b/pywikibot/pagegenerators.py
@@ -365,9 +365,9 @@
-ns:not:2,3
-ns:not:Help,File
- If used with -newpages/-random/-randomredirect/linter
+ If used with -newpages/-random/-randomredirect/-linter
generators, -namespace/ns must be provided before
- -newpages/-random/-randomredirect/linter.
+ -newpages/-random/-randomredirect/-linter.
If used with -recentchanges generator, efficiency is
improved if -namespace is provided before -recentchanges.
@@ -1267,8 +1267,15 @@
"""Handle command line arguments and return the rest as a list.
.. versionadded:: 6.0
+ .. versionchanged:: 7.3
+ Prioritize -namespaces options to solve problems with several
+ generators like -newpages/-random/-randomredirect/-linter
"""
- return [arg for arg in args if not self.handle_arg(arg)]
+ ordered_args = [arg for arg in args
+ if arg.startswith(('-ns', '-namespace'))]
+ ordered_args += [arg for arg in args
+ if not arg.startswith(('-ns', '-namespace'))]
+ return [arg for arg in ordered_args if not self.handle_arg(arg)]
def handle_arg(self, arg: str) -> bool:
"""Parse one argument at a time.
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/790720
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: Ifdb222f4725d2e1c74f9e97373cc5dfdf2416670
Gerrit-Change-Number: 790720
Gerrit-PatchSet: 4
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Dvorapa <dvorapa(a)seznam.cz>
Gerrit-Reviewer: Matěj Suchánek <matejsuchanek97(a)gmail.com>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
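
Note on change 790720 (-namespaces priority): the reordering done in `handle_args` can be looked at on its own. The sketch below copies the two list comprehensions from the diff into a hypothetical free function, `prioritize_namespace_args`; it does not call `handle_arg` and only shows the resulting order.

```python
def prioritize_namespace_args(args):
    """Move -ns/-namespace options to the front, keeping relative order."""
    prefixes = ('-ns', '-namespace')
    ordered = [arg for arg in args if arg.startswith(prefixes)]
    ordered += [arg for arg in args if not arg.startswith(prefixes)]
    return ordered


if __name__ == '__main__':
    print(prioritize_namespace_args(
        ['-newpages', '-ns:0', 'foo', '-namespace:Help']))
```

With this order, generators such as -newpages see the namespace restriction regardless of where the user placed it on the command line.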
jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/789984 )
Change subject: [bugfix] Call ExistingPageBot.skip_page() first
......................................................................
[bugfix] Call ExistingPageBot.skip_page() first
If an ExistingPageBot is subclassed and the subclass has its own skip_page,
call `super().skip_page()` first to ensure that nonexistent pages
are filtered before other calls are made.
- update several scripts
- add a warning in ExistingPageBot.skip_page method
Bug: T86491
Change-Id: Ic39c00c8bc3165fb6c7f89c1b8838bb15542e7b3
---
M pywikibot/bot.py
M scripts/blockpageschecker.py
M scripts/newitem.py
M scripts/noreferences.py
M scripts/reflinks.py
M scripts/replace.py
6 files changed, 27 insertions(+), 6 deletions(-)
Approvals:
Matěj Suchánek: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/bot.py b/pywikibot/bot.py
index 68eb169..c6e238c 100644
--- a/pywikibot/bot.py
+++ b/pywikibot/bot.py
@@ -1959,7 +1959,12 @@
"""A CurrentPageBot class which only treats existing pages."""
def skip_page(self, page: 'pywikibot.page.BasePage') -> bool:
- """Treat page if it exists and handle NoPageError."""
+ """Treat page if it exists and handle NoPageError.
+
+ .. warning:: If subclassed, call `super().skip_page()` first to
+ ensure that non existent pages are filtered before other
+ calls are made
+ """
if not page.exists():
pywikibot.warning('Page {page} does not exist on {page.site}.'
.format(page=page))
diff --git a/scripts/blockpageschecker.py b/scripts/blockpageschecker.py
index 79f56a3..cd677e8 100755
--- a/scripts/blockpageschecker.py
+++ b/scripts/blockpageschecker.py
@@ -233,6 +233,9 @@
# "{} is sysop-protected : this account can't edit "
# "it! Skipping...".format(pagename))
# continue
+ if super().skip_page(page):
+ return True
+
page.protection()
if not page.has_permission():
pywikibot.warning(
@@ -240,7 +243,7 @@
.format(page))
return True
- return super().skip_page(page)
+ return False
def remove_templates(self):
"""Understand if the page is blocked has the right template."""
diff --git a/scripts/newitem.py b/scripts/newitem.py
index 693a79b..e6469d3 100755
--- a/scripts/newitem.py
+++ b/scripts/newitem.py
@@ -139,6 +139,9 @@
def skip_page(self, page) -> bool:
"""Skip pages which are unwanted to treat."""
+ if super().skip_page(page):
+ return True
+
if page.editTime() > self.lastEditBefore:
pywikibot.output(
'Last edit on {page} was on {page.latest_revision.timestamp}.'
@@ -170,7 +173,7 @@
% (page, template))
return True
- return super().skip_page(page)
+ return False
def treat_page_and_item(self, page, item) -> None:
"""Treat page/item."""
diff --git a/scripts/noreferences.py b/scripts/noreferences.py
index 35aff3d..59eb118 100755
--- a/scripts/noreferences.py
+++ b/scripts/noreferences.py
@@ -709,13 +709,16 @@
def skip_page(self, page):
"""Check whether the page could be processed."""
+ if super().skip_page(page):
+ return True
+
if self.site.sitename == 'wikipedia:en' and page.isIpEdit():
pywikibot.warning(
'Page {} is edited by IP. Possible vandalized'
.format(page.title(as_link=True)))
return True
- return super().skip_page(page)
+ return False
def treat_page(self) -> None:
"""Run the bot."""
diff --git a/scripts/reflinks.py b/scripts/reflinks.py
index 23ae56b..aa3de1a 100755
--- a/scripts/reflinks.py
+++ b/scripts/reflinks.py
@@ -541,10 +541,14 @@
def skip_page(self, page):
"""Skip unwanted pages."""
+ if super().skip_page(page):
+ return True
+
if not page.has_permission():
pywikibot.warning("You can't edit page {page}" .format(page=page))
return True
- return super().skip_page(page)
+
+ return False
def treat(self, page) -> None:
"""Process one page."""
diff --git a/scripts/replace.py b/scripts/replace.py
index 6d807db..e5a460e 100755
--- a/scripts/replace.py
+++ b/scripts/replace.py
@@ -648,6 +648,9 @@
def skip_page(self, page):
"""Check whether treat should be skipped for the page."""
+ if super().skip_page(page):
+ return True
+
if self.isTitleExcepted(page.title()):
pywikibot.warning(
'Skipping {} because the title is on the exceptions list.'
@@ -658,7 +661,7 @@
pywikibot.warning("You can't edit page {}".format(page))
return True
- return super().skip_page(page)
+ return False
def treat(self, page) -> None:
"""Work on each page retrieved from generator."""
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/789984
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: Ic39c00c8bc3165fb6c7f89c1b8838bb15542e7b3
Gerrit-Change-Number: 789984
Gerrit-PatchSet: 1
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: D3r1ck01 <xsavitar.wiki(a)aol.com>
Gerrit-Reviewer: JAn Dudík <jan.dudik(a)gmail.com>
Gerrit-Reviewer: Matěj Suchánek <matejsuchanek97(a)gmail.com>
Gerrit-Reviewer: Multichill <maarten(a)mdammers.nl>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
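
Note on change 789984 (skip_page order): the pattern applied to the scripts is to let the parent class filter nonexistent pages before the subclass runs its own checks. A minimal sketch with stand-in classes follows; `FakePage`, `ExistingOnlyBot` and `EditableOnlyBot` are hypothetical, and the `RuntimeError` is only an illustration of a call that is unsafe on a missing page, not pywikibot behaviour.

```python
class FakePage:
    """Stand-in page; not a pywikibot class."""

    def __init__(self, exists, editable=True):
        self._exists = exists
        self._editable = editable

    def exists(self):
        return self._exists

    def has_permission(self):
        # Illustrative assumption: this check fails on a missing page.
        if not self._exists:
            raise RuntimeError('queried a nonexistent page')
        return self._editable


class ExistingOnlyBot:
    """Plays the role of ExistingPageBot: skip pages that do not exist."""

    def skip_page(self, page):
        return not page.exists()


class EditableOnlyBot(ExistingOnlyBot):
    """Subclass with its own check, run only after the parent check."""

    def skip_page(self, page):
        if super().skip_page(page):
            return True
        if not page.has_permission():
            return True
        return False


if __name__ == '__main__':
    bot = EditableOnlyBot()
    print(bot.skip_page(FakePage(exists=False)))
```

Had the subclass called `has_permission()` first and `super().skip_page()` last, as the scripts did before this change, the missing page would have reached the permission check.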
Xqt has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/789980 )
Change subject: [doc] Update ROADMAP.rst and CHANGELOG.md
......................................................................
[doc] Update ROADMAP.rst and CHANGELOG.md
Change-Id: I89eec0b1685bc0351df5cf23de589d0572b1c1f2
---
M ROADMAP.rst
M scripts/CHANGELOG.md
2 files changed, 23 insertions(+), 1 deletion(-)
Approvals:
jenkins-bot: Verified
Xqt: Looks good to me, approved
diff --git a/ROADMAP.rst b/ROADMAP.rst
index 422b6af..eeace9a 100644
--- a/ROADMAP.rst
+++ b/ROADMAP.rst
@@ -1,7 +1,11 @@
Current release 7.3.0
^^^^^^^^^^^^^^^^^^^^^
-* Preserve more workers than families are handled for preload_sites.py
+* Remove `ThreadList.stop_all()` method (:phab:`T307830`)
+* L10N updates
+* Improve get_charset_from_content_type function (:phab:`T307760`)
+* A tiny cache wrapper was added to hold results of parameterless methods and properties
+* Increase workers in preload_sites.py
* Close logging handlers before deleting them (:phab:`T91375`, :phab:`T286127`)
* Clear _sites cache if called with pwb wrapper (:phab:`T225594`)
* Enable short creation of a site if family name is equal to site code
diff --git a/scripts/CHANGELOG.md b/scripts/CHANGELOG.md
index 55eaff8..7c209ab 100644
--- a/scripts/CHANGELOG.md
+++ b/scripts/CHANGELOG.md
@@ -3,6 +3,17 @@
## 7.3.0
*In development*
+### weblinkchecker
+* Do not kill threads after generator is exhausted (:phab:`T113139`)
+* Use Page.extlinks() to get external links (:phab:`T60812`)
+
+
+## 7.2.1
+*07 May 2022*
+
+### movepages
+* Fix regression of option parsing (:phab:`T307826`)
+
## 7.2.0
*26 April 2022*
@@ -35,6 +46,13 @@
* A -quiet option was added to omit message when no change was made
+## 7.1.1
+*15 April 2022*
+
+### replace
+* Fix regression of XmlDumpPageGenerator
+
+
## 7.1.0
*26 March 2022*
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/789980
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: I89eec0b1685bc0351df5cf23de589d0572b1c1f2
Gerrit-Change-Number: 789980
Gerrit-PatchSet: 1
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: D3r1ck01 <xsavitar.wiki(a)aol.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged