Given a piece of text, I want to exclude <!-- commented-out parts --> when
searching for a template. Is there a ready-to-use, canonical method for this?
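One way to picture the request is a minimal sketch using only the standard library: strip the comment spans first, then search what is left. The helper names `strip_comments` and `has_template` are made up for illustration and are not part of Pywikibot. Pywikibot's `textlib` module has a `removeDisabledParts()` function that covers comments and similar disabled markup, which is probably the better choice inside a bot.

```python
import re

# Non-greedy match so that two separate comments are not merged into one.
# Assumption: an unclosed "<!--" is left alone here, unlike MediaWiki,
# which treats it as running to the end of the text.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(text: str) -> str:
    """Return text with every <!-- ... --> span removed."""
    return COMMENT_RE.sub("", text)


def has_template(text: str, name: str) -> bool:
    """Report whether {{name}} or {{name|...}} occurs outside HTML comments."""
    pattern = re.compile(
        r"\{\{\s*" + re.escape(name) + r"\s*(\||\}\})", re.IGNORECASE
    )
    return pattern.search(strip_comments(text)) is not None
```

For example, `has_template("<!-- {{Unresolved}} --> text", "Unresolved")` is false because the only occurrence sits inside a comment.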
--
Bináris
Hi,
In archivebot.py there is a line:
# TODO: handle marked with template
This refers to resolved and unresolved sections. Unresolved sections should
not be archived even if they are old enough; resolved ones could be archived
sooner.
I coded the unresolved case for Hungarian Wikipedia [1], and I will do the
resolved case, too. But this way we are using a separate branch. If we want
to do it in general, we should put the templates somewhere. Templates depend
on the project, not the language.
There may also be a set of templates that trigger the same behaviour, see
e.g. [2] for the variety of templates.
So we want a set of "resolved" and another set of "unresolved" templates
per project.
What would be a clean architecture? Where would you put them? Of course, the
easiest way is to hardcode them in the script, but I think we don't do such
things nowadays.
[1]
https://hu.wikipedia.org/w/index.php?title=Szerkeszt%C5%91:Atobot/archivebo…
[2] https://en.wikipedia.org/wiki/Template:Unresolved
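To make the question concrete, here is a rough sketch of what "a set of resolved and a set of unresolved templates per project" could look like as data. Everything in it is an assumption for illustration: the `(family, code)` key, the names `StatusTemplates`, `STATUS_TEMPLATES` and `section_status`, and the placeholder template names. It is not how archivebot.py works today, and it does not settle where such a table should live.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Optional, Tuple


class StatusTemplates(NamedTuple):
    """Template names (normalised to lower case) that mark a section."""
    resolved: FrozenSet[str]
    unresolved: FrozenSet[str]


# Hypothetical table keyed by (family, site code); entries are placeholders.
STATUS_TEMPLATES: Dict[Tuple[str, str], StatusTemplates] = {
    ("wikipedia", "en"): StatusTemplates(
        resolved=frozenset({"resolved"}),
        unresolved=frozenset({"unresolved"}),
    ),
}


def normalise(name: str) -> str:
    """Simplified title normalisation: trim, underscores to spaces, lower case."""
    return name.strip().replace("_", " ").lower()


def section_status(project: Tuple[str, str],
                   template_names: Iterable[str]) -> Optional[str]:
    """Return 'unresolved', 'resolved' or None for one section.

    'unresolved' wins when both kinds are present, so a section is never
    archived early by mistake.
    """
    config = STATUS_TEMPLATES.get(project)
    if config is None:
        return None
    names = {normalise(n) for n in template_names}
    if names & config.unresolved:
        return "unresolved"
    if names & config.resolved:
        return "resolved"
    return None
```

Keeping the table as plain data means it could later be loaded from a file or a wiki page instead of being hardcoded, without touching `section_status`.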
--
Bináris
Sorry for the previous message, but this one seems to be real.
I followed
https://www.mediawiki.org/wiki/Manual:Pywikibot/Installation/SVN#Download_P…
c:\Pywikibot>python pwb.py version
WARNING: Http response status 404
Pywikibot: pywikibot/__init__.py (, -1 (unknown), 2023/01/30, 13:59:21, n/a)
Release version: *3.1.dev0*
requests version: 2.25.1
cacerts: C:\Python37\lib\site-packages\certifi\cacert.pem
certificate test: ok
Python: 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916
64 bit (AMD64)]
--
Bináris
Is there a way to recurse through a category, but excluding specific sub-cats? For example, I want to find all the templates in [[Category:People and person infobox templates]], except that I don't want to recurse into [[Category:Styles infobox templates]].
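What I have in mind is roughly the following traversal, written here as a generic sketch that does not depend on Pywikibot. The function name `walk_categories` and the `subcats` callable are hypothetical. With Pywikibot, I assume `subcats` could wrap `Category.subcategories()` and the templates of each yielded category could then be read with `Category.articles(namespaces=10)`.

```python
from typing import Callable, Hashable, Iterable, Iterator, Set, TypeVar

C = TypeVar("C", bound=Hashable)


def walk_categories(root: C,
                    subcats: Callable[[C], Iterable[C]],
                    excluded: Iterable[C]) -> Iterator[C]:
    """Yield root and its descendants, never entering an excluded category.

    Excluded categories are treated as already visited, so neither they
    nor anything reachable only through them is yielded. The same set
    also guards against category loops.
    """
    seen: Set[C] = set(excluded)
    stack = [root]
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue
        seen.add(cat)
        yield cat
        stack.extend(subcats(cat))
```

A subcategory that is reachable through some other, non-excluded parent is still visited, which seems like the right behaviour for this use case.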
I see my original message was held up in moderation due to the large attachment. I've uploaded the image to File:Screenshot of viztracer output.png <https://commons.wikimedia.org/wiki/File:Screenshot_of_viztracer_output.png>
> On Jan 13, 2023, at 6:38 PM, Roy Smith <roy(a)panix.com> wrote:
>
>
> I just discovered viztracer. https://viztracer.readthedocs.io/en/stable/index.html
>
> I've been trying to figure out why my pywikibot app is so slow. It took me about 1 minute to instrument my code (see diff), I hit my URL in a browser, and loaded up the resulting JSON file into vizviewer. Now I'm scrolling around and drilling down into a total execution trace of my code, although it really only took a moment to see that most of the time is in 11 serialized API calls. This is easily the coolest performance analysis tool I've ever used.
>
> <Screen Shot 2023-01-13 at 6.32.31 PM.png>
>
> diff --git a/dyk_web/core.py b/dyk_web/core.py
> index 31758dc..1e58203 100644
> --- a/dyk_web/core.py
> +++ b/dyk_web/core.py
> @@ -28,11 +28,16 @@ def get_pending_nominations():
>      return titles
>
>
> +from viztracer import VizTracer
> +from pathlib import Path
> +
> +
>  @bp.route("/display")
>  def display():
>      """template_name query arg is the DYK nomination template, including the Template: prefix."""
> -    current_app.logger.info("Running on %s", os.uname().nodename)
> -    page = Page(g.site, request.args["template_name"])
> -    nomination = Nomination(page)
> -    nomination_data = NominationData.from_nomination(nomination)
> -    return render_template("display.html", nomination=nomination_data)
> +    with VizTracer(output_file=str(Path.home() / "viztracer.json")):
> +        current_app.logger.info("Running on %s", os.uname().nodename)
> +        page = Page(g.site, request.args["template_name"])
> +        nomination = Nomination(page)
> +        nomination_data = NominationData.from_nomination(nomination)
> +        return render_template("display.html", nomination=nomination_data)
>
Traceback (most recent call last):
  File "/data/data/com.termux/files/home/vikaspy/pwb.py", line 399, in <module>
    if not main():
  File "/data/data/com.termux/files/home/vikaspy/pwb.py", line 391, in main
    run_python_file(filename,
  File "/data/data/com.termux/files/home/vikaspy/pwb.py", line 106, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File "./scripts/replace.py", line 1075, in <module>
    main()
  File "./scripts/replace.py", line 929, in main
    single_summary = i18n.twtranslate(
  File "/data/data/com.termux/files/home/vikaspy/pywikibot/tools/_deprecate.py", line 404, in wrapper
    return obj(*__args, **__kw)
  File "/data/data/com.termux/files/home/vikaspy/pywikibot/i18n.py", line 700, in twtranslate
    raise pywikibot.exceptions.TranslationError(
pywikibot.exceptions.TranslationError: Unable to load messages package scripts.i18n for bundle replace-replacing
It can happen due to lack of i18n submodule or files. See
https://www.mediawiki.org/wiki/Manual:Pywikibot/i18n
CRITICAL: Exiting due to uncaught exception <class 'pywikibot.exceptions.TranslationError'>
$
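Since the error message points at a missing i18n submodule or missing files, a quick way to check is to look for the message files on disk. The sketch below is only that: the helper name `bundle_files` is made up, and the layout it assumes (one directory per bundle prefix under `scripts/i18n`, holding one JSON file per language) is my reading of the message package, not something confirmed by the traceback.

```python
from pathlib import Path
from typing import List


def bundle_files(repo_root: str, twtitle: str,
                 i18n_dir: str = "scripts/i18n") -> List[Path]:
    """List the JSON message files that would back a bundle such as
    'replace-replacing'.

    Assumption: the part of the bundle name before the first '-' is the
    directory name under i18n_dir. An empty result suggests the i18n
    files were never checked out.
    """
    package = twtitle.split("-")[0]
    directory = Path(repo_root) / i18n_dir / package
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.json"))
```

Run against the installation from the traceback, this would be `bundle_files("/data/data/com.termux/files/home/vikaspy", "replace-replacing")`.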
Pywikibot, even if you're just using it as a library, configures its own complicated logging structure:
o "pywiki"
|   Level Level 11
|   Propagate OFF
|   Handler <TerminalHandler <stderr> (INFO)>
|     Level INFO
|     Filter <pywikibot.userinterfaces.terminal_interface_base.MaxLevelFilter object at 0x7f7f66dafe50>
|     Formatter fmt='%(message)s%(newline)s' datefmt=None
|   Handler <TerminalHandler <stdout> (STDOUT)>
|     Level STDOUT
|     Filter <pywikibot.userinterfaces.terminal_interface_base.MaxLevelFilter object at 0x7f7f66daffa0>
|     Formatter fmt='%(message)s%(newline)s' datefmt=None
|   Handler <TerminalHandler <stderr> (WARNING)>
|     Level WARNING
|     Formatter fmt='%(levelname)s: %(message)s%(newline)s' datefmt=None
|   |
|   o<--[pywiki.wiki]
|       |
|       o<--"pywiki.wiki.family"
|           Level NOTSET so inherits level Level 11
|
Is there any way to make it not do this? I want to have full control of the logging config in my application. In particular, I want all the logging to go to my logfile. Having a library install its own handlers which are hard-wired to a TerminalHandler just complicates that.
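The workaround I can think of uses only the standard `logging` module: strip whatever handlers are attached to the "pywiki" logger and turn propagation back on, so records reach the application's own root handlers. The function name `take_over_logger` is made up, and this is a sketch under one big assumption: that Pywikibot has already installed its handlers when the function runs and does not install them again later.

```python
import logging


def take_over_logger(name: str = "pywiki") -> logging.Logger:
    """Detach all handlers from the named logger and let its records
    propagate to the parent loggers instead.

    The logger's own level is left untouched.
    """
    logger = logging.getLogger(name)
    # Copy the list first: removeHandler mutates logger.handlers.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    logger.propagate = True
    return logger
```

After calling it, a file handler configured on the root logger should receive everything the "pywiki" hierarchy emits. It still feels like fighting the library rather than configuring it, which is why I'm asking whether there is a supported switch.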
I found these in my dykbot-cron.err file. What causes this? Is it something to worry about?
WARNING: API error mwoauth-invalid-authorization: The authorization headers in your request are not valid: Nonce already used: <hex string elided>
ERROR: Retrying failed OAuth authentication for wikipedia:en: The authorization headers in your request are not valid: Nonce already used: <hex string elided>