I found this in the source code of scripts/reflinks.py:
Distributed under the terms of the GPL
This seems to be the only such case in the whole repository. Is it
compatible with our licensing conventions?
It doesn't even have a full GNU-style license header.
Hi all,
I want to do the following: I want to extract all templates from a
Wikipedia page with pywikibot.extract_templates_and_params(pagetext),
and that works fine. Now, for certain fields in certain templates, I
want to parse the parameter value itself. Or better put: I want to
resolve (is that the correct term?) any templates included in such
parameters, so basically I want to get the wikitext that results after
any templates within those parameters have been expanded. As an
example, in case my description is a bit too vague: I have this
template
{{Infobox number
| number = 4
| following-number = {{add_one_and_link_it|4}}
}}
The wikitext returned by the "add_one_and_link_it" template would be
"[[5]]". Now, can I do this with pywikibot too? That is, pass some
string (in my case, one extracted from a template parameter) to a
function, and have the bot send it to Wikipedia and return the final
wikitext (which I then want to parse myself)?
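To make the question concrete, here is the flow I'm after, with the actual round-trip to Wikipedia stubbed out by a fake expander (expand_templates, fake_expand and the loop are all mine, purely for illustration):

```python
import re

def expand_templates(wikitext, expand):
    """Replace each innermost {{...}} template call in ``wikitext``
    using the callable ``expand``, which stands in for the round-trip
    to the wiki (e.g. the expandtemplates API)."""
    pattern = re.compile(r"\{\{[^{}]*\}\}")
    while True:
        new = pattern.sub(lambda m: expand(m.group(0)), wikitext)
        if new == wikitext:
            return new
        wikitext = new

# Fake expander standing in for the wiki: it only knows one template.
def fake_expand(call):
    if call == "{{add_one_and_link_it|4}}":
        return "[[5]]"
    return call
```

With this, expand_templates("following-number = {{add_one_and_link_it|4}}", fake_expand) would give "following-number = [[5]]" - what I want is a pywikibot function that plays the role of fake_expand here.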
Frank
Hello,
I have a list of names that exist in both the article and the category namespace:
Foo | Category:Foo
Bar | Category:Bar
I want to link them together:
To every category I want to add {{Catmore}}, so I use:
add_text.py -file:skwiki.txt -up
-text:"{{Catmore|{{subst:PAGENAME}}}}" -except:"\{\{[Cc]atmore(.*?)"
-lang:sk
And I want to add "[[Category:{{subst:PAGENAME}}| ]]" to every
article, ideally as the first category. But I couldn't find a suitable
script for this.
I can add it as the last category without checking whether it already
exists, but that will lead to duplicate categories in the article:
add_text.py -file:skwiki1.txt -text:"[[Category:{{subst:PAGENAME}}| ]]" -lang:sk
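What I have in mind is something like this (plain Python, the function name is mine; this is not an existing script, just to show the behaviour I'm after):

```python
import re

def add_sort_category(text, pagename):
    """Insert [[Category:<pagename>| ]] before the first existing
    category link, unless the page already carries that category."""
    cat = "[[Category:%s| ]]" % pagename
    # Skip pages that already have the category (with or without a sort key).
    if re.search(r"\[\[Category:%s(\||\])" % re.escape(pagename), text):
        return text
    match = re.search(r"\[\[Category:", text)
    if match:
        # Put the new category in front of the first existing one.
        return text[:match.start()] + cat + "\n" + text[match.start():]
    # No categories yet: append it at the end.
    return text + "\n" + cat
```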
Do you have any idea how to do this?
JAnD
I've given up trying to solve a bug that popped up in my scripts a
couple of days ago. I run a bot for Wookieepedia, over at Wikia, and run
three simple
scripts on a daily basis. They are set up to run automatically through
Windows Task Scheduler. Since they run automatically, they run in the
background through pythonw.exe, i.e. without a console, and therefore I
need a means of getting the output. My solution for the past two months has
been to redirect sys.stdout and sys.stderr to the same StringIO() instance,
then at the end call getvalue() on that and email it to myself.
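Roughly, the setup looks like this (a simplified, modern-Python version; the emailing step is omitted):

```python
import sys
from io import StringIO

# Redirect both streams to one buffer while the bot runs headless.
buffer = StringIO()
old_out, old_err = sys.stdout, sys.stderr
sys.stdout = sys.stderr = buffer
try:
    print("bot run output")      # stands in for the scripts' work
finally:
    sys.stdout, sys.stderr = old_out, old_err

report = buffer.getvalue()       # this is what gets emailed
```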
This worked perfectly until a couple of days ago. Suddenly, I stopped
receiving anything sent through pywikibot.output() or its cousins, although
I continued to receive my own output that was produced by print statements.
After some experimenting in the interactive interpreter, I determined that
somehow pywikibot.ui (the interface instance) is not storing the correct
stdout and stderr, but I don't know what's causing this.
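A minimal illustration of the kind of stream caching I suspect (this is not pywikibot's actual code, just the failure mode):

```python
import sys
from io import StringIO

class UI:
    """An interface object that grabs a reference to sys.stdout
    once, at construction time."""
    def __init__(self):
        self.stdout = sys.stdout          # cached reference

    def output(self, text):
        self.stdout.write(text)           # ignores later redirection

ui = UI()                     # created before the redirect
capture = StringIO()
real_stdout = sys.stdout
sys.stdout = capture          # redirect, as my scripts do
ui.output("via ui\n")         # still goes to the original stream
print("via print")            # lands in the StringIO as expected
sys.stdout = real_stdout
```

If pywikibot.ui is built before my redirection takes effect, its output would bypass my buffer exactly like this - but something must have changed about *when* that happens.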
Nothing in my scripts changed around the time this started happening, and I
had not updated pywikibot or python itself in quite a while. I did update
pywikibot to the newest nightly version, but the bug persists. I'm asking
here since this is directly connected to pywikibot. Any idea what could be
going on?
(By the way, the answer is NOT "switch to core". I have tried to get core
to run on my system and failed miserably after two hours of repeated
attempts without even getting it to talk to the wiki. Compat worked
perfectly on the first try. Until such time as core can be installed by a
beginner, it is not for me.)
Jonathan Goble
Many scripts accept page titles spanning multiple command line
arguments, usually put into an array called titleParts and joined
together. It is redundant given the pagegenerators argument
-page:"...", and a poor equivalent, as only one page can be specified
via titleParts. Also, not quoting the title on the command line allows
the command interpreter to mangle the arguments before they are given
to the script.
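The pattern in question boils down to something like this (simplified):

```python
# Simplified version of the titleParts pattern: every positional
# argument that is not an option is taken as one word of a single
# page title and joined with spaces.
def title_from_args(args):
    title_parts = [arg for arg in args if not arg.startswith('-')]
    return ' '.join(title_parts)
```

So `script.py -lang:en Main Page` and `script.py -lang:en -page:"Main Page"` end up naming the same page, but the former can only ever specify one title per invocation.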
We have one changeset proposing to remove that functionality in core.
https://gerrit.wikimedia.org/r/#/c/137354/
And I vaguely recall that a similar change by Ricordi Samoa to another
script has already been merged.
I agree with Ricordi that the titleParts pattern isn't a very good
one, and 'should' be removed, but ... do users find it convenient? Is
it mostly a Windows thing? If it is desirable, we could build the
functionality into pagegenerators and make it possible to
enable/disable it in the config.
--
John Vandenberg
The following might be a bit unclear, as it's a bit of a brain dump. It's
mainly meant as a response to
https://gerrit.wikimedia.org/r/#/c/137904/2/tests/l10n_tests.py and
https://gerrit.wikimedia.org/r/#/c/137924/ and as 'food for thought'.
Basically, the question is how we can make i18n not depend on the
hardcoded 'scripts.i18n' import - this is problematic for tests, for
pywikibot-installed-as-a-package (because there is no scripts.i18n
then), and for third-party authors (because they *have* to use the
scripts.i18n folder to store their translations). I have some thoughts
on this, and maybe we can make something cool out of it.
Essentially, we would want a script to be able to indicate /where/ its
i18n file is located. There are a few ways to do this, but I guess the
cleanest option is something like this:
- pywikibot.i18n gets an 'I18N' class which contains the current
twtranslate functions,
- this I18N class takes a parameter: the filename of the i18n translation
file (which, at some point, could also be a JSON file)
- maybe more filenames, if more translation files need to be loaded?
- or maybe a directory that contains translation files?
- we add a simple wrapper that would allow the current scripts to do
something like
import pywikibot.i18n
i18n = pywikibot.i18n.forScript(__file__)
where 'forScript' does some path parsing to change __file__ (= the filename
of the current file) from /path/to/original/file to
/path/to/original/i18n/file
which is the setup we are currently using.
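Just the path-parsing part of forScript could look like this (a hypothetical sketch, only the path logic; the I18N class it would feed is not shown):

```python
import os

def forScript(script_file):
    """Hypothetical helper: map a script's __file__ to the matching
    translation file in its i18n subdirectory, e.g.
    /path/to/original/file.py -> /path/to/original/i18n/file.py."""
    directory, name = os.path.split(script_file)
    return os.path.join(directory, 'i18n', name)
```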
I'm not sure about backwards compatibility, but I guess we could have a
pre-prepared pywikibot.i18n.twtranslate doing what it does now, via the
I18N class (listing all files, maybe?)
Please let me know if this sounds like a good idea to implement.
Merlijn
In https://gerrit.wikimedia.org/r/#/c/139792
I noticed that Page.delete() and Page.protect() contain a lot of user
interaction logic that would normally live in a script, e.g. asking
the user which action to take. They also set flags on the site object,
i.e. site._noDeletePrompt = True and site._noProtectPrompt = True.
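To illustrate what I mean (a stripped-down sketch of the pattern, not the real implementation):

```python
class Site:
    """Stand-in for the site object the flag gets stored on."""

class Page:
    """Model-layer delete() that prompts the user and stashes the
    answer on the site object - the pattern in question."""
    def __init__(self, site, title):
        self.site = site
        self.title = title

    def delete(self, ask):
        # ask() stands in for the interactive prompt; it returns
        # 'y', 'n' or 'a' ("yes to all").
        if not getattr(self.site, '_noDeletePrompt', False):
            answer = ask('Delete %s?' % self.title)
            if answer == 'a':
                self.site._noDeletePrompt = True
            elif answer != 'y':
                return False
        # ... the actual API delete call would go here ...
        return True
```

Once one page has been answered with "yes to all", every later Page on the same site silently skips the prompt - which is UI state living in the model layer.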
Is there any reason for it being in Page?
--
John Vandenberg