Happy Monday,
There are strange people who make such links (kind of URL-encoded?):
[[Második világháború#Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban
.28Huskey hadm.C5.B1velet.29|Huskey hadműveletben]]
So the section title must have been copied from the URL.
Do we have a ready tool to fix these?
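For context, these anchors are MediaWiki's old dot-encoding of the section title's UTF-8 bytes (each byte written as `.XX` instead of `%XX`). A minimal sketch of how such an anchor could be decoded, assuming underscores stand for spaces (hypothetical helper; a real fix would have to guard against literal dots that happen to precede two hex digits):

```python
import re
from urllib.parse import unquote

def decode_section_anchor(anchor):
    """Turn a MediaWiki dot-encoded section anchor
    (e.g. 'Partrasz.C3.A1ll.C3.A1s') back into readable text.

    Underscores become spaces; '.XX' byte escapes become '%XX'
    and are then percent-decoded as UTF-8.
    """
    text = anchor.replace('_', ' ')
    text = re.sub(r'\.([0-9A-F]{2})', r'%\1', text)
    return unquote(text)
```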
--
Bináris
Hi all!
This is an announcement for a significant change to the Wikibase entity
format, which went live at the beginning of September. It potentially affects
clients that process snaks
<https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Glossary#Snak>.
Internally, Wikibase assigns a *hash* to each snak (which is just the hash
function (Q183427) <https://www.wikidata.org/wiki/Q183427> of an internal
representation of the snak). Those hashes were previously emitted for snaks
that appeared in qualifiers, but not for the main snak or reference snaks
of a statement. With the change, the hashes are emitted for all snaks,
regardless of where they appear. This means that a snak can now look like
this:
{
    "snaktype": "value",
    "property": "P370",
    "hash": "682fdb448ef68669a1b728a5076836da9ac3ffae",
    "datavalue": {
        "value": "some text",
        "type": "string"
    },
    "datatype": "string"
}
The hashes are also added to the HTML output, as an additional class
similar to the statement ID class on statements:
<div class="wikibase-statementview
wikibase-statement-Q4115189$29acf9c6-450a-7612-d206-049f5fe58328">
<!-- … -->
<div class="wikibase-statementview-mainsnak">
<div class="wikibase-snakview
wikibase-snakview-682fdb448ef68669a1b728a5076836da9ac3ffae">
<!-- … -->
</div>
</div></div>
The ultimate goal of this is to make any snak addressable in the DOM, which
is necessary for checking constraints on qualifiers and references (T168532
<https://phabricator.wikimedia.org/T168532>).
It should be noted that, unlike statement IDs, snak hashes are not
identifiers. They are not stable and may change at any time if the
internal representation changes.
Please let us know if you have any comments or objections.
-- Lucas
Relevant tickets:
- T171607 <https://phabricator.wikimedia.org/T171607>
- T171725 <https://phabricator.wikimedia.org/T171725>
Relevant patches:
- https://github.com/wmde/WikibaseDataModelSerialization/pull/233
- https://gerrit.wikimedia.org/r/#/c/374835/
--
Lucas Werkmeister
Software Developer (Intern)
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
I have corrected the misspelled subject and added one more good reason to
do it. Please reply to this one, not the previous.
Happy Monday to all!
I have had a dream for many years already: when I use replace.py for
correcting grammatical and spelling errors (that's what I do in most of my
time dedicated to botwork), it would be nice and useful to count the
replacements and to extract the old-new pairs for later use. This task
requires changing replaceExcept() in textlib, so for a long time I wasn't
brave enough, as I thought it would be complicated and I was afraid of
community rejection, as often happens when a task is important to
somebody but others don't feel it as such.
But now compat is deserted, so I was brave enough to experiment with my
copy. I still use replace.py in compat for several reasons, not to be
detailed here. I want to show you what I have done and why, and ask for
opinions on whether this is a good direction that could be ported to core.
The benefit of this change is much greater than the pain of it.
Of course, the solution below gives an estimate: it cannot count
modifications made with the built-in editor. Still, it is a good and
useful estimate.
== Example ==
Please have a look at
https://hu.wikipedia.org/wiki/Szerkeszt%C5%91:BinBot/munka#2017._szeptember_4.
The first numerical value in the table is the number of modified pages, the
last one is the total number of replacements. The difference is
astonishing: for some tasks the two numbers are equal, while there is one
where the last number is 12 times as big as the first. I think this is
something worth showing.
== Motivation ==
=== Counting the replacements ===
* Statistics
* Choosing bot tasks by efficiency
* Printing the number of replacements to the screen after each page
increases the safety of the work. Sometimes not every diff is properly
coloured (think of a space, for example), and the work is tiring, so it is
easy to overlook a change, but the number may make the user focus on it.
* Give data for community (e.g. which are dangerous common errors, where we
need further steps)
* Natural curiosity of a bot owner
* Scientific purpose
* etc.
=== Saving the old-new pairs to a file or a wikipage ===
* Preparing new bot tasks, developing fixes and regexes
* Creating lists of common errors for the community
* There is a common spelling error which is quite easy to detect when the
word is [[link]]ed, but almost impossible to detect without linking (due to
the enormous number of false positives). My idea is to save the hits from
the linked version and use them for the unlinked one as a list of errors
rather than a pattern.
* There is another common error which is not worth treating by bot due
to the enormous number of false positives. But if I could save the list
automatically (without modifying pages), it could be revised by volunteers
and used later as a list of errors rather than a pattern.
* Showing this list to users or interested groups in order to teach them
which errors to avoid in the future.
* Scientific purpose
* etc.
== Solution ==
Sorry, I cannot create a diff now, because this directory is not versioned.
However, these four steps are not complicated to follow.
=== textlib.py ===
def replaceExcept(text, old, new, exceptions, caseInsensitive=False,
                  allowoverlap=False, marker='', site=None):
became:
def replaceExcept(text, old, new, exceptions, caseInsensitive=False,
                  allowoverlap=False, marker='', site=None,
                  returnPairs=False):
Just within 80 characters. :-) Since the new argument defaults to False,
calling the function without it is harmless: the behaviour is unchanged for
existing calls.
A new initialization:
pairs = []
At the end of the main if, at the bottom of this branch:
else:
    # We found a valid match. Replace it.
the last line:
    markerpos = match.start() + len(replacement)
became:
    markerpos = match.start() + len(replacement)
    pairs.append((match.group(), replacement))
And at the very end of the method, instead of return text, I now have:
if returnPairs:
    return (text, pairs)
else:
    return text
=== replace.py ===
replaceExcept() is called from doReplacements(). Without going into
details, instead of returning new_text, it will now
return (new_text, replaceList)
where replaceList is a list of (old, new) tuples.
Generally it is not recommended to mix return values with side effects,
such as storing pairs in a list that is global to the method, so I decided
to give back the pairs. The main method of the bot (run()) can handle them
according to the given parameters: increment a counter, save the (old, new)
pairs to a file or a wikipage, or do nothing but the classic replacement
task. It needs some memory, but at any point only the pairs of the current
page are stored. Unless you explicitly create a huge list with all the
occurring pairs, which is not necessary, it won't cause a problem.
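The steps above could be condensed into a self-contained sketch (simplified and hypothetical: the real replaceExcept in textlib also handles exception patterns, markers, case-insensitivity and overlap):

```python
import re

def replace_except(text, old, new, return_pairs=False):
    """Replace every match of the regex `old` with `new`; when
    return_pairs is True, also return the (matched, replacement)
    pairs, so callers can count or log the actual substitutions."""
    pairs = []

    def _substitute(match):
        # Expand backreferences in `new` against this match,
        # and record the concrete (old, new) pair.
        replacement = match.expand(new)
        pairs.append((match.group(), replacement))
        return replacement

    result = re.sub(old, _substitute, text)
    if return_pairs:
        return result, pairs
    return result
```

Because return_pairs defaults to False, existing callers keep getting a plain string back, mirroring the backwards-compatibility argument above.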
--
Bináris
MediaWiki 1.29.0
Apache 2.4.27
MySQL 5.6.37
PHP 5.6.31
phpMyAdmin 4.7.2
Python 3.6.2
Pywikibot (whatever the latest from ToolForge is)
I recently installed the Bitnami MediaWiki WAMP stack, Python, and Pywikibot with the above versions. It is a local installation running on MS Windows 10.
My MediaWiki seems to work fine. I can log in, create, and edit pages. I added a separate bot user, but when I try to set a password on the Special page for bot passwords, I get the following error:
Invalid IP address or range: 0.0.0.0/0 ::/0
I have tried every combination of IP addresses I can imagine (including the default) 0.0.0.0/0 ::/0, but nothing is accepted.
I am new to wikis, python and that sort of stuff. Any suggestions would be greatly appreciated.
Hi everyone!
Nearly two weeks ago, some of our Travis build jobs related to testing
OAuth started to time out.[1]
I think it is related to misconfigured Travis environment variables.
Perhaps someone has changed them?
Unfortunately I don't have access there and I don't know who has. If you're
the one, please take a look at T173498[1]; we probably need to check/reset
the OAuth token values. Besides access to the Travis settings, this
requires knowing the authentication information for the Pywikibot-test
account.[2]
Thanks!
[1]: https://phabricator.wikimedia.org/T173498
[2]: https://phabricator.wikimedia.org/T100797#1321387