Hi!
My old problem is that replace.py can't write the pages to work on into a
file on my disk. For years I have used a modified version that makes no
changes but writes the titles of the involved pages to a subpage on Wikipedia
in automated mode; I can then make the replacements from that page much
more quickly than directly from a dump or the live Wikipedia. But this is
slow and generates plenty of dummy edits.
In other words, replace.py has a tool to get the titles from a file (-file)
or from a wiki page (-links), but no tool to generate this file.
Now I am ready to rewrite it. This way we can start the bot, and it will find
all the possible articles to work on and save their titles without editing
Wikipedia (and without artificial delay); meanwhile we can have lunch, run a
marathon, or sleep. Then we make the replacements from this file with -file.
My idea is that replace.py should have two new parameters:
-save writes the results into a new file instead of editing articles. It
overwrites an existing file without notice.
-saveappend writes into a new file or appends to an existing one.
OR:
-save writes and appends (primary mode)
-savenew writes and overwrites
The help is here:
http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
So we have to import codecs.
My script is:
import codecs

articles = codecs.open('cikkek.txt', 'a', encoding='utf-8')
...
tutuzuzu = u'# %s\n' % page.aslink()  # <-- needs rewrite to the new syntax
articles.write(unicode(tutuzuzu))  # <-- needs further testing: unicode() may not be needed, codecs encodes on write
articles.flush()
It works fine, except that '\n' is a Unix-style newline that has to be
converted by lfcr.py to make the file readable with notepad.exe.
The filename is currently a constant; that should be developed so it can be
given on the command line.
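A minimal sketch of how the two remaining pieces (filename from the command
line, Windows-friendly newlines) could fit together. The -save:filename
syntax and both helper names are assumptions for illustration, not existing
replace.py options:

```python
import codecs

def open_title_log(argv):
    # Hypothetical parsing of a "-save:filename" argument; the real
    # option names for replace.py are still to be decided.
    filename = 'cikkek.txt'
    for arg in argv:
        if arg.startswith('-save:'):
            filename = arg[len('-save:'):]
    # codecs.open() keeps the file in binary mode, so writing '\r\n'
    # explicitly keeps the result readable in notepad.exe without a
    # separate conversion step such as lfcr.py.
    return codecs.open(filename, 'a', encoding='utf-8')

def log_title(logfile, title):
    # One "# [[Title]]" line per page, flushed immediately so the file
    # is usable even while the scan is still running.
    logfile.write(u'# [[%s]]\r\n' % title)
    logfile.flush()
```

With this, the resulting titles file can later be fed back to replace.py with -file.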
Your opinions before I begin?
--
Bináris
I want to read a special page with Page.get(). The message is:
  File "C:\Program Files\Pywikipedia\wikipedia.py", line 601, in get
    raise NoPage('%s is in the Special namespace!' % self.aslink())
pywikibot.exceptions.NoPage
What is the solution?
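Since Special: pages are virtual and have no wikitext, get() raises NoPage
for them by design. The usual workaround is to guard on the namespace before
fetching; a rough sketch of the idea, not the framework's own API:

```python
# Special: pages live in namespace -1 and have no wikitext, so calling
# get() on them raises NoPage. Guard on the namespace first and handle
# those pages separately (e.g. skip them, or fetch their HTML instead).
def safe_get(page):
    if page.namespace() == -1:   # the Special: namespace
        return None              # nothing to fetch for these pages
    return page.get()
```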
--
Bináris
Hello all,
This is especially relevant for all interwiki bots on the toolserver.
Do *not* use python 2.7 for those bots.
There is a bug [1] in the unicode normalization that causes page
titles to become mangled [2]. This, in turn, results in bot wars [3].
As such, interwiki bots on Wikipedia should use a python version that
does not have this bug, which means using a version before 2.6.5.
You will get a warning message when running a python version that
exhibits this bug, but the bot will still work, so you may very well
cause bot wars if you start using py2.7.
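For bots that cannot control which interpreter they run under, a startup
self-test of the kind below could at least make the risk visible. This only
illustrates the shape of such a check; it does not reproduce the concrete
titles mangled by the bug:

```python
import unicodedata

# NFC normalization should be idempotent: normalizing twice must give
# the same result as normalizing once. An interpreter affected by
# CPython issue 10254 can violate this for certain inputs; this helper
# only demonstrates the kind of self-test a bot could run at startup.
def nfc_is_stable(text):
    once = unicodedata.normalize('NFC', text)
    return unicodedata.normalize('NFC', once) == once
```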
Best regards,
Merlijn van Deen
[1] http://bugs.python.org/issue10254
[2] http://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100&group_i…
[3] http://de.wikipedia.org/w/index.php?title=GNU-Lizenz_f%C3%BCr_freie_Dokumen…
On 22 November 2010 11:22, River Tarnell <river.tarnell(a)wikimedia.de> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> During the general maintenance on Dec 6th, we will change the default Python
> version (/usr/bin/python) on the Solaris user servers from 2.6 to 2.7. You may
> wish to test your tools with /usr/bin/python2.7 before then.
>
> - river.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.16 (FreeBSD)
>
> iEYEARECAAYFAkzqREgACgkQIXd7fCuc5vIhFQCgpX20z0B9xHikuwl+yiEUDzFH
> WjYAn1wqm21wZjP1uQhsEO7RkxlTyE/N
> =CqUE
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Toolserver-l mailing list (Toolserver-l(a)lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l
> Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Hello guys.
2010/10/10 <xqt(a)svn.wikimedia.org>:
> Revision: 8630
> Author: xqt
> Date: 2010-10-09 19:32:57 +0000 (Sat, 09 Oct 2010)
>
> Log Message:
> -----------
> import wikipedia as pywikibot for merging to rewrite
>
[...]
>
> Modified: trunk/pywikipedia/reflinks.py
> ===================================================================
> --- trunk/pywikipedia/reflinks.py 2010-10-09 16:11:46 UTC (rev 8629)
> +++ trunk/pywikipedia/reflinks.py 2010-10-09 19:32:57 UTC (rev 8630)
> @@ -33,15 +33,19 @@
> Basic pagegenerators commands, -page, etc...
> """
> # (C) 2008 - Nicolas Dumazet ( en:User:NicDumZ )
> +# (C) Pywikipedia bot team, 2008-2010
> #
> -# Distributed under the terms of the GPL
> -
> +# Distributed under the terms of the MIT license.
A few things are wrong with this commit:
1) The changes do not match the commit message; a license change is not
related to the merge work.
2) You cannot change the license of a script without asking ALL
contributors to the file for their permission.
No one asked me whether it was OK for me to switch from one license to another.
Note that I am personally fine with changing the license if it's required,
but doing so without asking the original authors can seriously harm
the project....
Regards,
--
Nicolas Dumazet — NicDumZ
Hi,
Got a KeyError in page.py at line 2067; lines 2066-67 read:

    yield Page(self.site, contrib['title'], contrib['ns']), \
        contrib['revid'], ts, contrib['comment']
When an edit comment has been deleted, the dictionary returned for that
edit by site.usercontribs() doesn't contain the 'comment' key, resulting
in a KeyError. I've patched my local copy to handle the problem, but I
don't know whether it's more appropriate to handle this in
User.contributions(), Site.usercontribs(), or somewhere else.
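One way to handle it is to fall back to None when the key is missing. A
self-contained sketch of the idea (make_page stands in for the Page
constructor; this is not the actual patch):

```python
# Tolerant version of the tuple yielded by User.contributions(): a
# revision-deleted comment becomes None instead of raising KeyError.
# `contrib` is the per-edit dict returned by site.usercontribs().
def contribution_tuple(site, contrib, ts, make_page):
    page = make_page(site, contrib['title'], contrib['ns'])
    return page, contrib['revid'], ts, contrib.get('comment', None)
```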
Cheers,
Morten
I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way to skip to the next occurrence on the same page? I'm
guessing it will need a modified version of replace.py that offers an
extra option besides [y]es, [N]o, [e]dit, open in [b]rowser, [a]ll, and
[q]uit.
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)
" "[[\\1]]\\2" -exceptinsidetag:link -exceptinsidetag:hyperlink
-exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref
-excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102
-namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square
brackets to: FOO1|FOO2." -log -xml:currentdump.xml
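A per-occurrence prompt would mean asking once per match and rebuilding the
page text, rather than one answer per page. A rough sketch of that logic,
not the real replace.py code:

```python
import re

# Ask a callback for every individual match; True means replace this
# occurrence, False means keep it and move on to the next one on the
# same page. replace.py itself does not work this way today.
def interactive_replace(text, pattern, replacement, ask):
    out = []
    last = 0
    for match in re.finditer(pattern, text):
        out.append(text[last:match.start()])
        if ask(match):
            out.append(match.expand(replacement))
        else:
            out.append(match.group(0))
        last = match.end()
    out.append(text[last:])
    return ''.join(out)
```

Hooking such a loop into replace.py's prompt would allow an extra choice
like [s]kip this occurrence alongside the existing ones.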
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
Hello all
Just looked at the last SVN changes and thought this patch could be good
to apply. In fact it only changes some comments, but I think it's
needed; maybe you will too. ;)
I did not open a ticket, since I don't think it is THAT relevant, because
it's not a bug.
Greetings
Hello xqt and valhallasw!
Just wanted to give a final comment to this topic. ;)
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-11-06 04:17
Message:
KeyError fixed in r8701 as recommended.
Maybe there are follow-ups of this bug in other scripts, depending on
whether the comment is None.
----------------------------------------------------------------------
THANKS A LOT FOR IMPLEMENTING THIS!
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-11-04 14:44
Message:
I disagree with the statement "an unicode string is expected and thus a
None (..) is not a good idea". The comment is hidden, which is different
from an empty comment. Using None is much more sensible.
In addition, the code can be simplified by using contrib.get('comment',
None) instead of the current if/then.
----------------------------------------------------------------------
YEA, I WAS NOT SURE WHICH ONE TO CHOOSE. 'None' SEEMS THE MORE SENSIBLE
SOLUTION, BUT I DID NOT WANT TO BREAK OTHER CODE... ;)
INDEED, THE SIMPLIFIED CODE IS A LOT BETTER - I WAS FOCUSED ON
TRIGGERING ON 'commenthidden', WHICH IS NOT REALLY NEEDED.
AFTER ALL, LOOKING AT THE CODE, THE 'None' GETS FED INTO A PAGE OBJECT,
SO THAT IS THE ONLY PART THAT COULD BREAK, AND IT SHOULD NOT AFFECT ANY
OTHER CODE... SEEMS LIKE THE PERFECT SOLUTION.
Greetings and thanks
Dr. Trigon
Hi,
I have an enhancement that I wrote for myself.
Now, with replace.py r8700, using -xml and -save, we can collect the
articles to work on in automatic mode. But sometimes this takes a lot of
time, and we would like to know whether it will finish in the near future or
needs more time. For example, if I have to leave home, I want to know
whether the task will end in 15 minutes or I have to quit it.
The XML dump contains articles mainly in the order of their creation, so
knowing the date of the first edit of the article currently on the screen is
useful. It may cause a very slight decrease in speed. My replace.py writes
this date to the screen after every 20th title when in automatic mode (but
perhaps it could do so for each title).
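The idea can be sketched as a thin wrapper around the dump iteration; the
(title, first_edit_timestamp) pair shape is an assumption about the dump
reader, not its real interface:

```python
# Print a progress line with the page's creation date every 20th title,
# so the operator can estimate how far into the (roughly chronological)
# dump the scan has progressed.
def with_progress(entries, interval=20):
    for count, (title, created) in enumerate(entries, start=1):
        if count % interval == 0:
            print('%6d pages scanned, now at "%s" (created %s)'
                  % (count, title, created))
        yield title, created
```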
The question is: is it worth building into the framework for public use, or
is it only of interest to me? Shall I put it on SF or forget it?
--
Bináris
I want to generate a list of matches for a search, but not do anything to
the pages.
E.g. I want to list all pages that contain "redirect[[:Category", but I
don't want to modify them.
I guess it's possible to modify redirect.py (I don't speak Python, but
it shouldn't be hard) and run it with -log. But maybe there's a simpler way?
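The whole task boils down to collecting titles instead of editing. A
minimal sketch, assuming any iterable of (title, text) pairs such as an XML
dump reader provides (this is not an existing pywikipedia entry point):

```python
# List the titles of pages whose text contains the search string,
# without touching any page.
def titles_containing(pages, needle):
    return [title for title, text in pages if needle in text]
```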
Thanks in advance.
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
community.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia