pywikibot September 2011

pywikibot@lists.wikimedia.org

14 participants
25 discussions

[Pywikipedia-l] Urlencoded section titles
by Bináris 13 Sep '18

Happy Monday,

There are strange people who make such links (kind of urlencoded?):

[[Második világháború#Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban .28Huskey hadm.C5.B1velet.29|Huskey hadműveletben]]

So the section title must have been copied from the URL. Do we have a ready tool to fix these?

-- Bináris
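One possible way to repair such anchors is sketched below. It assumes the fragment uses the MediaWiki-style anchor encoding seen in the link above, where each UTF-8 byte is written as a dot followed by two uppercase hex digits. The function name decode_section_anchor is made up for this illustration and is not part of pywikipedia. Note that a genuine section title containing a dot followed by two hex digits would be decoded by mistake, so results should be checked before saving.

```python
import re
from urllib.parse import unquote

# A dot followed by two uppercase hex digits, e.g. ".C3" or ".28".
_ANCHOR_BYTE = re.compile(r'\.([0-9A-F]{2})')


def decode_section_anchor(anchor):
    """Turn 'Szic.C3.ADlia' style anchors back into readable text.

    Hypothetical helper: rewrites '.XX' to '%XX' and percent-decodes the
    result as UTF-8. If the bytes are not valid UTF-8, the anchor is
    returned unchanged, because it was probably not encoded this way.
    """
    percent_encoded = _ANCHOR_BYTE.sub(r'%\1', anchor)
    try:
        return unquote(percent_encoded, encoding='utf-8', errors='strict')
    except UnicodeDecodeError:
        return anchor


if __name__ == '__main__':
    broken = ('Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban '
              '.28Huskey hadm.C5.B1velet.29')
    print(decode_section_anchor(broken))
```

A bot could apply this to the part of a wikilink between '#' and '|' and leave the page title and the link label untouched.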

3 11

[Pywikipedia-l] Replace.py should save -- please comment
by Bináris 26 Feb '12

Hi!

My old problem is that replace.py can't write the pages to work on into a file on my disk. I have used a modified version for years that makes no changes but writes the titles of the involved pages to a subpage on Wikipedia in automated mode, and then I can make the replacements from that page much more quickly than directly from a dump or from the live Wikipedia. This is slow and generates plenty of dummy edits. In other words, replace.py has a tool to get the titles from a file (-file) or from a wiki page (-links), but has no tool to generate this file.

Now I am ready to rewrite it. This way we can start it and the bot will find all the possible articles to work on and save the titles without editing Wikipedia (and without artificial delay); meanwhile we can have lunch, run a marathon or sleep. Then we make the replacements from this file with -file.

My idea is that replace.py should have two new parameters:

-save writes the results into a new file instead of editing articles. It overwrites an existing file without notice.
-saveappend writes into a file or appends to the existing one.

OR:

-save writes and appends (primary mode)
-savenew writes and overwrites

The help is here: http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
So we have to import codecs. My script is:

articles=codecs.open('cikkek.txt','a',encoding='utf-8')
...
tutuzuzu=u'# %s\n' %page.aslink()   <-- needs rewrite to the new syntax
articles.write(unicode(tutuzuzu))   <-- needs further testing, if unicode() is really needed
articles.flush()

It works fine, except that '\n' is a Unix-style newline that has to be converted by lfcr.py to make the file readable with notepad.exe. This uses a constant filename, which should be developed further so that it is taken from the command line.

Your opinions before I begin?

-- Bináris
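The proposal above could be sketched roughly as follows. This is only an illustration, written for Python 3: the function name save_titles and its parameters are hypothetical, and nothing here is existing replace.py code. It uses io.open instead of codecs.open because io.open accepts a newline argument, which would avoid the separate lfcr.py conversion step; codecs.open always works in binary mode and therefore never translates '\n'.

```python
import io


def save_titles(titles, path, append=False, newline='\r\n'):
    """Write one '# [[Title]]' line per title to a UTF-8 text file.

    Hypothetical helper. append=False mirrors the proposed overwrite
    mode, append=True the proposed append mode. With newline='\r\n'
    every '\n' written is stored as CR LF, so notepad.exe can read it.
    """
    mode = 'a' if append else 'w'
    with io.open(path, mode, encoding='utf-8', newline=newline) as out:
        for title in titles:
            out.write(u'# [[%s]]\n' % title)


if __name__ == '__main__':
    save_titles([u'Budapest', u'Győr'], 'cikkek.txt')
    save_titles([u'Szeged'], 'cikkek.txt', append=True)
```

Opening the file once per run and closing it through the with block also makes the explicit flush() call unnecessary, at the cost of losing buffered titles if the bot is killed mid-run.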

5 12

[Pywikipedia-l] SourceForge passwords
by Bináris 20 Feb '12

Have you also got an e-mail with subject "SourceForge.net passwords reset"? Is that authentic? -- Bináris

10 28

[Pywikipedia-l] How to read special pages?
by Bináris 17 Feb '12

I want to read a special page with Page.get(). The message is:

  File "C:\Program Files\Pywikipedia\wikipedia.py", line 601, in get
    raise NoPage('%s is in the Special namespace!' % self.aslink())
pywikibot.exceptions.NoPage

What is the solution?

-- Bináris

7 21

[Pywikipedia-l] SVN access request
by Bináris 07 Feb '12

Hi,

I would like to get SVN access and some help to start. I need it mainly for adding and maintaining TOCbot, which is under preparation (it has worked on huwiki for several months and is now being internationalized). Information about TOCbot: http://hu.wikipedia.org/wiki/Szerkeszt%C5%91:Bin%C3%A1ris/TOCbot

The description, user guide, bot owners' guide and a collection of examples are ready, as well as an auxiliary script, while the main script is not yet public. It will soon be published for testing and may need much care at first.

I would also like to take part in the maintenance of replace.py, on which I have already worked a lot. At the moment I am interested only in the trunk version.

My SF page: http://sourceforge.net/users/binbot/ -- I don't know how to list all my contributions; only part of them appears here, since May 22, but there are many more. I have also been active on the mailing list in the past years.

Please support me and give me technical help to use the system.

-- Bináris

3 8

Re: [Pywikipedia-l] [Pywikipedia-svn] SVN: [9196] trunk/pywikipedia
by Merlijn van Deen 19 Jan '12

Whoo! Great work :-) Tests always are good contributions :-)

On a sidenote - is there a reason you're implementing these in 'trunk' and not in 'rewrite'? Of course, these contributions are very welcome in the trunk, but I still think it would be good to push the rewrite branch.

Best regards,
Merlijn

On 24 April 2011 07:41, <jayvdb(a)svn.wikimedia.org> wrote:

> http://www.mediawiki.org/wiki/Special:Code/pywikipedia/9196
>
> Revision: 9196
> Author: jayvdb
> Date: 2011-04-24 05:40:59 +0000 (Sun, 24 Apr 2011)
> Log Message:
> -----------
> Allow lists of Page and User objects to be interogated
>
> Modified Paths:
> --------------
>     trunk/pywikipedia/query.py
>     trunk/pywikipedia/tests/test_query.py
>
> Modified: trunk/pywikipedia/query.py
> ===================================================================
> --- trunk/pywikipedia/query.py 2011-04-24 04:23:12 UTC (rev 9195)
> +++ trunk/pywikipedia/query.py 2011-04-24 05:40:59 UTC (rev 9196)
> @@ -263,10 +263,21 @@
>
>      encList = ''
>      # items may not have one symbol - '|'
> -    for l in list:
> -        if type(l) == str and u'|' in l:
> -            raise wikipedia.Error("item '%s' contains '|' symbol" % l )
> -        encList += ToUtf8(l) + u'|'
> +    for item in list:
> +        if isinstance(item,basestring):
> +            if u'|' in item:
> +                raise wikipedia.Error(u"item '%s' contains '|' symbol" % item )
> +            encList += ToUtf8(item) + u'|'
> +        elif isinstance(item,wikipedia.Page):
> +            encList += ToUtf8(item.title()) + u'|'
> +        elif item.__class__.__name__ == 'User':
> +            # delay loading this until it is needed
> +            import userlib
> +            encList += ToUtf8(item.name()) + u'|'
> +        else:
> +            raise wikipedia.Error(u'unknown item class %s' % item.__class__.__name__)
> +
> +    # strip trailing '|' before returning
>      return encList[:-1]
>
>  def ToUtf8(s):
>
> Modified: trunk/pywikipedia/tests/test_query.py
> ===================================================================
> --- trunk/pywikipedia/tests/test_query.py 2011-04-24 04:23:12 UTC (rev 9195)
> +++ trunk/pywikipedia/tests/test_query.py 2011-04-24 05:40:59 UTC (rev 9196)
> @@ -7,6 +7,8 @@
>  import unittest
>  import tests.test_pywiki
>
> +import wikipedia as pywikibot
> +import catlib, userlib
>  import query
>
>
> @@ -74,5 +76,72 @@
>          ]}
>          self.assertEqualQueryResult(params, expectedresult)
>
> +    def test_titles_Page(self):
> +        params = {
> +            'action': 'query',
> +            'list': 'users',
> +            'usprop': ['registration'],
> +            'ususers': [pywikibot.Page(self.site, u'Example'),
> +                        pywikibot.Page(self.site, u'Example2')],
> +            }
> +        expectedresult = {u'users': [
> +            {
> +                u'userid': 215131,
> +                u'name': u'Example',
> +                u'registration': u'2005-03-19T00:17:19Z'
> +            },
> +            {
> +                u'userid': 5176706,
> +                u'name': u'Example2',
> +                u'registration': u'2007-08-26T02:13:33Z'
> +            },
> +        ]}
> +        self.assertEqualQueryResult(params, expectedresult)
> +
> +    def test_titles_User(self):
> +        params = {
> +            'action': 'query',
> +            'list': 'users',
> +            'usprop': ['registration'],
> +            'ususers': [userlib.User(self.site, u'Example'),
> +                        userlib.User(self.site, u'Example2')],
> +            }
> +        expectedresult = {u'users': [
> +            {
> +                u'userid': 215131,
> +                u'name': u'Example',
> +                u'registration': u'2005-03-19T00:17:19Z'
> +            },
> +            {
> +                u'userid': 5176706,
> +                u'name': u'Example2',
> +                u'registration': u'2007-08-26T02:13:33Z'
> +            },
> +        ]}
> +        self.assertEqualQueryResult(params, expectedresult)
> +
> +    def test_titles_Category(self):
> +        params = {
> +            'action': 'query',
> +            'prop': 'revisions',
> +            'rvprop': ['ids', 'timestamp', 'user'],
> +            'rvdir': 'newer',
> +            'rvlimit': 1,
> +            'titles': [catlib.Category(self.site, u'Category:Categories')],
> +            }
> +        expectedresult = {u'pages': {u'794823':
> +            {
> +                u'ns': 14,
> +                u'pageid': 794823,
> +                u'revisions': [{
> +                    u'revid': 4494485,
> +                    u'user': u'SEWilco',
> +                    u'timestamp': u'2004-07-07T18:45:50Z',
> +                }],
> +                u'title': u'Category:Categories',
> +            },
> +        }}
> +        self.assertEqualQueryResult(params, expectedresult)
> +
>  if __name__ == "__main__":
>      unittest.main()
>
>
> _______________________________________________
> Pywikipedia-svn mailing list
> Pywikipedia-svn(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-svn
>
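For readers who want to try the idea from the quoted change outside the framework, here is a minimal stand-alone sketch in Python 3. FakePage, FakeUser and encode_list are hypothetical names invented for this example; they only imitate the title() and name() methods that the diff calls on wikipedia.Page and userlib.User objects. Unlike the committed code, the sketch collects the parts in a list and joins them, so no trailing '|' has to be stripped.

```python
class FakePage:
    """Hypothetical stand-in for a page object with a title() method."""

    def __init__(self, title):
        self._title = title

    def title(self):
        return self._title


class FakeUser:
    """Hypothetical stand-in for a user object with a name() method."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


def encode_list(items):
    """Join strings, pages and users into one '|'-separated string."""
    parts = []
    for item in items:
        if isinstance(item, str):
            # '|' is the separator, so it may not occur inside an item.
            if '|' in item:
                raise ValueError("item %r contains '|' symbol" % item)
            parts.append(item)
        elif isinstance(item, FakePage):
            parts.append(item.title())
        elif isinstance(item, FakeUser):
            parts.append(item.name())
        else:
            raise TypeError('unknown item class %s' % type(item).__name__)
    return '|'.join(parts)


if __name__ == '__main__':
    print(encode_list(['Example', FakePage('Example2'), FakeUser('Example3')]))
```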

8 13

[Pywikipedia-l] rewrite branch
by John Vandenberg 22 Dec '11

On Mon, Apr 25, 2011 at 7:49 AM, Merlijn van Deen <valhallasw(a)arctus.nl> wrote:
> Whoo! Great work :-) Tests always are good contributions :-)

Thanks ;-) I agree.

> On a sidenote - is there a reason you're implementing these in 'trunk' and
> not in 'rewrite'? Of course, these contributions are very welcome in the
> trunk, but I still think it would be good to push the rewrite branch.

I'm working off trunk because it is trunk. I'd assumed that the rewrite branch was a single-purpose branch to rewrite something, and that it would be merged back when it is stable.

Is it stable? Is there any documentation on what the plans are for the rewrite branch? Is there a roadmap to finish it?

I see now that the rewrite branch has more unit tests, but more are needed.

Is there a need to create a backwards compatibility layer?

Or, is everyone except me using the rewrite branch? ;-)

--
John Vandenberg

3 2

[Pywikipedia-l] Please approve me for commit access
by MJB 30 Oct '11

Hello,

Tim Starling told me that I can ask here to be approved for commit access. I always work with Python, and as I work in the Persian community, my code is about working on Persian wikis. You can see a summary here <http://fa.wikinews.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Mjbmrbot#Robot_T…> and I would like to work on localization and some minor edits that would be helpful. Here is an example of my code:

# -*- coding: utf-8 -*-
import wikipedia, urllib, re, os, datetime, calendar
from xml.dom import minidom
from time import strftime

Lang="fa"
Family="wikinews"
RR = (u'??????', u'?????', u'????', u'?????', u'??', u'????', u'?????', u'???', u'???????', u'?????', u'??????', u'??????')
RR2 = (u'January', u'February', u'March', u'April', u'May', u'June', u'July', u'August', u'September', u'October', u'November', u'December')
RRfr = (u'janvier', u'février', u'mars', u'avril', u'mai', u'juin', u'juillet', u'août', u'septembre', u'octobre', u'novembre', u'décembre')
RRfr2 = (u'Janvier', u'Février', u'Mars', u'Avril', u'Mai', u'Juin', u'Juillet', u'Août', u'Septembre', u'Octobre', u'Novembre', u'Décembre')
RR3 = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
MMM = int(strftime("%m"))
YYY = strftime("%Y")
YYY2 = strftime("%Y")
YYY = YYY.replace(u'1', u'1')
YYY = YYY.replace(u'2', u'2')
YYY = YYY.replace(u'3', u'3')
YYY = YYY.replace(u'4', u'4')
YYY = YYY.replace(u'5', u'5')
YYY = YYY.replace(u'6', u'6')
YYY = YYY.replace(u'7', u'7')
YYY = YYY.replace(u'8', u'8')
YYY = YYY.replace(u'9', u'9')
YYY = YYY.replace(u'0', u'0')
site = wikipedia.getSite(Lang, Family)
FF = ""
# One pass per day of the current month.
for i in range(1, RR3[MMM-1] + 1):
    site = wikipedia.getSite(Lang, Family)
    DD = str(i)
    DD2 = str(i)
    DDfr = str(i)
    if (DDfr==u'1'):
        DDfr = u"1er"
    DDfr2 = str(i)
    if (len(DDfr2)==1):
        DDfr2 = u'0' + DDfr2
    DD = DD.replace(u'1', u'1')
    DD = DD.replace(u'2', u'2')
    DD = DD.replace(u'3', u'3')
    DD = DD.replace(u'4', u'4')
    DD = DD.replace(u'5', u'5')
    DD = DD.replace(u'6', u'6')
    DD = DD.replace(u'7', u'7')
    DD = DD.replace(u'8', u'8')
    DD = DD.replace(u'9', u'9')
    DD = DD.replace(u'0', u'0')
    # Collect one section per day for the month overview page.
    FF = FF + u"== [[/" + DD + u'|' + DD + u' ' + RR[MMM-1] + u"]] ==\n{{/" + DD + u'}}\n\n'

    # Page for day, month and year; created only if it does not exist.
    pagename = (u'???:' + DD + u' ' + RR[MMM-1] + u' ' + YYY)
    page = wikipedia.Page(site, pagename)
    AA = (u'[[???:' + RR[MMM-1] + u' ' + YYY + u']]\n[[???:' + DD + u' ' + RR[MMM-1] + u']]\n\n[[en:Category:' + RR2[MMM-1] + u' ' + DD2 + u', ' + YYY2 + u']]\n[[fr:Catégorie:' + DDfr + u' ' + RRfr[MMM-1] + u' ' + YYY2 + u']]')
    CCC = (u'????: ????? ???????? ??? ' + RR[MMM-1] + u' ' + YYY)
    wikipedia.output(u"Loading %s..." % pagename)
    try:
        text = page.get()
    except wikipedia.NoPage:
        print "Page doesn't exist, creating it ..."
        page.put(AA, comment=CCC, watchArticle = None, minorEdit = False)

    # Page for day and month, without the year.
    pagename = (u'???:' + DD + u' ' + RR[MMM-1])
    page = wikipedia.Page(site, pagename)
    AA = (u'{{' + RR[MMM-1] + u'}}\n{{?????????|{{????????}}}}\n{{??????????-???|' + DD2 + u' ' + RR2[MMM-1] + u'}}\n\n[[en:Category:' + RR2[MMM-1] + u' ' + DD2 + u']]\n[[fr:Catégorie:' + DDfr + u' ' + RRfr[MMM-1] + u']]')
    CCC = (u'????: ????? ??????? ??? ' + RR[MMM-1])
    wikipedia.output(u"Loading %s..." % pagename)
    try:
        text = page.get()
    except wikipedia.NoPage:
        print "Page doesn't exist, creating it ..."
        page.put(AA, comment=CCC, watchArticle = None, minorEdit = False)

    # Daily page with a DynamicPageList.
    pagename = (u'????????:' + YYY + u'/' + RR[MMM-1] + u'/' + DD)
    page = wikipedia.Page(site, pagename)
    AA = (u'<onlyinclude>\n<DynamicPageList>\ncategory=????????\ncategory=' + DD + u' ' + RR[MMM-1] + u' ' + YYY + u'\nsuppresserrors=true\nstablepages=only\n</DynamicPageList>\n</onlyinclude>\n\n[[en:Wikinews:' + YYY2 + u'/' + RR2[MMM-1] + u'/' + DD2 + u']]\n[[fr:Wikinews:' + YYY2 + u'/' + RRfr[MMM-1] + u'/' + DDfr2 + u']]')
    CCC = (u'????: ????? ???????? ??? ' + RR[MMM-1] + u' ' + YYY)
    wikipedia.output(u"Loading %s..." % pagename)
    try:
        text = page.get()
    except wikipedia.NoPage:
        print "Page doesn't exist, creating it ..."
        page.put(AA, comment=CCC, watchArticle = None, minorEdit = False)

# Month overview page built from the collected day sections.
pagename = (u'????????:' + YYY + u'/' + RR[MMM-1])
page = wikipedia.Page(site, pagename)
FF = FF + u'[[???:' + RR[MMM-1] + u' ' + YYY + ']]\n\n[[en:Wikinews:' + YYY2 + u'/' + RR2[MMM-1] + u']]\n[[fr:Wikinews:' + YYY2 + u'/' + RRfr[MMM-1] + ']]'
CCC = (u'????: ????? ????? ????? ?? ??? ' + RR[MMM-1] + u' ' + YYY)
wikipedia.output(u"Loading %s..." % pagename)
try:
    text = page.get()
except wikipedia.NoPage:
    print "Page doesn't exist, creating it ..."
    page.put(FF, comment=CCC, watchArticle = None, minorEdit = False)

# Page for month and year.
pagename = (u'???:' + RR[MMM-1] + u' ' + YYY)
page = wikipedia.Page(site, pagename)
AA = (u'[[???:' + RR[MMM-1] + u']]\n[[???:'+ YYY + u']]\n\n[[en:Category:' + RR2[MMM-1] + u' ' + YYY2 + u']]\n[[fr:Catégorie:' + RRfr2[MMM-1] + u' ' + YYY2 + u']]')
CCC = (u'????: ????? ???? ??? ' + RR[MMM-1] + u' ' + YYY)
try:
    text = page.get()
except wikipedia.NoPage:
    print "Page doesn't exist, creating it ..."
    page.put(AA, comment=CCC, watchArticle = None, minorEdit = False)

# Page for the year.
pagename = (u'???:' + YYY)
page = wikipedia.Page(site, pagename)
AA = (u'{{???????????|' + YYY2 + u'}}\n\n[[???:????? ?? ???? ?????]]\n\n[[en:Category:' + YYY2 + u']]\n[[fr:Catégorie:' + YYY2 + u']]')
CCC = (u'????: ????? ???? ??? ' + YYY)
try:
    text = page.get()
except wikipedia.NoPage:
    print "Page doesn't exist, creating it ..."
    page.put(AA, comment=CCC, watchArticle = None, minorEdit = False)

This is simple code that creates date category pages and templates. My user name in WMF projects is Mjbmr and I have a bot with the username Mjbmrbot that has a global flag. I hope you accept my request.

Regards
-- Mjbmr
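Two details of the posted script could be simplified, as the sketch below tries to show. It rests on assumptions that the archived text cannot confirm: that the chains of replace() calls were meant to turn ASCII digits into Persian (Extended Arabic-Indic) digits, and that the month lengths are Gregorian. The names PERSIAN_DIGITS, to_persian_digits and days_in_month are made up for this illustration, and the code is written for Python 3 rather than the Python 2 of the original.

```python
import calendar

# Map '0'..'9' to U+06F0..U+06F9 (assumed target of the replace() chains).
PERSIAN_DIGITS = {ord(str(d)): chr(0x06F0 + d) for d in range(10)}


def to_persian_digits(text):
    """Replace every ASCII digit in text in a single pass."""
    return text.translate(PERSIAN_DIGITS)


def days_in_month(year, month):
    """Length of a Gregorian month, including 29 days for leap Februaries."""
    return calendar.monthrange(year, month)[1]


if __name__ == '__main__':
    print(to_persian_digits('30 2011'))
    print(days_in_month(2012, 2))
```

Using calendar.monthrange instead of a fixed tuple of month lengths would also cover 29 February, which a hard-coded 28 skips in leap years.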

7 9

[Pywikipedia-l] Running the rewrite branch as developer/power-user [windows]
by Merlijn van Deen 08 Oct '11

Hello all,

As several people have mentioned they had trouble starting with the rewrite branch, I decided to do a step-by-step log of installing the rewrite in a way that is good for developing -- this means you are able to edit the framework files, while not inflicting any changes on other users (or other bots you run!) of the system. By using setup.py develop, edits you make to the framework will immediately be used (no need to setup.py install them), but only within the virtualenv.

This is the windows version of my earlier email.

I do not run python on windows, so this is a tutorial that starts with installing python. It's a bit rougher than the unix one, as I did not want to spend too much time on it.

1. Install python 2.7
   http://python.org/ftp/python/2.7.1/python-2.7.1.msi <http://python.org/download/>
   (do *not* use the 64-bit version, due to http://bugs.python.org/issue6792 )

2. Install Setuptools
   http://pypi.python.org/pypi/setuptools#files

3. Install Virtualenv
   start/run: cmd
   c:\Python27\Scripts\easy_install.exe virtualenv

4. create a virtualenv for pwb
   C:\Users\valhallasw>c:\Python27\Scripts\virtualenv.exe pywikibot
   New python executable in pywikibot\Scripts\python.exe
   Installing setuptools.....................done.

5. Go to C:\Users\valhallasw\pywikibot and use tortoisesvn to get the rewrite

6. create a shortcut to
   cmd /k c:\users\valhallasw\pywikibot\scripts\activate.bat
   with working path C:\Users\valhallasw\pywikibot\rewrite

7. Use the shortcut. You now have a new cmd.exe window

8. python setup.py develop
   Your default user directory is "C:\Users\valhallasw\AppData\Roaming\pywikibot"
   How to proceed? ([K]eep [c]hange)
   change, to c:\users\valhallasw\pywikibot\conf\
   Answer 'y' to the warning prompt (not 'yes')
   Do you want to copy files: y
   [note: I copied my unix user-config.py to c:\users\valhallasw\pywikibot]
   Path to existing wikipedia.py? C:\Users\valhallasw\pywikibot
   NOTE: user-config.py already exists in the directory
   Create user-fixes.py file? ([y]es, [N]o) n

   (pywikibot) C:\Users\valhallasw\pywikibot\rewrite>echo SET PYWIKIBOT2_DIR=c:\users\valhallasw\pywikibot\conf>> ..\Scripts\activate.bat
   (DON'T put a space between f and >>!)
   Close the window, and

9. Use the shortcut from (7) again

You should now have a cmd.exe with a working pywikibot setup!

(pywikibot) C:\Users\valhallasw\pywikibot\rewrite\scripts>python touch.py Gebruiker:Valhallasw
Retrieving 1 pages from wikipedia:nl.
Page [[Gebruiker:Valhallasw]] saved

NOTE: you *must* use 'python' in front of the script name, or python will not find the pywikibot directory.

Good luck!
Merlijn
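After step 9 it can be useful to check that the activated environment really points at the intended configuration directory. The following is a small sketch of such a check, assuming only what the steps above state: that PYWIKIBOT2_DIR is set by activate.bat and that the directory should contain user-config.py. The function find_user_config is hypothetical and is not how pywikibot itself locates its configuration.

```python
import os


def find_user_config(environ=None):
    """Return the path of user-config.py under PYWIKIBOT2_DIR, or None.

    Hypothetical diagnostic helper: it only reads the environment
    variable set in step 8 and checks that the file exists there.
    """
    environ = os.environ if environ is None else environ
    base_dir = environ.get('PYWIKIBOT2_DIR')
    if not base_dir:
        return None
    candidate = os.path.join(base_dir, 'user-config.py')
    return candidate if os.path.isfile(candidate) else None


if __name__ == '__main__':
    path = find_user_config()
    if path is None:
        print('PYWIKIBOT2_DIR is unset or contains no user-config.py')
    else:
        print('Using configuration from %s' % path)
```

Run it with 'python' from the cmd.exe window opened through the shortcut, so that it sees the variables set by activate.bat.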

2 1

Re: [Pywikipedia-l] [Wikitech-l] serious interwiki.py issues on MW 1.18 wikis
by Merlijn van Deen 30 Sep '11

Hi Ariel and Andre,

On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn <ariel(a)wikimedia.org> wrote:
> Out of curiosity... If the new revisions of one of these badly edited
> pages are deleted, leaving the top revision as the one just before the
> bad iw bot edit, does a rerun of the bot on the page fail?

On Fri, Sep 30, 2011 at 11:13 AM, Andre Engels <andreengels(a)gmail.com> wrote:
> I deleted the page [[nl:Blankenbach]], then restored the 2 versions before
> the problematic bot edit. When now I look at the page, instead of the page
> content I get: (...)

Using this undeleted version, and running interwiki.py, gives the expected result:

valhallasw@dorthonion:~/src/pywikipedia/trunk$ python interwiki.py -page:Blankenbach
NOTE: Number of pages queued is 0, trying to add 60 more.
Getting 1 pages from wikipedia:nl...
WARNING: Family file wikipedia contains version number 1.17wmf1, but it should be 1.18wmf1
NOTE: [[nl:Blankenbach]] does not exist. Skipping.

This also happens when running it from dewiki (python interwiki.py -lang:de -page:Blankenbach%20%28Begriffskl%C3%A4rung%29) or running as a 'full-auto' bot (python interwiki.py -all -async -cleanup -log -auto -ns:0 -start:Blankenbach).

Special:Export acts like the page just does not exist (http://nl.wikipedia.org/w/index.php?title=Speciaal:Exporteren&useskin=monob… shows page Blanzac but not Blankenbach).

api.php also more or less does the expected thing: http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Blanke… - that is, unless you supply rvlimit=1: http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Blanke…

However, none of them seem to return an empty page - and playing around with pywikipediabot does not allow me to get an empty page. Depending on settings, the result can be the text from the edit page (page.get(), use_api=False / screen scraping), a pywikibot.exceptions.NoPage exception (PreloadingGenerator / wikipedia.getall, which uses Special:Export) or the correct page text (page.get(), use_api=True).

Anyway, thanks a huge heap for trying this (and for everyone, for thinking about it). Unfortunately, I won't have much time this weekend to debug -- hopefully some other pwb developer has.

Best regards, and thanks again,
Merlijn

P.S.

On 30 September 2011 11:12, Max Semenik <maxsem.wiki(a)gmail.com> wrote:
> So you screen-scrape? No surprise it breaks. Why? For example, due to
> protocol-relative URLs. Or some other changes to HTML output. Why not just
> use API?

No, most of pywikipedia has been adapted to the api and/or special:export, which, imo, is just an 'old' mediawiki api. Keep in mind interwiki.py is old (2003!), and pywikipedia initially was an extension of the interwiki bot. Thus, there could very well be some code that is seldom used which still uses screen scraping. And actually, in practice, screen scraping worked pretty well.
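To compare the two api.php behaviours mentioned above (with and without rvlimit=1), the query URLs can be built programmatically. The sketch below only constructs the URLs and performs no network request. The function revisions_query_url is hypothetical, and the format=json parameter is an addition for readability; the full parameters of the truncated URLs in the message are not known.

```python
from urllib.parse import urlencode


def revisions_query_url(api_url, title, rvlimit=None):
    """Build an action=query&prop=revisions URL for a single title.

    Hypothetical helper for manual testing; rvlimit is only added when
    given, so both variants discussed in the message can be produced.
    """
    params = [
        ('action', 'query'),
        ('prop', 'revisions'),
        ('titles', title),
        ('format', 'json'),  # assumption: not visible in the original URLs
    ]
    if rvlimit is not None:
        params.append(('rvlimit', str(rvlimit)))
    return api_url + '?' + urlencode(params)


if __name__ == '__main__':
    api = 'http://nl.wikipedia.org/w/api.php'
    print(revisions_query_url(api, 'Blankenbach'))
    print(revisions_query_url(api, 'Blankenbach', rvlimit=1))
```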

1 0
