I am at a point where it would be helpful to have some feedback from other Pywikipedia users about the future of the rewrite branch. As those who watch the SVN commits know, I have not had as much time to work on this lately, and have to prioritize what time I do spend on it.
For those who have used the rewrite branch, what (if anything) needs to be done to it to get you to use it exclusively and retire the old wikipedia.py system? What is missing? What is broken? What is present but could be improved?
For those who have chosen not to use the rewrite branch, why not? What might lead you to take another look?
And then, I'm sure there are many whose reaction to this post has been, "What's the rewrite branch?" I don't know what to ask you, so feel free to move on to the next message.
Most critically, is there any reason to continue development of the trunk once the rewrite branch is at a point where most users are ready to switch to it?
-- Russ
Russ,
On Tue, Mar 30, 2010 at 10:18 AM, Russell Blau russblau@imapmail.org wrote:
For those who have chosen not to use the rewrite branch, why not? What might lead you to take another look?
The last time I talked about it with NicDumZ, he told me it was not fully ready yet. I will definitely give it a try after this email. Do you have a quick FAQ on how to switch my existing scripts to the rewrite branch?
Thanks,
N.
On Tue, Mar 30, 2010, at 11:15 AM, Nakor nakor.wp@gmail.com wrote:
Russ,
On Tue, Mar 30, 2010 at 10:18 AM, Russell Blau russblau@imapmail.org wrote: For those who have chosen not to use the rewrite branch, why not? What might lead you to take another look?
The last time I talked about it with NicDumZ, he told me it was not fully ready yet. I will definitely give it a try after this email. Do you have a quick FAQ on how to switch my existing scripts to the rewrite branch?
Here's a copy of rewrite/README-conversion.txt. (Note that the discussion about Link objects is somewhat outdated: they are still there, but it is no longer recommended to use the Page(Link(...)) format; you can just keep existing Page(site, title) calls as they are.) One thing that is not mentioned is that you can copy your current user-config.py file for use with the rewrite; just make a copy and save it in the directory whose name shows up in the error message the first time you try to run a script without it. :-)
-- Russ
This is a guide to converting bot scripts from version 1 of the Pywikipediabot framework to version 2.
Most importantly, note that the version 2 framework *only* supports wikis using MediaWiki v.1.14 or higher software. If you need to access a wiki that uses older software, you should continue using version 1 for this purpose.
The root namespace used in the project has changed from "wikipedia" to "pywikibot". References to wikipedia need to be changed globally to pywikibot. Unless noted in this document, other names have not changed; for example, wikipedia.Page can be replaced by pywikibot.Page throughout any bot. An effort has been made to design the interface to be as backwards-compatible as possible, so that in most cases it should be possible to convert scripts to the new interface simply by changing import statements and doing global search-and-replace on module names, as discussed in this document.
With pywikipedia, scripts imported the "wikipedia" or "pagegenerators" libraries directly; pywikibot is now written as a standard package, and other modules are contained within it (e.g., pywikibot.site contains Site classes). However, most commonly-used names are imported into the pywikibot namespace, so that module names don't need to be used unless specified in the documentation.
Make sure that the directory that contains the "pywikibot" subdirectory (or folder) is in sys.path.
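One way to satisfy this from inside a script is to extend sys.path before importing the package. The sketch below is only an illustration: the helper name and the checkout location are hypothetical, and setting PYTHONPATH in the environment works just as well.

```python
import os
import sys


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path unless it is already there."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Hypothetical location: the directory that CONTAINS the "pywikibot" folder.
rewrite_dir = add_to_sys_path(
    os.path.join(os.path.expanduser("~"), "pywikipedia-rewrite"))

# After this point, "import pywikibot" can find the package,
# provided the directory really holds a pywikibot/ subdirectory.
```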
The following changes, at a minimum, need to be made to allow scripts to run:
- change "import wikipedia" to "import pywikibot"
- change "import pagegenerators" to "from pywikibot import pagegenerators"
- change "import config" to "from pywikibot import config"
- change "import catlib" to "from pywikibot import catlib"
- change "wikipedia." to "pywikibot."
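The substitutions listed above could be applied mechanically with a few regular expressions. This is a rough sketch, not a tool shipped with the framework: convert_source is a hypothetical helper, it only handles one import per line, and the result still needs to be reviewed by hand.

```python
import re

# Hypothetical helper implementing only the minimum changes listed above.
_RULES = [
    # "import wikipedia" -> "import pywikibot"
    (re.compile(r"^([ \t]*)import wikipedia[ \t]*$", re.M),
     r"\1import pywikibot"),
    # "import config" etc. -> "from pywikibot import config" etc.
    (re.compile(r"^([ \t]*)import (pagegenerators|config|catlib)[ \t]*$", re.M),
     r"\1from pywikibot import \2"),
    # "wikipedia.X" -> "pywikibot.X"; the lookbehind skips text such as
    # "en.wikipedia.org" inside strings.
    (re.compile(r"(?<![\w.])wikipedia\."), "pywikibot."),
]


def convert_source(source):
    """Return `source` with the version 1 module names replaced."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source


old_script = (
    "import wikipedia\n"
    "import pagegenerators\n"
    "\n"
    "site = wikipedia.getSite()\n"
    "page = wikipedia.Page(site, 'Foo')\n"
)
new_script = convert_source(old_script)
```

Lines such as "import wikipedia, pagegenerators" are left untouched by this sketch and would have to be split manually.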
wikipedia.setAction() no longer works; you must revise the script to pass an explicit edit summary message on each put() or put_async() call. /* No longer true, although it is still good practice to pass an edit summary message. */
== Python libraries ==
[Note: the goal will be to package pywikibot with setuptools easy_install, so that these dependencies will be loaded automatically when the package is installed, and users won't need to worry about this...]
To run pywikibot, you will need the httplib2, simplejson, and setuptools packages:
* httplib2 : http://code.google.com/p/httplib2/
* setuptools : http://pypi.python.org/pypi/setuptools/
* simplejson : http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7.1/docs/index.html
or, if you already have setuptools installed, just execute 'easy_install httplib2' and 'easy_install simplejson'
If you run into errors involving httplib2.urlnorm, update httplib2 to 0.4.0 (the Ubuntu package python-httplib2, for example, is outdated). Note that httplib2 will run under Python 2.6, but will emit DeprecationWarnings (which are annoying but don't affect the ability to use the package).
== Page objects ==
The constructor syntax for Pages has been modified; existing calls in the format of Page(site, title) will still work, and this is still the preferred way of creating a Page object from data retrieved from the MediaWiki API (because the API will have parsed and normalized the title). However, for titles input by a user or scraped from wikitext, it is preferred to use the alternative syntax Page(Link(site, wikitext)), where "wikitext" is the string found between [[ and ]] delimiters. The new Link object (more on this below) handles link parsing and interpretation that doesn't require access to the wiki server.
A third syntax allows easy conversion from a Page object to an ImagePage or Category, or vice versa: e.g., Category(pageobj) converts a Page to a Category, as long as the page is in the category namespace.
The following methods of the Page object have been deprecated (deprecated methods still work, but print a warning message in debug mode):
- urlname(): replaced by Page.title(asUrl=True)
- titleWithoutNamespace(): replaced by Page.title(withNamespace=False)
- sectionFreeTitle(): replaced by Page.title(withSection=False)
- aslink(): replaced by Page.title(asLink=True)
- encoding(): replaced by Page.site().encoding()
The following methods of the Page object have been obsoleted and no longer work (but these methods don't appear to be used anywhere in the code distributed with the bot framework). The functionality of the two obsolete methods is easily replaced by using standard search-and-replace techniques. If you call them, they will print a warning and do nothing else:
- removeImage()
- replaceImage()
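To find calls to the deprecated and obsolete Page methods in an existing script, a simple name-based scan can help. This is a sketch only: find_old_calls is a hypothetical helper, it matches method names textually without knowing the type of the object, and the replacement hints are copied from the lists above.

```python
import re

# Replacements as listed above for the deprecated Page methods.
DEPRECATED = {
    "urlname": "title(asUrl=True)",
    "titleWithoutNamespace": "title(withNamespace=False)",
    "sectionFreeTitle": "title(withSection=False)",
    "aslink": "title(asLink=True)",
    "encoding": "site().encoding()",
}
OBSOLETE = ("removeImage", "replaceImage")

# The lookbehind avoids flagging "site().encoding()", which is the
# recommended replacement rather than an old call.
_CALL = re.compile(
    r"(?<!site\(\))\.(%s)\s*\(" % "|".join(list(DEPRECATED) + list(OBSOLETE)))


def find_old_calls(source):
    """Return (line_number, method_name, advice) for each old-style call."""
    findings = []
    for number, line in enumerate(source.splitlines(), 1):
        for match in _CALL.finditer(line):
            name = match.group(1)
            if name in DEPRECATED:
                advice = "use " + DEPRECATED[name]
            else:
                advice = "obsolete: prints a warning and does nothing"
            findings.append((number, name, advice))
    return findings


sample = (
    "name = page.urlname()\n"
    "enc = page.site().encoding()\n"
    "page.removeImage('x.png')\n"
)
report = find_old_calls(sample)
```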
=== ImagePage objects ===
For ImagePage objects, the getFileMd5Sum() method is deprecated; it is recommended to replace it with getFileSHA1Sum(), because MediaWiki now stores the SHA1 hash of images.
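If a script needs to compare a local file against the hash reported for an image, the SHA1 digest can be computed with the standard hashlib module. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the hash is compared as a hexadecimal string, which this document does not confirm for getFileSHA1Sum().

```python
import hashlib


def sha1_of_chunks(chunks):
    """Return the hexadecimal SHA1 digest of an iterable of byte strings."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


# Feeding the data in pieces gives the same digest as hashing it at once,
# so a large file can be read block by block.
local_hash = sha1_of_chunks([b"a", b"bc"])
```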
=== Category objects ===
The Category object has been moved from the catlib module to the pywikibot namespace. Any references to "catlib.Category" can be replaced by "pywikibot.Category", but the old form is retained for backwards-compatibility.
For Category objects, the following methods are deprecated:
- subcategoriesList: use, for example, list(self.subcategories()) instead
- articlesList: use, for example, list(self.articles()) instead
- supercategories: use self.categories() instead
- supercategoriesList: use, for example, list(self.categories()) instead
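The list(...) wrappers suggested above imply that the replacement methods return iterators rather than lists. The following sketch shows the pattern with a stand-in class; FakeCategory is purely illustrative and is not the real pywikibot.Category.

```python
class FakeCategory(object):
    """Stand-in for a Category object; NOT the real pywikibot class."""

    def __init__(self, subcategory_names):
        self._names = subcategory_names

    def subcategories(self):
        # Yields items one at a time, as an iterator-based method would.
        for name in self._names:
            yield name


cat = FakeCategory(["Physics", "Chemistry"])

# Old style (deprecated): cat.subcategoriesList()
# New style: wrap the iterator in list() when a real list is needed.
names = list(cat.subcategories())

# Plain iteration needs no list at all.
count = sum(1 for _ in cat.subcategories())
```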
# MORE TO COME #
On 03/30/2010 12:12 PM, Russell Blau wrote:
Here's a copy of rewrite/README-conversion.txt. (Note that the discussion about Link objects is somewhat outdated: they are still there, but it is no longer recommended to use the Page(Link(...)) format; you can just keep existing Page(site, title) calls as they are.) One thing that is not mentioned is that you can copy your current user-config.py file for use with the rewrite; just make a copy and save it in the directory whose name shows up in the error message the first time you try to run a script without it. :-)
-- Russ
First question now that I am trying to migrate my scripts. Where is template.py? I used it to remove a certain template from a list of pages:
bot = template.TemplateRobot(generator=pagesToProcess, templates=templateKeys,
                             subst=False, remove=True, editSummary=summary,
                             acceptAll=True, addedCat=None)
bot.run()
Thanks,
N.
On 3/30/2010 at 2:57 PM, "Nakor" nakor.wp@gmail.com wrote:
On 03/30/2010 12:12 PM, Russell Blau wrote:
Here's a copy of rewrite/README-conversion.txt. (Note that the discussion about Link objects is somewhat outdated: they are still there, but it is no longer recommended to use the Page(Link(...)) format; you can just keep existing Page(site, title) calls as they are.) One thing that is not mentioned is that you can copy your current user-config.py file for use with the rewrite; just make a copy and save it in the directory whose name shows up in the error message the first time you try to run a script without it. :-)
-- Russ
First question now that I am trying to migrate my scripts. Where is template.py? I used it to remove a certain template from a list of pages:
I don't think it has been converted yet. :-(
Since I posted the conversion tips yesterday, I realized it was a bit out of date so I've revised it and posted it on http://www.botwiki.sno.cc/wiki/Rewrite/Conversion_HOWTO where others can add their own tips.
Russ
On Tue, Mar 30, 2010 at 4:18 PM, Russell Blau russblau@imapmail.org wrote:
I am at a point where it would be helpful to have some feedback from other Pywikipedia users about the future of the rewrite branch. As those who watch the SVN commits know, I have not had as much time to work on this lately, and have to prioritize what time I do spend on it.
For those who have used the rewrite branch, what (if anything) needs to be done to it to get you to use it exclusively and retire the old wikipedia.py system? What is missing? What is broken? What is present but could be improved?
I'm already using it, although I'm not running bots but doing a kind of statistical analysis, so I only use it for reading. The rewrite branch fits me much better than the old branch. For example, handling different Revisions of the same page is much easier in the rewrite branch. I would export more of that functionality. I use the Revision class, so I export it myself.
I wrote a couple of emails to the list recently explaining my use and various suggestions for improvement.
Right now, the thing I miss the most is a proper API for accessing XML dumps. I'm currently using a half-baked xmlreader.py based on the old one.
Best regards,
On 10-03-30 11:18 AM, Russell Blau wrote:
For those who have chosen not to use the rewrite branch, why not? What might lead you to take another look?
I didn't know it was in a usable state.
Most critically, is there any reason to continue development of the trunk once the rewrite branch is at a point where most users are ready to switch to it?
Bugfixes for a while, then nuke it.
-Mike
Hi Russell,
Russell Blau schreef:
I am at a point where it would be helpful to have some feedback from other Pywikipedia users about the future of the rewrite branch. As those who watch the SVN commits know, I have not had as much time to work on this lately, and have to prioritize what time I do spend on it.
For those who have used the rewrite branch, what (if anything) needs to be done to it to get you to use it exclusively and retire the old wikipedia.py system? What is missing? What is broken? What is present but could be improved?
The main reason I use the current branch and not the rewrite branch is that the current branch just works. I run bots to get stuff done, and the last time I tried the rewrite branch, it didn't work.
For those who have chosen not to use the rewrite branch, why not? What might lead you to take another look?
If you say rewrite is stable, I'll give it another try.
And then, I'm sure there are many whose reaction to this post has been, "What's the rewrite branch?" I don't know what to ask you, so feel free to move on to the next message.
Most critically, is there any reason to continue development of the trunk once the rewrite branch is at a point where most users are ready to switch to it?
Before we get there, the user programs need to be converted and restructured. I think most current command-line programs should be split up into a script part and one or more lib parts.
Maarten
-- Russ
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Some beginner's questions: How do I start it? Where should I copy user-config.py? What should be in this user-config.py? How do I run a script, and in which directory?
JAnD
2010/4/1 Maarten Dammers maarten@mdammers.nl:
Hi Russell,
Russell Blau schreef:
I am at a point where it would be helpful to have some feedback from other Pywikipedia users about the future of the rewrite branch. As those who watch the SVN commits know, I have not had as much time to work on this lately, and have to prioritize what time I do spend on it.
For those who have used the rewrite branch, what (if anything) needs to be done to it to get you to use it exclusively and retire the old wikipedia.py system? What is missing? What is broken? What is present but could be improved?
The main reason I use the current branch and not the rewrite branch is that the current branch just works. I run bots to get stuff done, and the last time I tried the rewrite branch, it didn't work.
For those who have chosen not to use the rewrite branch, why not? What might lead you to take another look?
If you say rewrite is stable, I'll give it another try.
And then, I'm sure there are many whose reaction to this post has been, "What's the rewrite branch?" I don't know what to ask you, so feel free to move on to the next message.
Most critically, is there any reason to continue development of the trunk once the rewrite branch is at a point where most users are ready to switch to it?
Before we get there, the user programs need to be converted and restructured. I think most current command-line programs should be split up into a script part and one or more lib parts.
Maarten
-- Russ
"Jan Dudík" jan.dudik@gmail.com wrote:
Some beginner's questions: How do I start it? Where should I copy user-config.py? What should be in this user-config.py? How do I run a script, and in which directory?
JAnD