Since I actually got a request for information about the rewrite project(!), here's a summary of where things stand and what other developers can help with.
For those who aren't aware, the goal of the rewrite branch is to convert the entire bot framework to use the MediaWiki API instead of screen-scraping for both reading from and writing to a wiki. Generally, the changes are to be "behind the scenes," with the goal of maintaining backwards-compatibility with the old framework as much as possible. Nonetheless, we are taking this opportunity to clean up some warts in the old framework and add some new capabilities, so old code won't "just run" without some conversion effort.
Why bother? Because the API is faster and more reliable than screen-scraping, and we won't have to spend hours hunting and fixing bugs every time the MediaWiki developers decide to change an HTML tag somewhere in their page design. As Brion Vibber said, "Screen-scraping constantly-changing UI is like repeatedly banging yourself in the head with a bowling ball. It's painful and doesn't accomplish much, but it feels SO GOOD when you stop!" (http://lists.wikimedia.org/pipermail/wikitech-l/2008-August/039076.html) And he's made it very clear that changes to the UI will be made regardless of what effect they may have on bots.
* Where we stand
First of all, the code in the rewrite branch actually works; you can check it out from SVN, run it, and experiment with it on the wiki of your choice. Not all the functionality of the current framework has been replicated yet, but you can instantiate a Site or a Page, get the page text, save the page, and so forth. See the file 'README-conversion.txt' for a brief rundown of how to convert from the old syntax to the new. You will need to create a new user-config.py for the new framework, and keep it in a different directory from the one you use for the old framework. (Preferably, this should be ~/.pywikibot for Unix and similar systems, and C:\Documents and Settings\USERNAME\Application Data\pywikibot for Windows systems.) Set the environment variable PYWIKIBOT2_DIR to the name of this directory.
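To make the directory lookup concrete, here is a minimal sketch of how a script might work out where user-config.py lives. It is only an illustration, not the framework's own code: the function name resolve_config_dir is hypothetical, and on Windows it assumes the APPDATA environment variable points at the per-user application folder.

```python
import os
import sys


def resolve_config_dir(environ=None, platform=None):
    """Return the directory expected to hold user-config.py.

    Hypothetical helper for illustration only; the framework's own
    lookup may differ.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform

    # An explicit PYWIKIBOT2_DIR setting always wins.
    explicit = environ.get("PYWIKIBOT2_DIR")
    if explicit:
        return explicit

    if platform.startswith("win"):
        # Assumption: APPDATA names the per-user application folder.
        return environ.get("APPDATA", "") + "\\pywikibot"

    # Unix and similar systems: a dot-directory under the home directory.
    return environ.get("HOME", "") + "/.pywikibot"
```

Passing the environment and platform as arguments is just a convenience so the fallback rules can be tried without touching the real environment.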
The design of the framework is based on the following layers:
- Communications (HTTP request handling)
- Data (forming API requests and parsing the responses)
- Wiki (objects representing contents of a wiki, including Sites and Pages)
- Bot (the application programs)
Generally, each layer should only interact with the ones immediately above and below it (although in practice there are a few exceptions).
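The layering can be sketched roughly as follows. Every class name here (FakeTransport, ApiRequest, SketchPage) is made up for the illustration and is not the rewrite branch's actual code; the transport returns a canned reply instead of making a real HTTP request. The query parameters and the reply shape follow the MediaWiki API's action=query with prop=revisions in the JSON format, where the page text appears under the "*" key.

```python
import json
from urllib.parse import urlencode


class FakeTransport:
    """Communications layer stand-in: returns a canned reply, no real HTTP."""

    def __init__(self, canned):
        self.canned = canned
        self.requests = []

    def post(self, url, body):
        self.requests.append((url, body))
        return self.canned


class ApiRequest:
    """Data layer: forms the API request and parses the JSON response."""

    def __init__(self, transport, api_url):
        self.transport = transport
        self.api_url = api_url

    def page_text(self, title):
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": title,
            "format": "json",
        }
        raw = self.transport.post(self.api_url, urlencode(params))
        pages = json.loads(raw)["query"]["pages"]
        # The reply is keyed by page id; take the single page requested.
        page = next(iter(pages.values()))
        return page["revisions"][0]["*"]


class SketchPage:
    """Wiki layer: knows about the data layer only, never about HTTP."""

    def __init__(self, api, title):
        self.api = api
        self.title = title

    def get(self):
        return self.api.page_text(self.title)


def print_length(page):
    """Bot layer: an application that works purely with wiki objects."""
    print(page.title, len(page.get()))
```

Because each layer talks only to its neighbour, swapping FakeTransport for a real HTTP client would not require touching SketchPage or the bot code.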
Recently I have been working on testing the Site object's methods; this has been exceedingly tedious but very useful, as it has uncovered a number of bugs. I am hoping to complete this phase soon, as I find the time, then move on to the Page object and its subclasses.
* How others can help
1. Test the new framework, and report (or, even better, fix) any bugs or unclear documentation you find.
2. Develop and run unit tests for the Page object and its subclasses.
3. Port existing functions and methods that manipulate wiki text and return a new text (from wikipedia.py, catlib.py, and so forth) into a new textlib.py module.
4. Help identify any exceptions to backwards-compatibility, and if appropriate add a new function/method to map the old framework's code to the new one.
5. Start writing a new Bot class that can be subclassed by developers for their bots; this should at a minimum provide the capabilities now in wikipedia.handleArgs() (including help functionality) and in the pagegenerators.py module.
6. Identify what's missing from this list! ;)
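As an illustration of the kind of function meant in item 3, here is a small sketch of a text-in, text-out helper. The name remove_category_links and the regular expression are my own for this example, not existing framework code, and it assumes the English namespace name "Category" only; a real port would need to take the site's namespace names into account.

```python
import re

# Matches [[Category:Name]] or [[Category:Name|sortkey]], plus any trailing
# spaces/tabs and one newline, so removed links do not leave blank lines.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:[^\[\]]*\]\][ \t]*\n?", re.IGNORECASE
)


def remove_category_links(text):
    """Return a copy of the wiki text with category links stripped out."""
    return CATEGORY_LINK.sub("", text)
```

The point is that such functions take wiki text and return new text without touching a Site or a Page, which makes them easy to unit-test on plain strings.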
Thanks in advance to anyone who pitches in on this project. And don't hesitate to bother me with questions!
Russ Blau