[Pywikipedia-l] Rewrite status report and roadmap

Russell Blau russblau at imapmail.org
Fri Oct 10 19:26:18 UTC 2008


Since I actually got a request for information about the rewrite project(!), 
here's a summary of where things stand and what other developers can help 
with.

For those who aren't aware, the goal of the rewrite branch is to convert the 
entire bot framework to use the MediaWiki API instead of screen-scraping for 
both reading from and writing to a wiki.  Generally, the changes are to be 
"behind the scenes," with the goal of maintaining backwards-compatibility 
with the old framework as much as possible.  Nonetheless, we are taking this 
opportunity to clean up some warts in the old framework and add some new 
capabilities, so old code won't "just run" without some conversion effort.

Why bother?  Because the API is faster and more reliable than 
screen-scraping, and we won't have to spend hours hunting and fixing bugs 
every time the MediaWiki developers decide to change an HTML tag somewhere 
in their page design.  As Brion Vibber said, "Screen-scraping 
constantly-changing UI is like repeatedly banging yourself in the head with 
a bowling ball. It's painful and doesn't accomplish much, but it feels SO 
GOOD when you stop!" 
(http://lists.wikimedia.org/pipermail/wikitech-l/2008-August/039076.html)  
And he's made it very clear that changes to the UI will be made regardless 
of what effect they may have on bots.

* Where we stand

First of all, the code in the rewrite branch actually works; you can check 
it out from SVN, run it, and experiment with it on the wiki of your choice. 
Not all the functionality of the current framework has been replicated yet, 
but you can instantiate a Site or a Page, get the page text, save the page, 
and so forth.  See the file 'README-conversion.txt' for a brief rundown of 
how to convert from the old syntax to the new.  You will need to create a 
new user-config.py for the new framework, and tuck it away in a different 
directory than the one you use for the old framework.  (Preferably, this 
should be ~/.pywikibot for Unix and similar systems, and C:\Documents and 
Settings\USERNAME\Application Data\pywikibot for Windows systems.) Set 
the environment variable PYWIKIBOT2_DIR to the path of this directory.
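
To make the directory lookup concrete, here is a minimal sketch of how a 
script might locate the new user-config.py.  It is not the framework's 
actual code: the function name find_user_config is hypothetical, and the 
fallback to ~/.pywikibot when PYWIKIBOT2_DIR is unset is an assumption 
(the instructions above say only to set the variable).

```python
import os


def find_user_config(environ=None):
    """Return the path where user-config.py is expected to live.

    Hypothetical helper, not part of the framework.
    """
    environ = os.environ if environ is None else environ
    base_dir = environ.get("PYWIKIBOT2_DIR")
    if not base_dir:
        # Assumption: fall back to the Unix-style directory suggested above.
        base_dir = os.path.join(os.path.expanduser("~"), ".pywikibot")
    return os.path.join(base_dir, "user-config.py")


if __name__ == "__main__":
    print(find_user_config())
```

Passing the environment in as a parameter keeps the lookup easy to check 
without touching the real os.environ.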

The design of the framework is based on the following layers:

- Communications (http request handling)
- Data (forming API requests and parsing the responses)
- Wiki (objects representing contents of a wiki, including Sites and Pages)
- Bot (the application programs)

Generally, each layer should only interact with the ones immediately above 
and below it (although in practice there are a few exceptions).
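
As a rough illustration of the layering, the sketch below wires the four 
layers together for one task, reading a page's text through the API.  All 
class names here (FakeTransport, ApiRequest, Page) are made up for this 
example and are not the rewrite branch's real classes; the communications 
layer is replaced by a canned response so the sketch runs without network 
access, and example.org stands in for a real wiki.  The query parameters 
and the shape of the JSON reply follow the MediaWiki API's legacy JSON 
format, in which page content sits under the "*" key.

```python
import json
from urllib.parse import urlencode


class FakeTransport:
    """Communications layer stand-in: returns canned text, does no HTTP."""

    def __init__(self, canned_body):
        self.canned_body = canned_body
        self.requested = []

    def fetch(self, url):
        self.requested.append(url)  # remember what was asked for
        return self.canned_body


class ApiRequest:
    """Data layer: forms an API request and parses the JSON response."""

    def __init__(self, transport, api_url):
        self.transport = transport
        self.api_url = api_url

    def submit(self, **params):
        params["format"] = "json"
        # Sort the parameters so the generated URL is deterministic.
        url = self.api_url + "?" + urlencode(sorted(params.items()))
        return json.loads(self.transport.fetch(url))


class Page:
    """Wiki layer: talks only to the data layer, never to HTTP directly."""

    def __init__(self, request, title):
        self.request = request
        self.title = title

    def text(self):
        data = self.request.submit(action="query", prop="revisions",
                                   rvprop="content", titles=self.title)
        # The legacy JSON format keys pages by page id.
        page = next(iter(data["query"]["pages"].values()))
        return page["revisions"][0]["*"]


def main():
    """Bot layer: the application only sees wiki-level objects."""
    canned = json.dumps({"query": {"pages": {"1": {
        "title": "Sandbox", "revisions": [{"*": "Hello, wiki!"}]}}}})
    transport = FakeTransport(canned)
    request = ApiRequest(transport, "https://example.org/w/api.php")
    page = Page(request, "Sandbox")
    return page.text(), transport.requested[0]


if __name__ == "__main__":
    print(main())
```

Because each layer only holds a reference to the one below it, the fake 
transport can be swapped for a real HTTP client without changing Page or 
the bot code.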

Recently I have been working on testing the Site object's methods; this has 
been exceedingly tedious but very useful, as it has uncovered a number of 
bugs.  I am hoping to complete this phase soon, as I find the time, then 
move on to the Page object and its subclasses.

* How others can help

1.  Test the new framework, and report (or, even better, fix) any bugs or 
unclear documentation you find.

2.  Develop and run unit tests for the Page object and its subclasses.

3.  Port existing functions and methods that manipulate wiki text and return 
a new text (from wikipedia.py, catlib.py, and so forth) into a new 
textlib.py module.

4.  Help identify any places where backwards-compatibility is broken, and 
if appropriate add a compatibility function/method that maps the old 
framework's call to its equivalent in the new one.

5.  Start writing a new Bot class that can be subclassed by developers for 
their bots; this should at a minimum provide the capabilities now in 
wikipedia.handleArgs(), including help functionality, and the 
pagegenerators.py module.

6.  Identify what's missing from this list!  ;)
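
To give a feel for item 3, here is a sketch of the kind of function that 
would belong in textlib.py: it takes wiki text and returns new text, with 
no Site or Page object involved.  The name replace_category and its 
signature are hypothetical, not taken from wikipedia.py or catlib.py, and 
the matching is simplified (it ignores localized namespace names and 
MediaWiki's first-letter case folding of the category name).

```python
import re


def replace_category(text, old_name, new_name):
    """Return text with [[Category:old_name]] links pointing to new_name.

    Hypothetical example of a pure text-manipulation function.
    A sort key, if present, is preserved.
    """
    pattern = re.compile(
        r"\[\[\s*(?i:Category)\s*:\s*"   # namespace, in any letter case
        + re.escape(old_name)            # category name, matched exactly
        + r"\s*(\|[^\]]*)?\]\]"          # optional "|sort key"
    )

    def rebuild(match):
        sort_key = match.group(1) or ""
        return "[[Category:" + new_name + sort_key + "]]"

    return pattern.sub(rebuild, text)


if __name__ == "__main__":
    sample = "Some text.\n[[Category:Old|key]]\n[[category: Old ]]"
    print(replace_category(sample, "Old", "New"))
```

Functions of this shape are also the easiest ones to cover with unit 
tests, since they need no network access or wiki login.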

Thanks in advance to anyone who pitches in on this project.  And don't 
hesitate to bother me with questions!

Russ Blau



