[Pywikipedia-l] Rewrite status report and roadmap
Russell Blau
russblau at imapmail.org
Fri Oct 10 19:26:18 UTC 2008
Since I actually got a request for information about the rewrite project(!),
here's a summary of where things stand and what other developers can help
with.
For those who aren't aware, the goal of the rewrite branch is to convert the
entire bot framework to use the MediaWiki API instead of screen-scraping for
both reading from and writing to a wiki. Generally, the changes are to be
"behind the scenes," with the goal of maintaining backwards-compatibility
with the old framework as much as possible. Nonetheless, we are taking this
opportunity to clean up some warts in the old framework and add some new
capabilities, so old code won't "just run" without some conversion effort.
Why bother? Because the API is faster and more reliable than
screen-scraping, and we won't have to spend hours hunting and fixing bugs
every time the MediaWiki developers decide to change an HTML tag somewhere
in their page design. As Brion Vibber said, "Screen-scraping
constantly-changing UI is like repeatedly banging yourself in the head with
a bowling ball. It's painful and doesn't accomplish much, but it feels SO
GOOD when you stop!"
<http://lists.wikimedia.org/pipermail/wikitech-l/2008-August/039076.html> And
he's made it very clear that changes to the UI will be made regardless of
what effect they may have on bots.
* Where we stand
First of all, the code in the rewrite branch actually works; you can check
it out from SVN, run it, and experiment with it on the wiki of your choice.
Not all the functionality of the current framework has been replicated yet,
but you can instantiate a Site or a Page, get the page text, save the page,
and so forth. See the file 'README-conversion.txt' for a brief rundown of
how to convert from the old syntax to the new. You will need to create a
new user-config.py for the new framework, and tuck it away in a different
directory than the one you use for the old framework. (Preferably, this
should be ~/.pywikibot for Unix and similar systems, and C:\Documents and
Settings\USERNAME\Application Data\pywikibot for Windows systems.) Set
the environment variable PYWIKIBOT2_DIR to the path of this directory.
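To illustrate how a script might locate the new user-config.py, here is a
rough sketch. The function names find_config_dir() and config_file() are
made up for this example, and the fallback to ~/.pywikibot when
PYWIKIBOT2_DIR is unset is an assumption; the framework's own lookup may
well differ.

```python
import os


def find_config_dir(environ=None):
    """Return the directory expected to hold user-config.py.

    Hypothetical helper: an explicit PYWIKIBOT2_DIR wins; otherwise we
    assume the ~/.pywikibot location suggested for Unix-like systems.
    """
    if environ is None:
        environ = os.environ
    explicit = environ.get("PYWIKIBOT2_DIR")
    if explicit:
        return os.path.abspath(os.path.expanduser(explicit))
    return os.path.expanduser(os.path.join("~", ".pywikibot"))


def config_file(environ=None):
    """Return the full path of the user-config.py we would try to load."""
    return os.path.join(find_config_dir(environ), "user-config.py")
```

Passing a plain dict instead of os.environ makes it easy to try the lookup
without touching your real environment.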
The design of the framework is based on the following layers:
- Communications (http request handling)
- Data (forming API requests and parsing the responses)
- Wiki (objects representing contents of a wiki, including Sites and Pages)
- Bot (the application programs)
Generally, each layer should only interact with the ones immediately above
and below it (although in practice there are a few exceptions).
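The layering above can be sketched roughly as follows. None of these class
or function names (Comms, Data, Page, count_words, fake_transport) are
taken from the rewrite branch; they are placeholders chosen to show each
layer talking only to its neighbour. The canned response assumes the API's
classic JSON shape for a prop=revisions query, and the transport is faked
so the example runs without a network connection.

```python
import json


class Comms:
    """Communications layer: moves request parameters over some transport."""

    def __init__(self, transport):
        # `transport` is any callable taking a dict and returning a JSON
        # string; a real one would perform an HTTP request.
        self._transport = transport

    def send(self, params):
        return self._transport(params)


class Data:
    """Data layer: forms API requests and parses the responses."""

    def __init__(self, comms):
        self._comms = comms

    def query(self, **params):
        params.update(action="query", format="json")
        return json.loads(self._comms.send(params))


class Page:
    """Wiki layer: an object representing one page of a wiki."""

    def __init__(self, data, title):
        self._data = data
        self.title = title

    def get(self):
        result = self._data.query(prop="revisions", rvprop="content",
                                  titles=self.title)
        page = next(iter(result["query"]["pages"].values()))
        return page["revisions"][0]["*"]


def count_words(page):
    """Bot layer: application logic that only touches wiki objects."""
    return len(page.get().split())


def fake_transport(params):
    # Canned response, standing in for a real HTTP round trip.
    return json.dumps({"query": {"pages": {"1": {
        "title": params["titles"],
        "revisions": [{"*": "Hello wiki world"}]}}}})


page = Page(Data(Comms(fake_transport)), "Sandbox")
print(count_words(page))  # prints 3
```

Because the bot function never sees HTTP or JSON, a change in how the
lower layers fetch data should not require touching application code.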
Recently I have been working on testing the Site object's methods; this has
been exceedingly tedious but very useful, as it has uncovered a number of
bugs. I am hoping to complete this phase soon, as I find the time, then
move on to the Page object and its subclasses.
* How others can help
1. Test the new framework, and report (or, even better, fix) any bugs or
unclear documentation you find.
2. Develop and run unit tests for the Page object and its subclasses.
3. Port existing functions and methods that manipulate wiki text and return
a new text (from wikipedia.py, catlib.py, and so forth) into a new
textlib.py module.
4. Help identify any exceptions to backwards-compatibility, and if
appropriate add a new function/method to map the old framework's code to the
new one.
5. Start writing a new Bot class that can be subclassed by developers for
their bots; this should at a minimum provide the capabilities now in
wikipedia.handleArgs(), including help functionality, and the
pagegenerators.py module.
6. Identify what's missing from this list! ;)
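As a starting point for discussion on item 5, a subclassable Bot might look
something like the sketch below. Everything here is hypothetical: the
method names (handle_args, treat, run), the options shown, and the
"-name:value" parsing are illustrative guesses modelled loosely on the old
command-line style, not the behaviour of wikipedia.handleArgs() or
pagegenerators.py.

```python
class Bot:
    """Hypothetical base class: parses arguments, then treats each page."""

    # Options every bot understands; subclasses may extend this mapping.
    available_options = {"help": False, "limit": None}

    def __init__(self, args=()):
        self.options = dict(self.available_options)
        self.handle_args(args)

    def handle_args(self, args):
        """Parse '-name' and '-name:value' arguments into self.options."""
        for arg in args:
            if not arg.startswith("-"):
                raise ValueError("unexpected argument: %r" % arg)
            name, _, value = arg[1:].partition(":")
            if name not in self.options:
                raise ValueError("unknown option: -%s" % name)
            self.options[name] = value if value else True

    def help_text(self):
        # The subclass docstring doubles as its help message.
        return (self.__doc__ or "").strip()

    def treat(self, page):
        raise NotImplementedError("subclasses must implement treat()")

    def run(self, generator):
        """Feed pages from any iterable to treat(); return the count."""
        if self.options["help"]:
            print(self.help_text())
            return 0
        limit = self.options["limit"]
        count = 0
        for page in generator:
            if limit is not None and count >= int(limit):
                break
            self.treat(page)
            count += 1
        return count


class TitleCollector(Bot):
    """Collect the pages it is given. Options: -limit:N, -help"""

    def __init__(self, args=()):
        super().__init__(args)
        self.seen = []

    def treat(self, page):
        self.seen.append(page)


bot = TitleCollector(["-limit:2"])
handled = bot.run(iter(["A", "B", "C"]))
```

Here plain strings stand in for Page objects, and any iterable stands in
for a page generator; a real Bot class would presumably accept the
framework's own generators instead.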
Thanks in advance to anyone who pitches in on this project. And don't
hesitate to bother me with questions!
Russ Blau