Revision: 4507 Author: russblau Date: 2007-11-05 15:13:10 +0000 (Mon, 05 Nov 2007)
Log Message:
Generalize query method; add some response processing and error
handling.
Although I appreciate your efforts, I still want to ask you to wait with any other rewrite commits until you have read, understood, and reacted to this (fairly long) email.
The purpose of the rewrite was a) to restructure the framework, b) to have consistent formatting and documentation, c) to move to the API, and possibly d) to add i18n support.
First of all: point b. Before any new code is written, we should agree on a fixed coding style. The current framework mixes several coding styles, sometimes even within one class: in the Page class, we have got Page.urlname, Page.isAutoTitle and Page.put_async... I think we should use one style for the entire framework.
For the coding style, I propose the following:
* As base: PEP 8 [1].
* UTF-8 encoding for all files (and using u'' for all strings), with UTF-8 BOM.
* Docstrings mandatory, using Epydoc [2] for describing parameters and return value. Defining the function in this way almost automatically defines unit tests for that function.
* __version__ in every file, and setting the corresponding SVN property.
* One module per class.
Note that all code should be written to be thread-safe: we want to use persistent HTTP connections, and this means one connection will be shared among several page objects. If we want to be able to put pages from a separate thread, we really need to make sure the code is thread-safe.
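One possible way to share a persistent connection safely is to guard it with a lock, so that only one thread talks to it at a time. The sketch below is an assumption about how this could be done, not a description of existing code: SharedConnection is a hypothetical name, and the send callable stands in for whatever actually performs the HTTP request.

```python
import threading


class SharedConnection(object):
    """Serialize access to one persistent connection (hypothetical)."""

    def __init__(self, send):
        # send: callable that performs the actual request on the
        # underlying connection (assumed, supplied by the caller)
        self._send = send
        self._lock = threading.Lock()

    def request(self, path):
        """Perform one request; only one thread may be inside at a time."""
        with self._lock:
            return self._send(path)
```

Every page object would hold a reference to the same SharedConnection, so a put running in a background thread cannot interleave with a get in the main thread.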
Secondly: point a. The point of the rewrite was restructuring the framework. Adding structure after writing is much harder than first thinking of structure and then coding along the structure. First of all, we need to split the framework and the bots. Using the name 'pywikibot' for the framework might not be the best idea because of that ;)
Because adding structure afterwards is much harder, I think we should first decide on what modules/classes we want, then define what functions we want and what these functions should do. After that, we can start writing unit tests and functions. Unit tests are there to help define what code is necessary. In general, first define what a function should do, then write unit tests, then write code to make these tests work, and stop coding when all tests pass. If a function needs to do more, first update the definition, then update the unit tests, then update the code. This makes sure a) that documentation is always up to date with the code and b) that a change does not break existing behavior. For more information about unit testing and how to code using unit tests, please read chapters 13, 14 and 15 of Dive Into Python [3].
For unit testing, we will need one or more installs of MediaWiki: the SVN release used at Wikipedia, but maybe also older stable versions, if we want to maintain compatibility.
On a compatibility note: even though we are changing the way the framework works, it should be possible to create a layer that maps functions in the old framework to the new framework.
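As an illustration of the define, test, then code order described above, here is a minimal sketch using the standard unittest module. The function capitalize_first is hypothetical; the point is that the docstring comes first, the tests are derived from it, and the body is only as large as the tests require.

```python
import unittest


def capitalize_first(title):
    """Return the title with its first character upper-cased.

    @param title: page title; may be empty
    @type title: unicode
    @return: the same title, first character upper-cased, rest unchanged
    @rtype: unicode
    """
    # Slicing (rather than indexing) keeps the empty string working.
    return title[:1].upper() + title[1:]


class CapitalizeFirstTests(unittest.TestCase):
    """Tests derived directly from the docstring above."""

    def test_lowercase_start(self):
        self.assertEqual(capitalize_first(u'main page'), u'Main page')

    def test_rest_is_unchanged(self):
        self.assertEqual(capitalize_first(u'iPod nano'), u'IPod nano')

    def test_empty_title(self):
        self.assertEqual(capitalize_first(u''), u'')
```

If the function later has to do more, the docstring changes first, then a new test method is added, and only then the body.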
Thirdly: point c. This really is not a separate point, but it is important. I think we should use the API where possible, with a fallback to the monobook HTML parser. The API allows neat ways of getting information from many pages at once. We will need a page generator that can handle this information, and that will generate page objects according to the information received. This means we will need to keep track of what information we already have, etc.
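A rough sketch of such a page generator follows. All names here (PageInfo, build_query_params, preloading_generator) are hypothetical, the fetch callable is assumed to perform the HTTP request and return the decoded JSON, and the response is assumed to have the query/pages layout of the API's JSON format. The HTML-parser fallback is left out.

```python
class PageInfo(object):
    """Minimal stand-in for a page object (hypothetical)."""

    def __init__(self, title, data):
        self.title = title
        self.data = data  # raw per-page dict as returned by the API


def build_query_params(titles):
    """Return API parameters requesting basic info for several titles."""
    return {
        'action': 'query',
        'format': 'json',
        'prop': 'info',
        'titles': u'|'.join(titles),  # the API separates titles with '|'
    }


def _pages_from(response):
    """Turn one decoded query response into PageInfo objects."""
    pages = response.get('query', {}).get('pages', {})
    return [PageInfo(data['title'], data) for data in pages.values()]


def preloading_generator(titles, fetch, batch_size=50):
    """Yield PageInfo objects, querying the API once per batch of titles.

    @param titles: iterable of page titles
    @param fetch: callable taking a parameter dict and returning the
        decoded JSON response (assumed; does the actual HTTP request)
    @param batch_size: maximum number of titles per request
    """
    batch = []
    for title in titles:
        batch.append(title)
        if len(batch) == batch_size:
            for page in _pages_from(fetch(build_query_params(batch))):
                yield page
            batch = []
    if batch:  # send whatever is left over
        for page in _pages_from(fetch(build_query_params(batch))):
            yield page
```

Because each PageInfo keeps the raw data it was built from, a page object can tell what has already been loaded and avoid a second request for it.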
Finally: point d. This is not a very important point, but it's kind of interesting. With the new framework, it's easier to restructure existing functions, which makes translations easier.
I'm sorry for the long email, and I repeat: I really appreciate your efforts, but I really think we should address these issues before starting on even the most basic code.
--valhallasw
[1] http://www.python.org/dev/peps/pep-0008/
[2] http://epydoc.sourceforge.net/manual-epytext.html
[3] http://diveintopython.org/unit_testing/index.html