Revision: 4507
Author: russblau
Date: 2007-11-05 15:13:10 +0000 (Mon, 05 Nov 2007)
Log Message:
-----------
Generalize query method; add some response processing and error
handling.
Although I appreciate your efforts, I still want to ask you to wait with
any other rewrite commits until you have read, understood, and reacted to
this (fairly long) email.
The purpose of the rewrite was
a) to restructure the framework
b) to have consistent formatting and documentation
c) to move to the API
and possibly
d) to add i18n support
First of all: point b.
Before any new code is written, we should agree on a fixed coding style.
The current framework has several coding styles used throughout the
framework - and even sometimes in one class: in the Page class, we have
got Page.urlname, Page.isAutoTitle and Page.put_async... - I think we
should use one style for the entire framework.
For the coding style, I propose the following:
* As base: PEP 8 [1].
* UTF-8 encoding for all files (and using u'' for all strings), with UTF-8 BOM
* Docstrings mandatory, using Epydoc [2] for describing parameters and
return value. Defining the function in this way almost automatically
defines unit tests for that function.
* __version__ in every file, and setting the corresponding SVN property.
* One module per class
Note that all code should be thread safe: we want to use persistent HTTP
connections, and this means we will be sharing one connection between
several page objects. If we want to be able to put pages from a separate
thread, we really need to make sure the code is thread safe.
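
A minimal sketch of what sharing a connection safely could mean, assuming
a hypothetical SharedConnection wrapper (the name and the `send` callable
are placeholders, not existing framework code):

```python
import threading


class SharedConnection(object):
    """Hypothetical wrapper around one persistent connection."""

    def __init__(self, send):
        # `send` stands in for whatever actually talks to the server.
        self._send = send
        self._lock = threading.Lock()

    def request(self, data):
        """Send one request; only one thread uses the connection at a time."""
        with self._lock:
            return self._send(data)
```

Every page object would hold a reference to the same SharedConnection, so
two threads can never interleave their requests on the socket.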
Secondly: point a.
The point of the rewrite was restructuring the framework. Adding structure
after writing is much harder than first thinking of structure and then
coding along the structure.
First of all, we need to split the framework and the bots. Using the name
'pywikibot' for the framework might not be the best idea because of that
;)
Because adding structure afterwards is much harder, I think we should
first decide which modules/classes we want, then define which functions
we want and what these functions should do. After that, we can start
writing unit tests and functions.
Unit tests are there to help define what code is necessary. In general,
first define what a function should do, then write unit tests, then write
code to make these tests pass, and stop coding when all tests pass. If a
function needs to do more, first update the definition, then update the
unit tests, then update the code. This makes sure a) that documentation is
always up to date with the code and b) that a change does not break
existing behavior.
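
To make the order concrete, here is a small sketch using the standard
unittest module. The function url_name and its behaviour are made up for
the example: the docstring is the definition, the tests follow from it,
and the one-line body is just enough to make them pass.

```python
import unittest


def url_name(title):
    """
    Return the title in the form used in URLs: spaces become underscores.

    @param title: a page title
    @type title: unicode
    @return: the title with spaces replaced by underscores
    @rtype: unicode
    """
    return title.replace(u' ', u'_')


class UrlNameTest(unittest.TestCase):
    """Tests derived from the definition in the docstring above."""

    def test_spaces_become_underscores(self):
        self.assertEqual(url_name(u'Main Page'), u'Main_Page')

    def test_title_without_spaces_is_unchanged(self):
        self.assertEqual(url_name(u'Python'), u'Python')
```

Saved as a module, the tests can be run with `python -m unittest`.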
For more information about unit testing and how to code using unit tests,
please read chapters 13, 14 and 15 of Dive Into Python [3].
For unit testing, we will need one or more installs of MediaWiki; the SVN
release used at Wikipedia, but maybe also older stable versions - if we
want to maintain compatibility.
On a compatibility note: even though we are changing the way the framework
works, it should be possible to create a layer that maps functions in the
old framework to the new framework.
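
Such a layer could be as thin as a subclass that keeps the old method
names and delegates to the new ones. A sketch, assuming a hypothetical
new-style method url_name (the class names and the simplified behaviour
are inventions for the example):

```python
class NewPage(object):
    """Hypothetical new-style page class with PEP 8 method names."""

    def __init__(self, title):
        self._title = title

    def url_name(self):
        """Return the title with spaces replaced by underscores."""
        return self._title.replace(u' ', u'_')


class CompatPage(NewPage):
    """Compatibility layer: old method names mapped onto the new ones."""

    def urlname(self):
        """Old-style name; delegates to url_name()."""
        return self.url_name()
```

Existing bots would keep calling urlname() and would not need to change
until they are ported.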
Thirdly: point c.
This really is not a separate point, but it is important. I think we
should use the API where possible, but keep a fallback to the monobook
HTML parser.
The API allows neat ways of getting information from many pages at once.
We will need a page generator that can handle this information, and that
will generate page objects according to the information received. This
means each page object needs to keep track of which information it has
already got.
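
A sketch of such a generator, working on an already decoded action=query
response (the Page class and the function name are hypothetical, and no
network access is shown):

```python
class Page(object):
    """Hypothetical page object that remembers which data it already has."""

    def __init__(self, title, **known):
        self.title = title
        self._known = dict(known)  # data received so far, e.g. pageid

    def has(self, key):
        """Return True if this piece of information was already received."""
        return key in self._known

    def get(self, key):
        """Return a piece of information that was already received."""
        return self._known[key]


def pages_from_response(response):
    """
    Yield a Page object for every page in a decoded API query response.

    @param response: decoded JSON result of an action=query request
    @type response: dict
    """
    for info in response.get('query', {}).get('pages', {}).values():
        known = dict((k, v) for k, v in info.items() if k != 'title')
        yield Page(info['title'], **known)
```

One request for many titles then yields many page objects, each of which
can tell whether, say, its page id is already known or still has to be
fetched.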
Finally: point d.
This is not a very important point, but it's kinda interesting. With the
new framework, it's easier to restructure existing functions, making
translations easier.
I'm sorry for the long email, and I repeat: I really appreciate your
efforts, but I really think we should address these issues before starting
on even the most basic code.
--valhallasw
[1] http://www.python.org/dev/peps/pep-0008/
[2] http://epydoc.sourceforge.net/manual-epytext.html
[3] http://diveintopython.org/unit_testing/index.html