Hi there,
I'm currently using the rewrite branch for a project. This project is not a bot, but a tool for vandalism analysis.
Here I'll explain how I used it and what changes I made, in case it is useful for the new design of the rewrite. I'd also like to get recommendations on my approaches so I can make them suitable for integration with pywikipedia.
First of all, my main unit of information is the Edit. An Edit is an object composed of a Page and two consecutive revision IDs of that page. Edit supports operations such as getting the edit comment, user, timestamp, and the old and new text.
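To make the idea concrete, here is a minimal sketch of such an Edit object. It is only an illustration, not the actual code from the project: Page and Revision below are simplified stand-ins rather than pywikipedia's real classes, and the property names (comment, user, timestamp, old_text, new_text) are assumptions.

```python
class Revision:
    """Simplified stand-in for a revision (not pywikipedia's class)."""

    def __init__(self, revid, user, timestamp, comment, text):
        self.revid = revid
        self.user = user
        self.timestamp = timestamp
        self.comment = comment
        self.text = text


class Page:
    """Simplified stand-in for a page that stores its revisions by ID."""

    def __init__(self, title):
        self.title = title
        self._revisions = {}  # revid -> Revision


class Edit:
    """A page plus two consecutive revision IDs of that page."""

    def __init__(self, page, old_revid, new_revid):
        self.page = page
        self.old_revid = old_revid
        self.new_revid = new_revid

    @property
    def _new(self):
        # The newer revision carries the metadata of the edit itself.
        return self.page._revisions[self.new_revid]

    @property
    def comment(self):
        return self._new.comment

    @property
    def user(self):
        return self._new.user

    @property
    def timestamp(self):
        return self._new.timestamp

    @property
    def old_text(self):
        return self.page._revisions[self.old_revid].text

    @property
    def new_text(self):
        return self._new.text
```

With this shape, the comment, user and timestamp of an edit come from the newer of the two revisions, while the two texts can be diffed against each other.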
I had to implement a method similar to BaseSite.loadrevisions(): given a list of edits that have their revision IDs but NOT their Page, fetch the revisions and associate each edit with its Page object. This method retrieves all the revisions, creates Page objects for them, and creates Revision objects, which are stored in the corresponding Page._revisions dict.
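The grouping step could look roughly like the sketch below. It leaves out the network part entirely: `fetched` is assumed to be an iterable of (title, revid, text) tuples already obtained by querying the wiki by revision ID, and attach_pages() as well as the stand-in classes are hypothetical names, not pywikipedia API.

```python
class Revision:
    """Simplified stand-in for a revision."""

    def __init__(self, revid, text):
        self.revid = revid
        self.text = text


class Page:
    """Simplified stand-in for a page that stores its revisions by ID."""

    def __init__(self, title):
        self.title = title
        self._revisions = {}  # revid -> Revision


class Edit:
    """Two consecutive revision IDs; the page is filled in later."""

    def __init__(self, old_revid, new_revid):
        self.old_revid = old_revid
        self.new_revid = new_revid
        self.page = None


def attach_pages(edits, fetched):
    """Create one Page per title and link every edit to its Page.

    `fetched` is an iterable of (title, revid, text) tuples
    (an assumed shape, standing in for the real query result).
    """
    pages = {}     # title -> Page, so each page is created only once
    by_revid = {}  # revid -> Page, to look up the page of an edit
    for title, revid, text in fetched:
        page = pages.get(title)
        if page is None:
            page = pages[title] = Page(title)
        page._revisions[revid] = Revision(revid, text)
        by_revid[revid] = page
    for edit in edits:
        edit.page = by_revid[edit.new_revid]
    return list(pages.values())
```

Keeping a title-to-Page dict inside the function ensures that two edits on the same page end up sharing a single Page object.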
Then I have to store all this info on disk for later use. So I wrote a function for exporting my list of edits to XML, using MediaWiki's XML export format (export-0.4). To ease this process, I added a to_element() method to Page and Revision objects. to_element() returns an Element object (from the ElementTree API) representing the object. So, exporting is as easy as iterating over all Pages, calling their to_element() method, and appending the result to a common root. What do you think about this? Should it be included in pywikipedia? Do you prefer a different approach for exporting to XML?
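As an illustration of the to_element() idea, here is a minimal sketch using the standard library's ElementTree. The classes are simplified stand-ins, export_pages() is a hypothetical helper, and the element names follow my understanding of the export schema; the XML namespace, page IDs and schema attributes are deliberately left out.

```python
import xml.etree.ElementTree as ET


class Revision:
    """Simplified stand-in for a revision."""

    def __init__(self, revid, timestamp, user, comment, text):
        self.revid = revid
        self.timestamp = timestamp
        self.user = user
        self.comment = comment
        self.text = text

    def to_element(self):
        """Return a <revision> Element describing this revision."""
        rev = ET.Element("revision")
        ET.SubElement(rev, "id").text = str(self.revid)
        ET.SubElement(rev, "timestamp").text = self.timestamp
        contributor = ET.SubElement(rev, "contributor")
        ET.SubElement(contributor, "username").text = self.user
        ET.SubElement(rev, "comment").text = self.comment
        ET.SubElement(rev, "text").text = self.text
        return rev


class Page:
    """Simplified stand-in for a page that stores its revisions by ID."""

    def __init__(self, title):
        self.title = title
        self._revisions = {}  # revid -> Revision

    def to_element(self):
        """Return a <page> Element with one <revision> child per revision."""
        page = ET.Element("page")
        ET.SubElement(page, "title").text = self.title
        for revid in sorted(self._revisions):
            page.append(self._revisions[revid].to_element())
        return page


def export_pages(pages):
    """Append every page to a common <mediawiki> root and serialize it."""
    root = ET.Element("mediawiki")
    for page in pages:
        root.append(page.to_element())
    return ET.tostring(root, encoding="unicode")
```

Each object only knows how to describe itself, so the export function stays a plain loop over the pages.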
For importing again from XML, I adapted the old XmlDump. My version yields Page objects instead of revisions. Of course, this might be a performance nightmare when working with full-history XML dumps, so it could be modified to yield Revision objects instead.
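The difference between the two strategies can be sketched with ElementTree's iterparse(). This is not the adapted XmlDump code, just a minimal illustration: iter_pages() and iter_revisions() are hypothetical names, plain tuples are yielded instead of Page or Revision objects, and the sample has no XML namespace (real dumps declare one, which would prefix every tag).

```python
import io
import xml.etree.ElementTree as ET

# Tiny namespace-free sample standing in for a real dump.
SAMPLE = (
    b"<mediawiki>"
    b"<page><title>A</title>"
    b"<revision><id>1</id><text>x</text></revision>"
    b"<revision><id>2</id><text>y</text></revision>"
    b"</page>"
    b"<page><title>B</title>"
    b"<revision><id>3</id><text/></revision>"
    b"</page>"
    b"</mediawiki>"
)


def iter_pages(source):
    """Yield (title, [(revid, text), ...]) once per complete <page>.

    All revisions of a page are held in memory until the page ends.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            revisions = [
                (int(rev.findtext("id")), rev.findtext("text") or "")
                for rev in elem.findall("revision")
            ]
            yield elem.findtext("title"), revisions
            elem.clear()  # drop the parsed subtree to free memory


def iter_revisions(source):
    """Yield (title, revid, text) as soon as each <revision> ends."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "title":
            title = elem.text  # <title> precedes the revisions of a page
        elif elem.tag == "revision":
            yield title, int(elem.findtext("id")), elem.findtext("text") or ""
            elem.clear()  # only one revision's text is kept at a time


for title, revid, text in iter_revisions(io.BytesIO(SAMPLE)):
    print(title, revid, repr(text))
```

The page-based generator has to keep every revision of a page until the closing tag arrives, while the revision-based one holds only one revision's text at a time.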
I think the Revision class should include a page attribute containing the Page object that the Revision belongs to. That would be useful, for example, when writing an XmlDump that yields Revisions and, in general, for other revision-oriented applications.
And last but not least, currently it's easy to end up with multiple Page objects representing the same page, but with different object state. Do you think that BaseSite should implement a Page factory or some way to "create a Page object for this title if it doesn't exist or give me the one that already exists"?
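One possible shape for such a factory is a per-site cache keyed by normalized title. The sketch below is only an assumption about how it could look: Site.page() and _normalize() are hypothetical names, not existing BaseSite methods, and the normalization is heavily simplified (underscores to spaces, first letter capitalized) and ignores namespaces.

```python
class Page:
    """Simplified stand-in for a page object."""

    def __init__(self, site, title):
        self.site = site
        self.title = title
        self._revisions = {}  # revid -> revision data


class Site:
    """Simplified stand-in for a site that hands out unique Page objects."""

    def __init__(self, name):
        self.name = name
        self._pages = {}  # normalized title -> Page

    @staticmethod
    def _normalize(title):
        # Simplified: treat underscores as spaces, capitalize first letter.
        title = title.strip().replace("_", " ")
        return title[:1].upper() + title[1:]

    def page(self, title):
        """Return the existing Page for this title, creating it if needed."""
        key = self._normalize(title)
        page = self._pages.get(key)
        if page is None:
            page = self._pages[key] = Page(self, key)
        return page
```

With this, site.page("foo_bar") and site.page("Foo bar") return the same object, so state loaded through one reference is visible through the other. A plain dict keeps every page alive for as long as the site exists; a weakref.WeakValueDictionary would be an alternative if that turns out to be a memory problem.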
Well, that's all at the moment.
Best regards,
pywikipedia-l@lists.wikimedia.org