River Tarnell wrote:
it seems more useful to provide the text in uncompressed form, instead of the MediaWiki internal form that's almost impossible to work with. does that seem reasonable?
The tools should get the text in uncompressed form. The interface used to do that is not so important. Given the amount of text, though, I wouldn't rule out storing it with some kind of compression just yet.
A common data access interface would be interesting, perhaps as a C library to link against or to wrap as a PHP extension. It could then be implemented for different sources:
- Toolserver text replication
- WikiProxy
- MySQL MediaWiki database
- MediaWiki API
- XML dump
Then applications just need to be written against the text interface: debugged with a local install, tested with a small dump, deployed on the Toolserver...
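
A rough sketch of the idea, in Python rather than C only to keep it short. All names here (TextSource, get_text, DictSource, XmlDumpSource, count_words) are made up for illustration, and the sample dump is a hand-written fragment, not real data. Only two backends are shown. The others would implement the same single method.

```python
import io
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Optional


class TextSource(ABC):
    """Hypothetical common interface: uncompressed page text by title."""

    @abstractmethod
    def get_text(self, title: str) -> Optional[str]:
        """Return the page text, or None if the page is unknown."""


class DictSource(TextSource):
    """Stand-in backend for debugging: pages held in a plain dict."""

    def __init__(self, pages):
        self._pages = dict(pages)

    def get_text(self, title):
        return self._pages.get(title)


def _local(tag):
    # Drop any "{namespace}" prefix so the export schema version is irrelevant.
    return tag.rsplit("}", 1)[-1]


class XmlDumpSource(TextSource):
    """Backend for a small XML dump; loads every page into memory."""

    def __init__(self, fileobj):
        self._pages = {}
        title = None
        for _event, elem in ET.iterparse(fileobj, events=("end",)):
            name = _local(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text" and title is not None:
                # If a page has several revisions, the last one read wins.
                self._pages[title] = elem.text or ""
            elif name == "page":
                title = None
                elem.clear()  # free the finished page subtree

    def get_text(self, title):
        return self._pages.get(title)


def count_words(source: TextSource, title: str) -> int:
    """Example application code: it only knows about TextSource."""
    text = source.get_text(title)
    return 0 if text is None else len(text.split())


# Hand-written sample in the general shape of a MediaWiki XML export.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Sandbox</title>
    <revision><text>Hello wiki world</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for src in (DictSource({"Sandbox": "Hello wiki world"}),
                XmlDumpSource(io.BytesIO(SAMPLE_DUMP))):
        print(type(src).__name__, count_words(src, "Sandbox"))
```

The point is that count_words never changes. Only the object handed to it does, so the same tool could run against a dict while debugging, a small dump while testing, and a replication or API backend once deployed.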