Let me say a few words about the bundle format:
* It is intended to be a complete copy of all wiki resources required to make an offline dump, in any format. That means that all the articles are spidered and template-expanded, and all related images and other media are fetched and stored in a zip archive. The archive also contains all the license and authorship information needed to produce the attributions required for a license-compliant rendering. This should give developers of rendering backends a substantial head start.
* The current bundle format is backwards compatible with the PediaPress bundles. We have made some additions, primarily to better disambiguate table keys, filenames, etc. so that collections which span multiple wikis work correctly. We also add the Parsoid parser output.
* The backwards-compatibility features are somewhat experimental. As Matthew noted, the plan is for PediaPress to eventually begin hosting their bundler on their own servers. We hope that they will be able to share our bundles, but that decision is up to them. We may deprecate some of the backwards-compatibility content of the bundles (for example, removing the PHP parser output) if no one ends up using it. (Nonetheless, having PediaPress' working bundle format was very helpful to me in writing the new bundler, and I want to thank them!)
* I've made a conscious effort to support *very large* bundles in this format. That is, I try not to hold the complete data for a bundle in memory, and we use sqlite databases wherever possible to support article-at-a-time access during rendering. The MW-hosted servers will probably have reasonably small resource limits, but it is my intention that if you want to create an offline dump of an entire wiki (or a large subset thereof), you should be able to use the existing renderers and bundler to do so. I'd encourage people interested in making large slices to get in touch and hopefully start playing with the code, so we can identify any bundle-format-related bottlenecks and eliminate them before the bundle format is too firmly established.
* The bundler and the LaTeX renderer are independent npm modules, loosely coupled to the Collection extension. Again, this should encourage reuse of the bundler and renderer in other projects. Patches welcome!
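To make the "article-at-a-time" idea concrete, here is a minimal Python sketch of the access pattern. It is only an illustration, not the actual bundle layout: the database file name, the `kv` table, its columns, and the helper names are all hypothetical, and the real bundler is an npm module rather than Python. The point is just that both writing and reading go through sqlite one row at a time, so the full set of articles never has to sit in memory.

```python
import sqlite3


def build_store(db_path, articles):
    """Write (title, text) pairs into a key/value table, one row at a time.

    `articles` may be a generator, so the caller never has to materialize
    the whole collection. The table name `kv` is a hypothetical choice.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS kv "
                "(key TEXT PRIMARY KEY, value TEXT)"
            )
            for title, text in articles:
                conn.execute(
                    "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                    (title, text),
                )
    finally:
        conn.close()


def get_article(db_path, title):
    """Fetch a single article by key, or None if it is not in the store."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT value FROM kv WHERE key = ?", (title,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


def iter_articles(db_path):
    """Yield (title, text) pairs in key order, streaming from the cursor."""
    conn = sqlite3.connect(db_path)
    try:
        for key, value in conn.execute(
            "SELECT key, value FROM kv ORDER BY key"
        ):
            yield key, value
    finally:
        conn.close()
```

A renderer built along these lines would loop over `iter_articles(path)` and emit output for each article as it arrives, or call `get_article(path, title)` when it needs one specific page.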
http://git.wikimedia.org/summary/mediawiki%2Fextensions%2FCollection%2FOffli...
http://git.wikimedia.org/summary/mediawiki%2Fextensions%2FCollection%2FOffli...

The npm module names are still in flux. They are currently mw-bundler and mw-latexer; maybe mw-ocg-bundler etc. would be better.
--scott