On Wed, 19 Jan 2011 03:09:34 -0800, Tomasz Finc tfinc@wikimedia.org wrote:
I had a discussion with Roan concerning "wikizim", the tool that creates a ZIM file from a whole wiki by using the API. His idea was to integrate the parser and the relevant wikizim code to make a dumping tool with less overhead, suitable for the Wikimedia static dumps. I have pointed him to the relevant code in the openZIM SVN, but the discussion has been interrupted by the MW 1.17 release he had to work on.
Yeah, Roan is going to have little to no time for working on that. I have a couple of other people in mind at WMF who could help, but if there are any community devs who want to help out, I'd love to get them started.
A tool that builds a ZIM file using the MediaWiki API is essential for people who do not have access to the system running MediaWiki.
For people with access to the system, using the API to get the content is not the best solution, because it prevents or complicates any post-processing. In addition, transferring everything over HTTP is not the best approach when you can access the content directly.
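To make the HTTP overhead concrete: an API-based tool has to ask the wiki to render each article through api.php, one request per page. The minimal sketch below assumes the standard MediaWiki `action=parse` module with the default JSON output; the endpoint URL and the helper names `parse_url` and `fetch_rendered_html` are hypothetical and are not taken from Tommi's zimwritermw code.

```python
import json
import urllib.parse
import urllib.request


def parse_url(api_endpoint: str, title: str) -> str:
    """Build an api.php request asking MediaWiki to render one page to HTML."""
    params = {
        "action": "parse",  # run the wiki parser on an existing page
        "page": title,      # page title, e.g. "Main Page"
        "prop": "text",     # only return the rendered HTML
        "format": "json",
    }
    return api_endpoint + "?" + urllib.parse.urlencode(params)


def fetch_rendered_html(api_endpoint: str, title: str) -> str:
    """Fetch one article's HTML over HTTP: one round trip per page."""
    with urllib.request.urlopen(parse_url(api_endpoint, title)) as resp:
        data = json.load(resp)
    # In the default (formatversion=1) JSON output the HTML is under parse -> text -> "*".
    return data["parse"]["text"]["*"]
```

For a whole-wiki dump this loop runs once per article, which is the per-page overhead that a tool reading the content directly on the server avoids.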
So, I think:
* A zimwritermw (zimwriter using the MW API) console tool should be developed (Tommi has already started).
* The Pediapress ZIM solution should be based on it; otherwise I do not see how we will avoid duplicated work.
* zimwriterdisk (zimwriter using static HTML and media files) should be continued (Tommi already has a stub, I think).
* The MW DumpHTML extension should be revisited so that it works correctly; combined with zimwriterdisk and potential additional post-processing tools, this could be really powerful.
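The input side of the zimwriterdisk idea can be sketched as: take the directory of static HTML and media files (for example DumpHTML output) and turn every file into a ZIM entry addressed by its path relative to the dump root. This is a minimal sketch under assumptions: `collect_dump_entries` is a hypothetical helper, it is not part of zimlib or the openZIM SVN code, the `application/octet-stream` fallback is an arbitrary choice, and actually writing the ZIM file is left out.

```python
import mimetypes
import os
from typing import List, Tuple


def collect_dump_entries(root: str) -> List[Tuple[str, str]]:
    """Walk a static HTML dump and list (relative URL, MIME type) pairs.

    Hypothetical first step of a zimwriterdisk-style tool: every file under
    `root` becomes one entry, keyed by its '/'-separated relative path.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            mime, _encoding = mimetypes.guess_type(name)
            # Assumption: unknown file types are stored as opaque binary data.
            entries.append((rel, mime or "application/octet-stream"))
    return sorted(entries)
```

Because this step only reads plain files, any post-processing tool can rewrite the dump directory first, which is the flexibility the API-based approach makes harder.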
Each of these 4 tools is fairly simple to code on its own, and together they should let us do everything we need.
So I am concerned that neither DumpHTML nor zimwriterdisk is part of the discussion.
Cheers Emmanuel