On Thu, 16 Dec 2010 07:50:56 +0200, Andrew Dunbar
<hippytrail(a)gmail.com> wrote:
On 15 December 2010 20:24, Manuel Schneider
<manuel.schneider(a)wikimedia.ch> wrote:
Hi Andrew,
maybe you'd like to check out ZIM: This is a standardized file format
for compressed HTML dumps, focused on Wikimedia content at the moment.
There is some C++ code around to read and write ZIM files and there are
several projects using that, e.g. the WP1.0 project, the Israeli and
Kenyan Wikipedia Offline initiatives and more. Also the Wikimedia
Foundation is currently in the process of adopting the format to
provide ZIM files from Wikimedia wikis in the future.
This is very interesting and I'll be watching it. Where do the HTML
dumps come from?
I do the HTML dumps on my own, using a customized version of the dumpHTML
extension and additional scripts.
Emmanuel