On 15 December 2010 20:24, Manuel Schneider manuel.schneider@wikimedia.ch wrote:
Hi Andrew,
Maybe you'd like to check out ZIM: it is a standardized file format for compressed HTML dumps, currently focused on Wikimedia content.
There is some C++ code around to read and write ZIM files, and several projects use it, e.g. the WP1.0 project, the Israeli and Kenyan Wikipedia Offline initiatives, and more. The Wikimedia Foundation is also in the process of adopting the format so that it can provide ZIM files from Wikimedia wikis in the future.
This is very interesting and I'll be watching it. Where do the HTML dumps come from? I'm pretty sure I've only seen "static" HTML dumps for Wikipedia, and not for Wiktionary, for example. I am also looking at adapting the parser for offline use, to generate HTML from the wikitext in the dump files.
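As a rough illustration of the first step in working with dump-file wikitext, here is a minimal sketch that streams (title, wikitext) pairs out of a MediaWiki XML export. The function name iter_pages is made up for this example, and the element names (page, title, text) are assumed to match the export schema; the XML namespace is stripped because it differs between export versions. It does not parse wikitext or produce HTML.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(xml_stream):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    Hypothetical helper: assumes <title> and <text> elements inside <page>.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Tags look like "{namespace}page"; keep only the local name.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's subtree


# Tiny in-memory stand-in for a real dump file.
sample = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">'
    b"<page><title>cat</title><revision><text>==English==</text></revision></page>"
    b"</mediawiki>"
)
for page_title, wikitext in iter_pages(io.BytesIO(sample)):
    print(page_title, len(wikitext))
```

For a compressed dump, the same function should accept the file object returned by Python's bz2.open(path, "rb"), which decompresses sequentially from the start of the file.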
Andrew Dunbar (hippietrail)
/Manuel
On 15.12.2010 16:21, Andrew Dunbar wrote:
I've long been interested in offline tools that make use of Wikimedia information, particularly the English Wiktionary.
I've recently come across a tool which can provide random access to a bzip2 archive without decompressing the whole thing, and I would like to make use of it in my tools. However, I can't get it to compile and/or work with any free Windows compiler I have access to. It works fine on the *nix boxes I have tried, but my personal machine is a Windows XP netbook.
The tool is "seek-bzip2" by James Taylor and is available here: http://bitbucket.org/james_taylor/seek-bzip2
- The free Borland compiler won't compile it due to missing (Unix?) header files
- lcc compiles it but it always fails with error "unexpected EOF"
- mingw compiles it if the -m64 option is removed from the Makefile, but it then has the same behaviour as the lcc build.
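
For readers unfamiliar with how random access into a bzip2 file can work at all, here is a minimal sketch of the general idea. It is not taken from the seek-bzip2 source. A bzip2 stream is a sequence of independently compressed blocks, each starting with the 48-bit marker 0x314159265359, and blocks are not byte-aligned, so the scan has to go bit by bit. The function name find_block_bit_offsets is made up for this example. The marker can in principle also occur by chance inside compressed data, so treat the results as candidates.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit marker at the start of each bzip2 block
WINDOW_MASK = (1 << 48) - 1


def find_block_bit_offsets(data):
    """Return the bit offsets at which the bzip2 block marker appears in data."""
    offsets = []
    window = 0
    for bit_index in range(len(data) * 8):
        byte = data[bit_index >> 3]
        bit = (byte >> (7 - (bit_index & 7))) & 1  # bits are read MSB first
        window = ((window << 1) | bit) & WINDOW_MASK
        # The window holds the 48 bits ending at bit_index (inclusive).
        if bit_index >= 47 and window == BLOCK_MAGIC:
            offsets.append(bit_index - 47)
    return offsets


# Small demo on data compressed in memory.
compressed = bz2.compress(b"hello wiktionary " * 100)
print(find_block_bit_offsets(compressed)[0])  # 32: just after the 4-byte "BZh9" header
```

With a table of such offsets, a tool can seek near a block and decompress only that block instead of the whole archive. A pure-Python bit loop like this is slow on multi-gigabyte dumps; it is meant only to show the principle.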
My C experience is now quite stale and my 64-bit programming experience negligible.
(I'm also interested in hearing from other people working on offline tools for dump files, wikitext parsing, or Wiktionary)
Andrew Dunbar (hippietrail)
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- Regards Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens Wikimedia CH - Association for the advancement of free knowledge www.wikimedia.ch