On 15 December 2010 20:24, Manuel Schneider
<manuel.schneider(a)wikimedia.ch> wrote:
Hi Andrew,
maybe you'd like to check out ZIM: it is a standardized file format
for compressed HTML dumps, currently focused on Wikimedia content.
There is some C++ code available to read and write ZIM files, and
several projects use it, e.g. the WP1.0 project, the Israeli and
Kenyan Wikipedia Offline initiatives, and more. The Wikimedia
Foundation is also in the process of adopting the format so that it can
provide ZIM files from Wikimedia wikis in the future.
This is very interesting and I'll be watching it. Where do the HTML
dumps come from? I'm pretty sure I've only seen "static" for Wikipedia
and not for Wiktionary for example. I am also looking at adapting the
parser for offline use to generate HTML from the dump file wikitext.
Andrew Dunbar (hippietrail)
http://openzim.org/
/Manuel
Am 15.12.2010 16:21, schrieb Andrew Dunbar:
I've long been interested in offline tools that make use of Wikimedia
information, particularly the English Wiktionary.
I've recently come across a tool which can provide random access to a
bzip2 archive without decompressing it and I would like to make use of
it in my tools but I can't get it to compile and/or function with any
free Windows compiler I have access to. It works fine on the *nix
boxes I have tried but my personal machine is a Windows XP netbook.
The tool is "seek-bzip2" by James Taylor and is available here:
http://bitbucket.org/james_taylor/seek-bzip2
* The free Borland compiler won't compile it due to missing (Unix?) header files
* lcc compiles it but it always fails with error "unexpected EOF"
* mingw compiles it if the -m64 option is removed from the Makefile
but it then has the same behaviour as the lcc build.
My C experience is now quite stale and my 64-bit programming
experience negligible.
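
For readers unfamiliar with how random access into a bzip2 archive can work at all, here is a minimal Python sketch of the general idea. It is not seek-bzip2's code, and the function name find_block_offsets is made up for this illustration. It relies on one fact about the bzip2 format: every compressed block starts with the 48-bit marker 0x314159265359, which may sit at any bit position, not only on a byte boundary. A table of those bit offsets is what would let a tool jump to a block and decompress only that block. The same bit pattern could in principle occur by chance inside compressed data, so a real tool would have to verify each candidate; this sketch does not.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit marker at the start of each bzip2 block
MAGIC_BITS = 48


def find_block_offsets(data):
    """Return the bit offsets at which the block marker appears in data.

    Illustrative only: candidates are not verified, so a chance match
    inside compressed data would be reported as a block start.
    """
    offsets = []
    window = 0
    mask = (1 << MAGIC_BITS) - 1
    for byte_index, byte in enumerate(data):
        # bzip2 packs bits most-significant-bit first within each byte.
        for bit in range(7, -1, -1):
            window = ((window << 1) | ((byte >> bit) & 1)) & mask
            bits_seen = byte_index * 8 + (8 - bit)
            if bits_seen >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(bits_seen - MAGIC_BITS)
    return offsets


if __name__ == "__main__":
    # Compression level 1 uses 100 kB blocks, so this input spans several.
    sample = bz2.compress(bytes(range(256)) * 1000, 1)
    for offset in find_block_offsets(sample):
        print("block marker at bit", offset)
```

The first marker always follows the 4-byte stream header ("BZh" plus the level digit), so it is found at bit offset 32.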
(I'm also interested in hearing from other people working on offline
tools for dump files, wikitext parsing, or Wiktionary)
Andrew Dunbar (hippietrail)
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch