Looks like I have to clarify some things about Windows compatibility of zimlib.
Zimlib is developed on Linux, but not for Linux. It is as platform independent
as possible. To compile zimlib you first need to compile cxxtools. Both use
autoconf and automake as their build system, which is itself platform
independent. The only prerequisites you need are a (Bourne-compatible) shell
and some simple tools like awk, sed and make. Cxxtools requires a C++ compiler.
Zimlib depends on libz and libbz2. All of these are easily installed with the
package manager of your operating system.
As you know, Windows does not fulfil these simple requirements. It does not
have a shell, nor sed or awk. It does not even have a C++ compiler or a
package manager. So it is not that easy to just "emerge libz" or "apt-get
install zlib-dev" or whatever. This is the main problem in porting zimlib to
Windows. You have to set up your build environment on your own and manually
install zlib, libbz2 and a C++ compiler. All of this is freely available.
Ok, there are at least 2 things which won't work out of the box on Windows.
These are libiconv, which is needed by cxxtools and used in zimreader, and
The libiconv stuff is not needed for zimlib. It can be excluded from the
cxxtools build when you write your own build system.
Opendir/readdir/closedir is abstracted away by cxxtools. We just need a
Fortunately, this opendir/readdir/closedir code was adopted from pt-framework,
which has a Windows implementation. It is just a matter of adopting this into
Mutexes were used in libzeno but are no longer used in zimlib.
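A minimal sketch of what such a directory abstraction might look like, assuming a hypothetical free function `listDirectory()` rather than the actual cxxtools or pt-framework classes. The POSIX branch uses `opendir`/`readdir`/`closedir`; the Windows branch uses the Win32 calls `FindFirstFileA`/`FindNextFileA`/`FindClose`.

```cpp
#include <stdexcept>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#else
#include <dirent.h>
#endif

// Hypothetical helper (not the cxxtools API): returns the names of all
// entries in `path`, hiding the platform-specific directory calls.
std::vector<std::string> listDirectory(const std::string& path)
{
    std::vector<std::string> names;
#ifdef _WIN32
    WIN32_FIND_DATAA data;
    // Win32 needs a search pattern, so append "\*" to the directory.
    HANDLE h = FindFirstFileA((path + "\\*").c_str(), &data);
    if (h == INVALID_HANDLE_VALUE)
        throw std::runtime_error("cannot open directory " + path);
    do {
        names.push_back(data.cFileName);
    } while (FindNextFileA(h, &data));
    FindClose(h);
#else
    DIR* dir = opendir(path.c_str());
    if (dir == nullptr)
        throw std::runtime_error("cannot open directory " + path);
    while (dirent* entry = readdir(dir))
        names.push_back(entry->d_name);
    closedir(dir);
#endif
    return names;
}
```

Callers only ever see `listDirectory()`, so the platform difference stays in one place, which is the same idea as hiding it inside cxxtools.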
There may be some other problems which need to be resolved, but I don't think
there are real showstoppers. The work just needs to be done. Nothing really
The patches I got from Guillaume mainly copied the pt-framework directory code
into cxxtools. The other part was removing the mutex code from libzeno. This
was really not much work. The main problem is setting up the build environment,
and this was not done. A Makefile for Windows would be helpful, but I haven't
got one from Guillaume. The same goes for a document like "how to set up my
Windows environment to compile cxxtools and libzeno".
I would like to let you know that I have reached a major milestone with the new
zim format: I successfully created a zim file and read it with zimDump.
The changes are:
* rewritten large parts
* updated the zim file format
* redesigned zimwriter
Let me say a few words about these changes and why I made them.
* Rewritten large parts:
The rewrite helped me improve code quality. With what I know today and my
experience with the zeno file format, it was possible to clean up the library
* Updated the zim file format:
Since we decided to drop compatibility, I rethought some parts of the zeno
file format. The zeno file format did not support clustering of articles for
better compression. I made a minor change and added an offset and size to the
directory entry of the article, while the offset to the data blob stayed in the
article. This way multiple articles could point to the same blob. In the new
format I added another data structure: the chunk, which is a collection of
blobs. We have a pointer list, similar to the directory pointer list, which
points to the chunks. An article addresses its blob by chunk number and blob
number. Redirect entries do not need these pointers at all, so I simply skipped
them. This saves some bytes for each redirect.
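To make the addressing scheme concrete, here is an in-memory sketch. The struct and field names (`Chunk`, `DirectoryEntry`, `ZimModel`, `getBlob()`) are hypothetical and the layout is simplified; it is not the actual zim on-disk format. It only shows how an article reaches its data via (chunk number, blob number) and how a redirect resolves through its target instead of carrying its own pointers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// A chunk is a collection of blobs (compressed together in a real file).
struct Chunk {
    std::vector<std::string> blobs;
};

// Hypothetical directory entry. On disk, a redirect would simply omit
// chunkNumber/blobNumber; here the fields just stay unused.
struct DirectoryEntry {
    std::string title;
    bool isRedirect = false;
    std::uint32_t chunkNumber = 0;    // articles only
    std::uint32_t blobNumber = 0;     // articles only
    std::uint32_t redirectIndex = 0;  // redirects only: target entry
};

struct ZimModel {
    std::vector<DirectoryEntry> directory;
    std::vector<Chunk> chunks;  // stands in for the chunk pointer list

    // Follow redirects, then address the blob by chunk and blob number.
    const std::string& getBlob(std::size_t entryIndex) const {
        const DirectoryEntry* e = &directory.at(entryIndex);
        for (int hops = 0; e->isRedirect; ++hops) {
            if (hops > 16)
                throw std::runtime_error("redirect loop");
            e = &directory.at(e->redirectIndex);
        }
        return chunks.at(e->chunkNumber).blobs.at(e->blobNumber);
    }
};
```

Because several blobs share one chunk, small articles are compressed together, which is where the better compression comes from.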
* Redesigned zimwriter:
Now the source of articles is abstracted from the generator. Also, the database
is no longer used for temporary data. The writer builds the directory entries
in memory and uses a temporary file to collect the compressed data. This should
improve performance significantly. The caveat is that more RAM is used, but I
estimate that we have enough even for very large zim files.
The abstraction of the data source gives us the opportunity to implement other
sources more easily, e.g. reading data from the file system or from Wikipedia
dumps without using the database at all.
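A minimal sketch of that split, assuming a hypothetical `ArticleSource` interface and a simplified `Writer` (these are not the actual zimwriter classes). Directory entries are kept in a `std::vector` in RAM, and article data is appended to a `std::tmpfile()`. Compression is deliberately left out; a real writer would compress the data before writing it.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Article {
    std::string url;
    std::string data;
};

// The generator only sees this interface, so a database, the file system
// or a Wikipedia dump could be plugged in behind it.
class ArticleSource {
public:
    virtual ~ArticleSource() = default;
    virtual bool next(Article& out) = 0;  // false when exhausted
};

// Trivial source backed by a vector, standing in for a real reader.
class VectorSource : public ArticleSource {
public:
    explicit VectorSource(std::vector<Article> articles)
        : articles_(std::move(articles)) {}
    bool next(Article& out) override {
        if (pos_ == articles_.size()) return false;
        out = articles_[pos_++];
        return true;
    }
private:
    std::vector<Article> articles_;
    std::size_t pos_ = 0;
};

struct Entry {
    std::string url;
    long offset;       // start of the data in the temporary file
    std::size_t size;  // length of the data
};

// Entries stay in memory; data goes to a temporary file (no compression here).
class Writer {
public:
    Writer() : tmp_(std::tmpfile()) {
        if (!tmp_) throw std::runtime_error("cannot create temporary file");
    }
    ~Writer() { std::fclose(tmp_); }
    Writer(const Writer&) = delete;
    Writer& operator=(const Writer&) = delete;

    void consume(ArticleSource& source) {
        Article a;
        while (source.next(a)) {
            long offset = std::ftell(tmp_);
            if (std::fwrite(a.data.data(), 1, a.data.size(), tmp_) != a.data.size())
                throw std::runtime_error("write to temporary file failed");
            entries_.push_back({a.url, offset, a.data.size()});
        }
    }
    const std::vector<Entry>& entries() const { return entries_; }

private:
    std::FILE* tmp_;
    std::vector<Entry> entries_;
};
```

Adding a file-system or dump reader would then only mean writing another `ArticleSource` subclass; `Writer` stays unchanged.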
I hope this will motivate you to go on dumping data, so that we can soon start
There is still quite some work for me to do. I need to get the zimreader
working again. And the next big task is the full text index. My plan is to
read the data from zim files directly and add the full text index to the zim
files in a separate step, or optionally generate a separate zim file for the
index, as was done with the German Wikipedia DVD.
I'm thinking about the layout of pages in zim files. I have some ideas about
what to do and would like to share them with you. In particular, I would like
to hear your expectations about the content of zim files.
Let me explain the problem. In the German Wikipedia DVD the layout of the page
was partly hardcoded into the reader. The html-frame, css, images and
'-', these files were accessible. E.g. "/-/monobookde.css" loads the css file
from the application.
It is easy to move these into the zim file.
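Once such files live in the zim file, a URL like "/-/monobookde.css" or "/A/Linux" just has to be split into a namespace and a name. Here is a small sketch; the function `splitUrl()`, the struct `ZimUrl` and the assumption that a namespace is always exactly one character are illustrative, not zimlib's actual API.

```cpp
#include <string>

// Hypothetical result: the namespace character plus the rest of the URL.
struct ZimUrl {
    char ns;
    std::string name;
};

// Splits "/<ns>/<name>" into its parts. Returns false for anything else,
// e.g. a missing leading slash, a multi-character namespace or an empty name.
bool splitUrl(const std::string& url, ZimUrl& out)
{
    if (url.size() < 4 || url[0] != '/' || url[2] != '/')
        return false;
    out.ns = url[1];
    out.name = url.substr(3);
    return true;
}
```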
The html-frame is more difficult, since it contains dynamic parts like
"<title>...</title>" or the actual article text. So to move this page into the
zim file, we need some placeholders, which need to be parsed at runtime.
My plan is to introduce a special syntax for these placeholders. A tag
"<%something%>" may be replaced, where "something" still needs to be defined.
This syntax may be used in the layout page, which is already in the zim file
header, as well as in arbitrary pages. We might also add a special mime type
in addition to zimMimeTextHtml, for which zimlib parses these tags.
This "something" may be:
<%title%> title of the page
<%url%> the url of the page (e.g. /A/Linux)
<%namespace%> the namespace
<%/A/Linux%> insert another article here
<%content%> placeholder for the article content in the layout page
... (maybe more in the future)
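A sketch of how the proposed "<%name%>" expansion could work, assuming a hypothetical function `expandPlaceholders()` (not an existing zimlib API). Names found in `vars` (such as title, url, namespace, content) are substituted directly; names starting with '/' are treated as article URLs and resolved through a caller-supplied `loadArticle` callback; unknown tags are left untouched.

```cpp
#include <functional>
#include <map>
#include <string>

std::string expandPlaceholders(
    const std::string& text,
    const std::map<std::string, std::string>& vars,
    const std::function<std::string(const std::string&)>& loadArticle)
{
    std::string out;
    std::string::size_type pos = 0;
    while (true) {
        auto open = text.find("<%", pos);
        if (open == std::string::npos) break;
        auto close = text.find("%>", open + 2);
        if (close == std::string::npos) break;  // unterminated tag: copy as-is
        out.append(text, pos, open - pos);      // text before the tag
        std::string name = text.substr(open + 2, close - open - 2);
        auto it = vars.find(name);
        if (it != vars.end())
            out += it->second;                  // e.g. <%title%>
        else if (!name.empty() && name[0] == '/')
            out += loadArticle(name);           // e.g. <%/A/Linux%>
        else
            out.append(text, open, close + 2 - open);  // keep unknown tag
        pos = close + 2;
    }
    out.append(text, pos, std::string::npos);   // remaining text
    return out;
}
```

In this sketch an inserted article is not expanded again; whether nested placeholders should be allowed (and how to prevent endless recursion) would still need to be decided.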
In zimlib we have a class "zim::Article". It has a method "getData()", which
returns the article data of the page. I would add a new method "getPage()",
which uses the layout page to return the complete page. The layout page should
only be used when the mime type is zimMimeTextHtml.
This way the creator of the zim file can specify how to show pages without
repeating the html header and footer on each page. A reader may ignore this
layout if desired.
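A sketch of the proposed `getPage()` behaviour, using a hypothetical stand-in class rather than the real `zim::Article`; only the method names `getData()` and `getPage()` come from the proposal, and the `MimeType` enum is illustrative. For simplicity it only fills `<%title%>` and `<%content%>`. The point is the mime-type gate: HTML articles are wrapped in the layout, everything else is returned unchanged.

```cpp
#include <string>
#include <utility>

enum class MimeType { TextHtml, TextPlain, ImagePng };

class Article {
public:
    Article(std::string title, MimeType mime, std::string data)
        : title_(std::move(title)), mime_(mime), data_(std::move(data)) {}

    // Raw article data, e.g. for readers that ignore the layout.
    const std::string& getData() const { return data_; }

    // Wrap HTML articles in the layout page; return other data unchanged.
    std::string getPage(const std::string& layout) const {
        if (mime_ != MimeType::TextHtml || layout.empty())
            return data_;
        std::string page = layout;
        replaceAll(page, "<%title%>", title_);
        replaceAll(page, "<%content%>", data_);
        return page;
    }

private:
    static void replaceAll(std::string& s, const std::string& from,
                           const std::string& to) {
        // Continue searching after the inserted text to avoid re-matching it.
        for (auto pos = s.find(from); pos != std::string::npos;
             pos = s.find(from, pos + to.size()))
            s.replace(pos, from.size(), to);
    }

    std::string title_;
    MimeType mime_;
    std::string data_;
};
```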
Dear openZIM developers team!
As I wrote last week I was at the Wikimedia Conference in Berlin.
== Wikimedia Israel / Hebrew Wikipedia on OLPC ==
One result was that I met Asaf Bartov from Israel, who is working on the Hebrew
Wikipedia for the One Laptop Per Child project (OLPC, XO). I haven't heard
from him since I came back, but I am pretty sure we will hear from him after
the Easter holidays. I keep him up to date by forwarding the most relevant
mails from this mailing list.
== Wikimedia Italia / Italian Wikipedia on DVD ==
I also talked to Frieda Brioschi from Wikimedia Italia about their Wikipedia
DVD. A company in Italy created proprietary software with a GUI for the DVD
and an "unknown" storage format for the data (this reminds me of the
Directmedia approach). She said the software was "awkward". They plan to have
a new DVD this year, and the company promised to write new, better software.
Frieda stated that WMIT is not yet sure whether they will sign a contract with
this company again.
After we talked about openZIM, Frieda said that she would take it into account.
I suggested that WMIT could make their DVD on their own using our software
and maybe Kiwix, or point the company to our software so that they can
integrate zimlib. She might also ask Emmanuel, who may be willing to make a
DVD for WMIT.
== Wikimedia Polska / Polish Wikipedia on DVD ==
Wikimedia Polska had a DVD using HTML dumps and a Java applet as search
engine. It didn't sell, and there are no current plans for a new DVD.
== Wikimedia Foundation Conference Call ==
Erik Möller (Deputy Director of Wikimedia Foundation) contacted me by mail after
the conference, stating that he had heard of openZIM and would like to talk
to me about it.
"Since it seems like prior offline reader efforts are
merging into it, I would like to understand the goals and deliverables
of the project better, and also discuss whether we could support it in
We have now fixed a date with James Owen, his assistant, for a conference call
on Thursday evening, April 16th, at 20:00 our local time (CEST).
I would like to collect some statements and ideas on how the Wikimedia
Foundation could help us.
To do that in a collaborative way I created
and will now start filling in what I have in mind, but invite you to add what
you feel is important to mention from your point of view.
Thanks for your attention,
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
The ZimReader is fixed and I can browse through the zim file I created from
Josch's dewiki dump. There were some smaller bugs in the reader as well as in
the library, but the main bug was in cxxtools. So if you want to test the
reader, please update cxxtools.
There are still some tasks to do in the reader. There is still a lot of
hardcoded stuff from the old Wikipedia DVD, like the text "DVD-ROM-Ausgabe 2007"
(DVD-ROM edition 2007) in the title area and a reference to Directmedia.
I also updated the status and next steps page in our wiki.