Hi,
I have some ideas concerning libzim which I would like to share with you.
There is a class zim::Files, which represents a set of zim files in one
directory. I would like to drop that class if it is not used in Kiwix.
(Emmanuel: Can you verify that?)
Let me explain why the class is there, why I would like to remove it, and how
to replace the functionality.
On the original German Wikipedia DVD, the content of Wikipedia was in one zeno
file and the word index, which is also a zeno file, was in another. Images were
included only in very poor quality due to the limited capacity of the DVD. There
was a compact and a premium edition. The premium edition had images in much
better quality on 3 additional DVDs, each holding a zeno file with part of the
images. To use these better-quality images, the user had to copy all zeno files
into one directory on their hard drive and configure the reader to use that
directory. When an image was requested, the reader searched for it in all zeno
files and fetched the copy with the best quality.
I feel that this solution is not optimal. There is also a technical reason
why it is not that good, and since I have a better idea (at least I think the
idea is better ;-) ), I would like to drop that feature. The technical reason
is simple: there is no common API to read directories. Almost all operating
systems have opendir/readdir/closedir. Unfortunately ReactOS (and other
systems which use win32) do not have these functions, so zimlib has to
use a different API for these systems. This is not really a big problem, but
it is still a problem. It is actually the only OS-specific code in zimlib
(or rather in cxxtools, which zimlib uses and which provides a wrapper for
these functions).
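To illustrate the kind of OS-specific code meant here, the following is a
minimal sketch of a directory listing that has to switch between the two APIs.
It is not the code from zimlib or cxxtools; the function name listDirectory is
made up for this example.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#  include <windows.h>
#else
#  include <dirent.h>
#endif

// Hypothetical helper (not part of zimlib or cxxtools): returns the names of
// all entries in `dir` except "." and "..". Returns an empty vector if the
// directory cannot be opened.
std::vector<std::string> listDirectory(const std::string& dir)
{
    std::vector<std::string> names;

#ifdef _WIN32
    // win32 has no opendir/readdir; it uses FindFirstFile/FindNextFile.
    WIN32_FIND_DATAA data;
    HANDLE h = FindFirstFileA((dir + "\\*").c_str(), &data);
    if (h == INVALID_HANDLE_VALUE)
        return names;
    do
    {
        std::string name = data.cFileName;
        if (name != "." && name != "..")
            names.push_back(name);
    } while (FindNextFileA(h, &data));
    FindClose(h);
#else
    // POSIX systems: opendir/readdir/closedir.
    DIR* d = opendir(dir.c_str());
    if (d == nullptr)
        return names;
    while (struct dirent* e = readdir(d))
    {
        std::string name = e->d_name;
        if (name != "." && name != "..")
            names.push_back(name);
    }
    closedir(d);
#endif

    return names;
}
```

Both branches do the same job, but each one has to be written, built and
tested separately on its own platform.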
The functionality is not needed on the planned Wikipedia DVD for LinuxTag,
since we won't have a premium edition but will have all data in one single zim
file.
In the future I plan to create a utility which merges the content of 2 or
more zim files into one. The structure of zim files makes this quite an easy
operation, much easier than creating new zim files. The changes I made compared
to zeno in particular make it very easy and fast. Combining 2 zim files is
almost as cheap as copying them. So instead of copying all zim files into one
directory, users can just combine multiple zim files to create one single big
file.
The utility will have an option to control how duplicate articles are handled.
It may prefer the articles of one of the files, so it will be possible
to make update files which provide only what has changed. The resulting
combined file will not be exactly the same as a newly created file, since it
will have empty blob entries for removed article data. But the user won't see
any difference.
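To make the duplicate handling a bit more concrete, here is a rough sketch of
one possible policy: when the same article occurs in several inputs, the file
listed last wins. This is only an illustration and not the planned utility.
The Entry struct, the url key and the function mergeEntries are assumptions
made up for this example, and a real zim directory entry contains more than
this.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a directory entry of the merged file.
struct Entry
{
    std::string url;   // key identifying the article (assumed for this sketch)
    int sourceFile;    // index of the input file that supplies the data
};

// Merge the article lists of several inputs. If a url occurs in more than one
// input, the later input overrides the earlier one, so an update file listed
// last replaces articles of the base file.
std::vector<Entry> mergeEntries(
    const std::vector<std::vector<std::string>>& inputs)
{
    std::map<std::string, int> chosen;
    for (std::size_t i = 0; i < inputs.size(); ++i)
        for (const std::string& url : inputs[i])
            chosen[url] = static_cast<int>(i);   // later input overwrites

    std::vector<Entry> merged;
    for (const auto& kv : chosen)
        merged.push_back(Entry{kv.first, kv.second});
    return merged;   // ordered by url, because std::map iterates in key order
}
```

With a base file containing the articles A and B and an update file containing
B and C, the result takes A from the base file and B and C from the update
file. Only the entries are decided here; the article data itself would still
have to be copied from the chosen input.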
Tommi