Hi,
I have some ideas concerning libzim that I would like to share with you.
There is a class zim::Files, which represents a set of zim files in one directory. I would like to drop that class if it is not used in Kiwix. (Emmanuel: can you verify that?)
Let me explain why the class is there, why I would like to remove it, and how to replace its functionality.
On the original German Wikipedia DVD, the content of Wikipedia was in one zeno file and the word index, which is also a zeno file, was in another. Images were included only in very poor quality because of the limited capacity of the DVD. There was a compact and a premium edition. The premium edition had images in much better quality on 3 additional DVDs, each with a zeno file holding part of the images. To use these better-quality images, users had to copy all zeno files into one directory on their hard drive and configure the reader to use that directory. When an image was requested, the reader searched for it in all zeno files and fetched the one with the best quality.
I feel that this solution is not optimal. There is also a technical reason why it is not ideal, and since I have a better idea (at least I think it is better ;-) ), I would like to drop that feature. The technical reason is simple: there is no common API for reading directories. Almost all operating systems have opendir/readdir/closedir. Unfortunately, ReactOS (and other systems that use Win32) do not have these functions, so zimlib has to use a different API on these systems. This is not a big problem, but it is still a problem. It is actually the only OS-specific code in zimlib (or, more precisely, in cxxtools, which zimlib uses and which provides a wrapper for these functions).
The functionality is not needed for the planned Wikipedia DVD for LinuxTag, since we won't have a premium edition; all data will be in one single zim file.
In the future I plan to create a utility that merges the content of two or more zim files into one. The structure of zim files makes this quite an easy operation, much easier than creating new zim files. In particular, the changes I made compared to zeno make it very easy and fast: combining two zim files is almost as cheap as copying them. So instead of copying all zim files into one directory, users can simply combine multiple zim files into one single big file.
The utility will have an option to control how duplicate articles are handled. It can prefer the articles of one of the files, which makes it possible to build update files that contain only the changed articles. The resulting combined file will not be exactly the same as a freshly created file, since it will contain empty blob entries for removed article data, but the user won't see any difference.
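For illustration, the POSIX side of the kind of directory scan zim::Files depends on might look like the minimal sketch below. The function name listZimFiles and the ".zim" suffix filter are assumptions made for this example; this is not the actual zimlib or cxxtools code. A Win32 build (e.g. on ReactOS) would need a different API such as FindFirstFile/FindNextFile, which is exactly the portability burden described above.

```cpp
#include <dirent.h>   // POSIX opendir/readdir/closedir; not available on Win32
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of zimlib): return the names of all
// "*.zim" files in `dir`, sorted alphabetically.
// Throws std::runtime_error if the directory cannot be opened.
std::vector<std::string> listZimFiles(const std::string& dir)
{
    DIR* d = opendir(dir.c_str());
    if (d == nullptr)
        throw std::runtime_error("cannot open directory " + dir);

    std::vector<std::string> result;
    const std::string suffix = ".zim";
    while (const dirent* entry = readdir(d))
    {
        std::string name = entry->d_name;
        // Keep only names that end in ".zim" and have a non-empty stem.
        if (name.size() > suffix.size() &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0)
            result.push_back(name);
    }
    closedir(d);

    // readdir returns entries in an unspecified order; sort for determinism.
    std::sort(result.begin(), result.end());
    return result;
}
```

Every platform-specific call is confined to this one function, which is the piece that would have to be duplicated for Win32 if zim::Files were kept.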
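The merge-with-preference idea can be sketched on a deliberately simplified in-memory model. Everything here (ZimModel, Prefer, merge) is hypothetical and ignores the real zim on-disk layout, namespaces and compression; it also assumes each blob is referenced by exactly one URL. It shows why merging is cheap (blobs are appended as-is, only the directory is rebuilt) and where the empty blob entries come from: the losing copy of a duplicate article is emptied rather than removed, so no other blob index has to change.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: blobs hold article data, the directory maps an
// article URL to an index into `blobs`. Not the real zim format.
struct ZimModel {
    std::vector<std::string> blobs;
    std::map<std::string, std::size_t> directory;  // url -> blob index
};

// Which input wins when both contain the same URL.
enum class Prefer { First, Second };

// Append b's blobs after a's (cheap, like copying) and rebuild the directory.
// For a duplicate URL the preferred article is kept and the other article's
// blob is cleared in place, leaving an empty blob entry.
// Assumes each blob is referenced by exactly one URL.
ZimModel merge(const ZimModel& a, const ZimModel& b, Prefer prefer)
{
    ZimModel out;
    out.blobs = a.blobs;
    const std::size_t offset = a.blobs.size();
    out.blobs.insert(out.blobs.end(), b.blobs.begin(), b.blobs.end());

    out.directory = a.directory;
    for (const auto& [url, idx] : b.directory) {
        const std::size_t bIdx = idx + offset;  // index of b's blob in `out`
        auto it = out.directory.find(url);
        if (it == out.directory.end()) {
            out.directory.emplace(url, bIdx);   // new article from b
        } else if (prefer == Prefer::Second) {
            out.blobs[it->second].clear();      // a's old data becomes an empty blob
            it->second = bIdx;
        } else {
            out.blobs[bIdx].clear();            // b's duplicate becomes an empty blob
        }
    }
    return out;
}
```

With Prefer::Second, a small update file containing only changed articles can be merged onto a full base file, and readers only ever follow the directory, so the emptied blobs are invisible to them.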
Tommi
Hi Tommi,
the update feature is exactly what a lot of people have asked for. Implementing it would be really great!
Concerning the zim::Files feature, I want to refer to the agenda items we already planned for the 2nd Dev Meeting: interZIM links / UIDs for ZIM files. I guess that will provide similar functionality in a much more sophisticated way.
Today I want to stress a few things I'd like to see on the Wikipedia DVD 2009 (LinuxTag):
* category handling
* image handling: will we be able to include images?
* a stable reader (Kiwix and zimreader)
* ports for ReactOS-compatible systems
So I suggest documenting the ideas for the update mechanism in the wiki and postponing it until the Wikipedia DVD is published.
Could you both give me a status update on the items mentioned above? Where do you need help, and how much is possible by June?
Greets,
Manuel
Tommi Mäkitalo wrote:
> There is a class zim::Files, which represents a set of zim files in one directory. I would like to drop that class if it is not used in Kiwix. (Emmanuel: can you verify that?)
I do not use zim::Files.
> In the future I plan to create a utility that merges the content of two or more zim files into one. The structure of zim files makes this quite an easy operation, much easier than creating new zim files. In particular, the changes I made compared to zeno make it very easy and fast: combining two zim files is almost as cheap as copying them. So instead of copying all zim files into one directory, users can simply combine multiple zim files into one single big file.
OK, it sounds good to me, but why not add a merge(path) method to zim::File?
> The utility will have an option to control how duplicate articles are handled. It can prefer the articles of one of the files, which makes it possible to build update files that contain only the changed articles. The resulting combined file will not be exactly the same as a freshly created file, since it will contain empty blob entries for removed article data, but the user won't see any difference.
I don't think I will need such a feature for the ZIM files I want to build... but I want to ask: why not treat additional ZIM files (with pictures, for example) as an "additional pack" that can be merged with a specific ZIM file (in this case the ZIM creator has to take care of the uniqueness of URLs)?
Regards,
Emmanuel