Hi,
I've thought about zimwriter and also talked about it at LinuxTag a little.
The zimwriter has an internal plug-in interface, which separates the source of data from the generation of zim files. There are some implementations of the source interface: the database source, which reads articles from a database; the full text indexer, which gets its data by reading and indexing a zim file; and one implementation which creates a zim file from a search result.
There is still a need for more implementations. We need at least a zim file creator which merges two zim files, and also a creator which reads data from a directory in the file system. The latter would make Emmanuel's Perl script, which writes the files to the database, obsolete.
The user has to tell the zimwriter what he wants to do. I could keep adding options and features to the zimwriter, but then it will get bigger and bigger and accumulate more and more options.
My idea is to move the functionality for writing a zim file into a library, libzimwriter, and to write a separate program for each source implementation. So to write a zim file from the database, the user uses zimwriterdb. To create a full text index we use zimindexer. To merge zim files we have zimmerge. All of them are quite simple programs which just interface to libzimwriter.
As an additional benefit we reduce the dependency on tntdb. Only zimwriterdb needs tntdb; none of the other tools do.
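To make the split into a shared library and thin tools more concrete, here is a rough sketch of what a tool like zimmerge could plug into the library. Everything in it is an assumption for illustration: the names `Article`, `ArticleSource`, `VectorSource`, `MergeSource` and `writeAll` are hypothetical and are not taken from libzimwriter, the "writer" only collects URLs instead of producing a zim file, and the rule that the first input wins on a duplicate URL is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; not the real libzimwriter API.
namespace merge_sketch {

struct Article {
    std::string url;
    std::string data;
};

// What every tool supplies: a source that hands out articles one by one.
class ArticleSource {
public:
    virtual ~ArticleSource() = default;
    // Returns the next article, or nullptr when the source is exhausted.
    virtual const Article* getNextArticle() = 0;
};

// Stand-in for the shared library: it only pulls from the source and
// never needs to know where the articles come from.
std::vector<std::string> writeAll(ArticleSource& source) {
    std::vector<std::string> written;
    while (const Article* article = source.getNextArticle()) {
        assert(!article->url.empty());  // every article needs a URL
        written.push_back(article->url);
    }
    return written;
}

// Trivial source that serves articles from memory (stands in for a zim file).
class VectorSource : public ArticleSource {
public:
    explicit VectorSource(std::vector<Article> articles)
        : articles_(std::move(articles)) {}
    const Article* getNextArticle() override {
        return next_ < articles_.size() ? &articles_[next_++] : nullptr;
    }
private:
    std::vector<Article> articles_;
    std::size_t next_ = 0;
};

// Merging source: everything from the first input, then only those
// articles from the second input whose URL has not been seen yet.
class MergeSource : public ArticleSource {
public:
    MergeSource(ArticleSource& first, ArticleSource& second)
        : first_(first), second_(second) {}
    const Article* getNextArticle() override {
        if (!firstDone_) {
            if (const Article* article = first_.getNextArticle()) {
                seen_.insert(article->url);
                return article;
            }
            firstDone_ = true;
        }
        while (const Article* article = second_.getNextArticle()) {
            if (seen_.insert(article->url).second) return article;
        }
        return nullptr;
    }
private:
    ArticleSource& first_;
    ArticleSource& second_;
    bool firstDone_ = false;
    std::set<std::string> seen_;
};

}  // namespace merge_sketch
```

In this sketch the whole tool would be little more than constructing two sources, wrapping them in a `MergeSource` and handing that to the writer, which is the point of keeping the tools thin.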
The only problem I see is that I have to break the current command line interface. I feel that the price is cheap.
Any opinions?
Tommi
Hi Tommi,
On 19.06.2010 21:22, Tommi Mäkitalo wrote:
The only problem I see is that I have to break the current command line interface. I feel that the price is cheap.
Any opinions?
do it ;-)
Hi All,
I met Manuel and Tommi at LinuxTag 2010, and I'm very interested in openzim and its tools. Thanks to all of you for your excellent work on offlining Wikipedia.
Unfortunately it is still very difficult to get a current snapshot of Wikipedia for offline use with openzim. One reason is the need for input modules that Tommi mentioned. I think it is a good idea to build a lib that supplies the different functionalities in separate modules (as described by Tommi).
Tommi, do you have a description of the internal datastructure for zimwriter? Everyone who likes to implement an input module has to adapt "his" external format to this internal datastructure (?)
And there is another gap: there is only one data source for Wikipedia which is up to date, namely the dump, and unfortunately that is in MediaWiki format. The static HTML export process seems to be orphaned... As a last resort we can grab all articles online. A thousand ways to Rome :-(
Does anyone have ideas for solving this problem?
greetings from Berlin christopher schnirch - csh
_______________________________________________ dev-l mailing list dev-l@openzim.org https://intern.openzim.org/mailman/listinfo/dev-l
Hi,
Tommi, do you have a description of the internal datastructure for zimwriter? Everyone who likes to implement an input module has to adapt "his" external format to this internal datastructure (?)
The interface for input modules can be found in the header zim/writer/articlesource.h. To implement a new module, the two interfaces must be implemented. Currently we have to tell zimwriter when to use these interfaces, but in the future people can just write a little main function which instantiates the ArticleSource-derived class and runs the writer.
Once I'm done with this, the current functionality will at least be available as examples, and I will also write a wiki article about input modules. It will be quite easy.
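As a rough illustration of that pattern (implement the interfaces, then let a small program instantiate the source and run the writer), here is a self-contained sketch. It does not reproduce zim/writer/articlesource.h: the classes `Article`, `ArticleSource`, `Writer`, `MemoryArticle` and `MemorySource`, their method names and the helper `runTool` are all hypothetical, and it is only an assumption that the two interfaces are one for a single article and one for the source that hands articles out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the
// declarations from zim/writer/articlesource.h.
namespace source_sketch {

// Interface 1 (assumed): one article as the writer sees it.
class Article {
public:
    virtual ~Article() = default;
    virtual std::string getUrl() const = 0;
    virtual std::string getTitle() const = 0;
    virtual std::string getData() const = 0;
};

// Interface 2 (assumed): hands out articles one at a time.
class ArticleSource {
public:
    virtual ~ArticleSource() = default;
    // Returns the next article, or nullptr when the source is exhausted.
    virtual const Article* getNextArticle() = 0;
};

// Stand-in for the writer: records URLs and sums up the payload size
// instead of producing a real zim file.
class Writer {
public:
    void run(ArticleSource& source) {
        while (const Article* article = source.getNextArticle()) {
            assert(!article->getUrl().empty());  // every article needs a URL
            urls_.push_back(article->getUrl());
            bytes_ += article->getData().size();
        }
    }
    const std::vector<std::string>& urls() const { return urls_; }
    std::size_t bytes() const { return bytes_; }
private:
    std::vector<std::string> urls_;
    std::size_t bytes_ = 0;
};

// Example input module, part 1: an article held in memory.
class MemoryArticle : public Article {
public:
    MemoryArticle(std::string url, std::string title, std::string data)
        : url_(std::move(url)), title_(std::move(title)), data_(std::move(data)) {}
    std::string getUrl() const override { return url_; }
    std::string getTitle() const override { return title_; }
    std::string getData() const override { return data_; }
private:
    std::string url_, title_, data_;
};

// Example input module, part 2: a source serving those articles in order.
class MemorySource : public ArticleSource {
public:
    explicit MemorySource(std::vector<MemoryArticle> articles)
        : articles_(std::move(articles)) {}
    const Article* getNextArticle() override {
        return next_ < articles_.size() ? &articles_[next_++] : nullptr;
    }
private:
    std::vector<MemoryArticle> articles_;
    std::size_t next_ = 0;
};

// What a tool's main function would boil down to in this sketch:
// instantiate the source, run the writer, report how many articles went in.
std::size_t runTool(std::vector<MemoryArticle> input) {
    MemorySource source(std::move(input));
    Writer writer;
    writer.run(source);
    return writer.urls().size();
}

}  // namespace source_sketch
```

A real module would replace `MemorySource` with something that reads from a database, a directory or another zim file; the writer side would stay untouched.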
We have already talked several times about another input module: one which grabs the articles directly via HTTP from a MediaWiki instance. It looks like the MediaWiki API is enough to do it. Mirko already looked at the API at LinuxTag, but it was not quite clear whether it will work. It just looks promising.
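For a feel of what such a module would have to request, here is a small sketch that only builds the URL for fetching one rendered article through the MediaWiki API's `action=parse` call. It makes no HTTP request and says nothing about whether this approach is sufficient for a whole wiki. The function names `percentEncode` and `buildParseUrl` and the endpoint used in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration only; not part of zimwriter.
namespace mw_sketch {

// Percent-encode every byte except the RFC 3986 unreserved characters
// (ASCII letters, digits, '-', '_', '.', '~').
std::string percentEncode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Build the request URL asking the wiki to render one page as HTML,
// wrapped in a JSON response.
std::string buildParseUrl(const std::string& apiEndpoint,
                          const std::string& title) {
    assert(!apiEndpoint.empty());  // the caller must name the wiki's api.php
    return apiEndpoint + "?action=parse&format=json&prop=text&page=" +
           percentEncode(title);
}

}  // namespace mw_sketch
```

Calling `buildParseUrl("https://example.org/w/api.php", "Main Page")` yields a URL ending in `page=Main%20Page`. An input module would still need an HTTP client, a JSON parser and a way to enumerate all page titles on top of this.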
Tommi
(sorry, but my mail client does not accept recipient addresses without a local part as you requested in your mail as a reply address)