Hi All,
I met Manuel and Tommi at LinuxTag 2010, and I'm very interested in openzim and its tools. Thanks to all of you for your excellent work on making Wikipedia available offline.
Unfortunately, it is still very difficult to get a current snapshot of Wikipedia for offline use with openzim. Tommi mentioned one reason: the need for input modules. I think it is a good idea to build a library that provides the different functionalities in separate modules (as described by Tommi).
Tommi, do you have a description of zimwriter's internal data structure? As I understand it, anyone who wants to implement an input module has to map their external format onto this internal data structure. Is that right?
There is another gap: the only up-to-date data source for Wikipedia is the dump, which is unfortunately in MediaWiki markup. The static HTML export process seems to be orphaned... As a last resort we could grab all articles online. All roads lead to Rome :-(
Does anyone have ideas on how to solve this problem?
Greetings from Berlin,
christopher schnirch - csh
Tommi Mäkitalo wrote:
Hi,
I've been thinking about zimwriter and also talked about it a little at LinuxTag.
zimwriter has an internal plug-in interface that separates the data source from the generation of zim files. There are several implementations of the source interface: the database source, which reads articles from a database; the full-text indexer, which gets its data by reading and indexing a zim file; and one implementation that creates a zim file from a search result.
More implementations are still needed. At a minimum we need a zim file creator that merges two zim files, and a creator that reads data from a directory in the file system. The latter would make Emmanuel's Perl script, which writes the files to the database, obsolete.
The user has to tell zimwriter what to do. I could keep adding options and features to zimwriter, but then it would keep growing and accumulate more and more options.
My idea is to move the zim file writing functionality into a library, libzimwriter, and to write a separate program for each source implementation. To write a zim file from the database, the user would run zimwriterdb; to create a full-text index, zimindexer; to merge zim files, zimmerge. All of them would be quite simple programs that just interface with libzimwriter.
As an additional benefit, we reduce the dependency on tntdb: only zimwriterdb needs it, and none of the other tools do.
The only problem I see is that I would have to break the current command-line interface. I think that is a small price to pay.
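To make the separation between data source and writer concrete, here is a minimal C++ sketch of such a plug-in interface. All names (Article, ArticleSource, VectorSource, writeArticles) are hypothetical and are not zimwriter's actual classes; the writer below only prints one line per article instead of producing a real zim file.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical article record; zimwriter's real data structure may differ.
struct Article {
    std::string url;
    std::string title;
    std::string mimeType;
    std::string data;
};

// Hypothetical source interface: a source (database, full-text indexer,
// search result, ...) only has to hand out articles one at a time.
class ArticleSource {
public:
    virtual ~ArticleSource() = default;
    // Returns the next article, or std::nullopt when the source is exhausted.
    virtual std::optional<Article> next() = 0;
};

// Trivial in-memory source standing in for, e.g., the database source.
class VectorSource : public ArticleSource {
public:
    explicit VectorSource(std::vector<Article> articles)
        : articles_(std::move(articles)) {}

    std::optional<Article> next() override {
        if (pos_ >= articles_.size()) return std::nullopt;
        return articles_[pos_++];
    }

private:
    std::vector<Article> articles_;
    std::size_t pos_ = 0;
};

// Placeholder for the writer side: it depends only on the interface, never on
// a concrete source. It writes one line per article and returns the count.
std::size_t writeArticles(ArticleSource& source, std::ostream& out) {
    std::size_t count = 0;
    while (auto article = source.next()) {
        assert(!article->url.empty());  // every entry needs a URL
        out << article->url << '\t' << article->title << '\t'
            << article->data.size() << " bytes\n";
        ++count;
    }
    return count;
}
```

A database source or a search-result source would simply be further subclasses of ArticleSource; the writer never needs to know which one it is reading from.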
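A creator that reads from a directory could be one more source that turns every regular file below a root directory into an article. This is only a sketch under assumptions: the Article/ArticleSource interface is hypothetical (not zimwriter's real one), the URL is simply the file's path relative to the root, and the MIME type is guessed from a deliberately tiny extension table. It needs C++17 for std::filesystem.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical article record and source interface (illustrative only).
struct Article {
    std::string url;
    std::string mimeType;
    std::string data;
};

class ArticleSource {
public:
    virtual ~ArticleSource() = default;
    virtual std::optional<Article> next() = 0;
};

// Hypothetical directory source: each regular file below `root` becomes one
// article whose URL is its path relative to `root`.
class DirectorySource : public ArticleSource {
public:
    explicit DirectorySource(const fs::path& root) : root_(root) {
        for (const auto& entry : fs::recursive_directory_iterator(root_))
            if (entry.is_regular_file()) files_.push_back(entry.path());
        // Directory iteration order is unspecified; sort for reproducible output.
        std::sort(files_.begin(), files_.end());
    }

    std::optional<Article> next() override {
        if (pos_ >= files_.size()) return std::nullopt;
        const fs::path& file = files_[pos_++];
        std::ifstream in(file, std::ios::binary);
        std::ostringstream content;
        content << in.rdbuf();  // read the whole file
        return Article{fs::relative(file, root_).generic_string(),
                       guessMimeType(file), content.str()};
    }

private:
    // Minimal extension table; a real tool would need a much fuller one.
    static std::string guessMimeType(const fs::path& file) {
        const std::string ext = file.extension().string();
        if (ext == ".html" || ext == ".htm") return "text/html";
        if (ext == ".css") return "text/css";
        if (ext == ".png") return "image/png";
        return "application/octet-stream";
    }

    fs::path root_;
    std::vector<fs::path> files_;
    std::size_t pos_ = 0;
};
```

Because the files are read directly, no intermediate database step would be needed in this approach.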
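To illustrate how little logic such a front-end would carry, here is a sketch of the merging step a zimmerge-style tool might hand to the library. Everything here is an assumption: ArticleSource, VectorSource and MergeSource are illustrative names, not libzimwriter's API, and both inputs are assumed to deliver articles sorted by URL. On a duplicate URL, the article from the first input wins.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical article record and source interface (illustrative only).
struct Article {
    std::string url;
    std::string data;
};

class ArticleSource {
public:
    virtual ~ArticleSource() = default;
    virtual std::optional<Article> next() = 0;
};

// In-memory source, standing in for a reader of an existing zim file.
class VectorSource : public ArticleSource {
public:
    explicit VectorSource(std::vector<Article> articles)
        : articles_(std::move(articles)) {}
    std::optional<Article> next() override {
        if (pos_ >= articles_.size()) return std::nullopt;
        return articles_[pos_++];
    }
private:
    std::vector<Article> articles_;
    std::size_t pos_ = 0;
};

// Two-way merge of two URL-sorted sources. The result is itself a source,
// so it could be passed to the same writer as any other source.
class MergeSource : public ArticleSource {
public:
    MergeSource(ArticleSource& first, ArticleSource& second)
        : first_(first), second_(second), a_(first_.next()), b_(second_.next()) {}

    std::optional<Article> next() override {
        if (!a_ && !b_) return std::nullopt;
        std::optional<Article> result;
        if (a_ && (!b_ || a_->url <= b_->url)) {
            // Same URL in both inputs: skip the second one.
            if (b_ && b_->url == a_->url) b_ = second_.next();
            result = std::move(a_);
            a_ = first_.next();
        } else {
            result = std::move(b_);
            b_ = second_.next();
        }
        return result;
    }

private:
    ArticleSource& first_;
    ArticleSource& second_;
    std::optional<Article> a_;  // current head of the first input
    std::optional<Article> b_;  // current head of the second input
};
```

In this design the program itself would only open its inputs and pass a MergeSource to the library's writer; the command-line handling stays specific to each tool.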
Any opinions?
Tommi
_______________________________________________
dev-l mailing list
dev-l@openzim.org
https://intern.openzim.org/mailman/listinfo/dev-l