Hi all,
I met Manuel and Tommi at LinuxTag 2010, and I'm very interested in openzim
and its tools.
Thanks to all of you for your excellent work on making Wikipedia available
offline.
Unfortunately, it is still very difficult to get a current snapshot of
Wikipedia for offline use with openzim. One reason, mentioned by Tommi, is the
need for input modules. I think it is a good idea to build a library that
provides the different functions in separate modules (as described by Tommi).
Tommi, do you have a description of the internal data structure for zimwriter?
As I understand it, anyone who wants to implement an input module has to adapt
"their" external format to this internal data structure. Is that right?
And there is another gap:
There is only one up-to-date data source for Wikipedia: the dump, which is
unfortunately in MediaWiki format.
The static HTML export process seems to be orphaned...
As a last resort we can fetch all articles online. Many roads lead to Rome :-(
Does anyone have ideas for solving this problem?
Greetings from Berlin
christopher schnirch - csh
Tommi Mäkitalo wrote:
Hi,
I've thought about zimwriter and also talked about it at LinuxTag a little.
The zimwriter has an internal plug-in interface, which separates the source of
data from the generation of zim files. There are some implementations of the
source interface. There is the database source, which reads articles from a
database, the full text indexer, which gets the data by reading and indexing a
zimfile and one implementation, which creates a zim file from a search result.
There is still a need for more implementations. We need at least a zim file
creator that merges two zim files, and also a creator that reads data from a
directory in the file system. The latter would make the Perl script from
Emmanuel, which writes the files to the database, obsolete.
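
To make the directory idea concrete, here is a rough sketch of what such a
source could look like. It is written in Python only for brevity, and every
name in it (Article, DirectorySource, articles, collect) is made up for
illustration; it is not the actual zimwriter interface, and collect() merely
stands in for the part that would generate the zim file.

```python
import os
from typing import Iterator, NamedTuple


class Article(NamedTuple):
    """Hypothetical record handed from a source to the writer."""
    url: str     # path relative to the source root, "/"-separated
    title: str   # here simply the file name without its extension
    data: bytes  # raw file content


class DirectorySource:
    """Hypothetical source: yields one Article per file below root."""

    def __init__(self, root: str) -> None:
        self.root = root

    def articles(self) -> Iterator[Article]:
        for dirpath, dirnames, filenames in os.walk(self.root):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                rel = os.path.relpath(path, self.root).replace(os.sep, "/")
                with open(path, "rb") as f:
                    yield Article(rel, os.path.splitext(name)[0], f.read())


def collect(source) -> dict:
    """Stand-in for the writer: accepts any object with articles()."""
    return {article.url: article for article in source.articles()}
```

The point of the sketch is that collect() never looks at the file system
itself; it only consumes whatever the source yields, so a database source or
a merge source could be passed in the same way.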
The user has to tell the zimwriter what he wants to do. I could keep adding
options and features to the zimwriter, but then it will get bigger and bigger
and accumulate more and more options.
My idea is to move the functionality for writing a zim file into a library,
libzimwriter, and to write a separate program for each source implementation.
So to write a zim file from the database, the user runs zimwriterdb. To create
a full-text index we use zimindexer. To merge zim files we have zimmerge. All
of them are quite simple programs that just interface to libzimwriter.
As an additional benefit we reduce the dependency on tntdb. Only zimwriterdb
needs tntdb. All other tools do not.
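
The split between a shared library and thin front ends might look roughly
like the following. This is only a sketch in Python under my own assumptions:
write_archive() plays the role of libzimwriter, zimmerge_main() plays the role
of the zimmerge program, and the rule that the first article wins when two
inputs share a URL is an invented policy, not something decided here.

```python
from typing import Iterable, Iterator, List, Tuple

Article = Tuple[str, bytes]  # (url, data); hypothetical minimal article type


# "Library" part: knows nothing about where the articles come from.
def write_archive(articles: Iterable[Article]) -> List[Article]:
    """Stand-in for libzimwriter: returns the articles sorted by URL.

    Assumption: if a URL occurs more than once, the first occurrence wins.
    """
    seen = {}
    for url, data in articles:
        seen.setdefault(url, data)  # keep the first data seen for each URL
    return sorted(seen.items())


# Thin front end: only supplies a source and calls the library.
def merge_source(*archives: Iterable[Article]) -> Iterator[Article]:
    """Hypothetical source that chains the articles of several archives."""
    for archive in archives:
        yield from archive


def zimmerge_main(first: Iterable[Article],
                  second: Iterable[Article]) -> List[Article]:
    """What a zimmerge-like program would boil down to in this sketch."""
    return write_archive(merge_source(first, second))
```

A database front end would have the same shape: its own source function plus
one call to write_archive(), and it would be the only one that has to import
the database layer.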
The only problem I see is that I have to break the current command line
interface. I think that is a small price to pay.
Any opinions?
Tommi
--
christopher schnirch
Please send replies only to the following address:
mailto:@schnirch-berlin.de