New subject: [openZIM dev-l] Kiwix and ZimReader/Writer Index Format

12 Aug 2009


Hi Marc,
On Tue 11/08/09 16:26, "Marc Bantle" openmoko@rcie.de wrote:
...
I observed that Kiwix is producing an "ad-hoc" type
index. This may be useful for desktops as they have
the power to generate an index file on the fly. On
small footprint devices this will not reasonably be
possible, due to lacking memory and cpu resources.
Yes
...
Even on a dual core desktop with 3.5 GB of memory
Kiwix failed to produce an "ad-hoc" index of the
openzim-edition of the German Wikipedia, running
out of memory after many hours.
Yes, this is a Kiwix-specific issue with really big (text-heavy) ZIM files.
...
Question 1:
...
From the change log I see that kiwix is using
a prominent search engine (Xapian) instead of the
mechanism ZimReader/Writer are using. Is there an
easy way to reuse an index produced by Kiwix on
a different machine?
Yes, although this is not trivial. The Xapian database is in your ~/.www.kiwix.org directory (the [md5sum].index directory).
You can copy it to any other profile, account, or computer, and Kiwix will then be able to search through a ZIM file without running the indexing process. To reduce the size of the directory, you can also use "xapian-compact".
We know there is a list of improvements to make to the usability of the current index management.
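
The copy step described above could be scripted roughly as follows. This is only a sketch under assumptions: the function names are hypothetical, the search simply looks for any directory whose name ends in ".index" below a profile directory you pass in, and the compaction branch assumes the "xapian-compact" tool is installed and callable as "xapian-compact SOURCE DESTINATION".

```python
import shutil
import subprocess
from pathlib import Path


def find_index_dirs(profile_root):
    """Return all directories named '*.index' below profile_root, sorted."""
    root = Path(profile_root)
    return sorted(p for p in root.rglob("*.index") if p.is_dir())


def copy_index(index_dir, dest_root, compact=False):
    """Copy one index directory into dest_root and return the new path.

    With compact=True the copy is written by the xapian-compact tool
    (assumed to be on PATH) instead of by a plain file copy.
    """
    index_dir = Path(index_dir)
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    target = dest_root / index_dir.name
    if compact:
        # xapian-compact reads the source database and writes a compacted copy.
        subprocess.run(
            ["xapian-compact", str(index_dir), str(target)], check=True
        )
    else:
        shutil.copytree(index_dir, target)
    return target
```

For example, calling find_index_dirs(Path.home() / ".www.kiwix.org") and passing each result to copy_index() together with the profile directory of the other account would reproduce the manual copy. Whether Kiwix picks up the copied index still depends on it ending up at the same place inside the target profile.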
...
Question 2:
Are there plans to enable Kiwix to read reusable
indexes of the format released for ZimReader/
Writer?
I have nothing against making Kiwix compatible with different search engine backends, but this is not a priority for me yet.
I think I will do it in the medium to long term, as soon as I have time for it or if a user really needs it.
...
Question 3:
Are there plans to enable Kiwix to produce such a
reusable index?
I am not sure I understand the question. Are you talking about the ZIM indexes?
In that case, see my comment on Question 2.
...
Question 4:
Wouldn't it be desirable to deliver reusable indexes
together with zim-article-databases for all those
people with less capable devices (mids, netbooks,
phones) on the Kiwix site?
I do not believe that having only one type of search engine is good at all: use cases vary, and for this reason we have different search engines. I think the ZIM format should not force the user to make a choice. I also think we have to be able to distribute content without shipping the data twice (as happens with indexes). And finally, I do not think that having compatible indexes should be a priority, because 99% of users do not care about that (they simply use only one client).
...
Question 5:
The zim databases supplied on the Kiwix site [1]
seem to use the articles title field as article id field,
which - I'm sure - solves some problems for Kiwix,
but results in a list of article ids as result of a search
on zimreader instead of a list of article titles. Since
both Kiwix and ZimReader are part of the openzim
standardization effort, this confuses me a bit. Which
format is supposed to be the standard?
Tommi already answered that...
IMO it is the job of the indexer to find the title: if there is an HTML page with a title, the indexer has to use it.
More generally, IMO forcing ZIM creators to use url=title is a bad idea; the format should never force a particular way of representing or storing information. Everything that is possible with "normal" HTTP/HTML should also be possible with content in a ZIM file.
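
To illustrate what "the indexer finds the title" could mean, here is a small sketch using only the Python standard library. It is not how Kiwix or the ZimReader/Writer indexer work; the function name display_title and the fallback to the article URL are assumptions made for the example.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def display_title(url, html):
    """Return the HTML <title> if the page has one, otherwise the URL."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so wrapped titles become one line.
    title = " ".join("".join(parser.parts).split())
    return title or url
```

With this approach a search result can show the page title even when the article URL is something else entirely, and the format does not need to require url=title.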
Emmanuel
