Hi Marc,
On Tue 11/08/09 16:26, "Marc Bantle" openmoko@rcie.de wrote:
I observed that Kiwix is producing an "ad-hoc" type index. This may be useful for desktops, as they have the power to generate an index file on the fly. On small-footprint devices this will not reasonably be possible, due to the lack of memory and CPU resources.
Yes
Even on a dual-core desktop with 3.5 GB of memory, Kiwix failed to produce an "ad-hoc" index of the openzim edition of the German Wikipedia, running out of memory after many hours.
Yes, this is a Kiwix-specific issue with really big ZIM files (those with a lot of text).
Question 1: From the change log I see that Kiwix is using a prominent search engine (Xapian) instead of the mechanism ZimReader/Writer are using. Is there an easy way to reuse an index produced by Kiwix on a different machine?
Yes, although it is not trivial. The Xapian database is in your ~/.www.kiwix.org directory (in the [md5sum].index directory). You can copy it to any other profile, account, or computer, and Kiwix will then be able to search through the ZIM file without running the indexing process. To reduce the size of the directory, you can also use "xapian-compact".
We know there is a list of improvements to make to the usability of the current index management.
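The copy step described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name `copy_index` and its parameters are hypothetical, the index directory name has to be whatever your own [md5sum].index directory is called, and the optional compaction assumes the usual `xapian-compact SOURCE DESTINATION` invocation is available on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def copy_index(src_profile, dst_profile, index_name, compact=False):
    """Copy a Kiwix Xapian index directory from one profile to another.

    src_profile / dst_profile: profile directories (hypothetical parameters).
    index_name: name of the index directory, as found on your machine.
    compact: if True and xapian-compact is installed, write a compacted copy.
    """
    src = Path(src_profile) / index_name
    dst = Path(dst_profile) / index_name
    if not src.is_dir():
        raise FileNotFoundError(f"no index directory at {src}")
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)

    tool = shutil.which("xapian-compact") if compact else None
    if tool:
        # Assumption: xapian-compact takes the source database followed by
        # the destination database and writes a smaller copy there.
        subprocess.run([tool, str(src), str(dst)], check=True)
    else:
        # Plain recursive copy of the index directory.
        shutil.copytree(src, dst)
    return dst
```

Kiwix should be closed on both sides while the directory is copied, so that the index is not read or written halfway through.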
Question 2: Are there plans to enable Kiwix to read reusable indexes of the format released for ZimReader/Writer?
I have nothing against making Kiwix compatible with different search engine backends, but this is not a priority for me yet. I think I will do it in the medium term, as soon as I have time for it or if for any reason a user really needs it.
Question 3: Are there plans to enable Kiwix to produce such a reusable index?
I am not sure I understand the question. Are you talking about the ZIM indexes? In that case, see my comment on Question 2.
Question 4: Wouldn't it be desirable to deliver reusable indexes together with the ZIM article databases on the Kiwix site, for all those people with less capable devices (MIDs, netbooks, phones)?
I do not believe having only one type of search engine is good at all: usages are multiple, and for this reason we have different search engines. I think the ZIM format should not force the user to make a choice. I also think we have to be able to distribute content without carrying the data twice (with indexes). And finally, I do not think that having compatible indexes should be a priority, because 99% of users don't care about that (they simply use only one client).
Question 5: The ZIM databases supplied on the Kiwix site [1] seem to use the article's title field as the article id field, which - I'm sure - solves some problems for Kiwix, but results in a list of article ids as the result of a search in ZimReader instead of a list of article titles. Since both Kiwix and ZimReader are part of the openzim standardization effort, this confuses me a bit. Which format is supposed to be the standard?
Tommi already answered that...
IMO it is the job of the indexer to find the title... if there is an HTML page with a title, it has to use it.
More generally, IMO forcing ZIM creators to use url=title is a bad idea; the format should never force anyone to adopt a particular way of representing/storing information. Everything that is possible with "normal" HTTP/HTML should also be possible with content in a ZIM file.
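To illustrate the point about the indexer finding the title itself, here is a minimal sketch using only Python's standard `html.parser`. It is not how Kiwix or ZimReader actually index; the names `display_title` and `_TitleParser` are hypothetical, and falling back to the URL when the page has no title is an assumption made for the example.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text of the first <title> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest.
        if tag == "title" and not self.done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def display_title(html, url):
    """Return the page's own title, or the URL if the page has none."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and trim the result.
    title = " ".join("".join(parser.parts).split())
    return title or url
```

With something like this in the indexer, a search result can show the real page title whatever the ZIM creator chose as the article URL.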
Emmanuel