Hi Marc
On Tue 11/08/09 16:26, "Marc Bantle" openmoko(a)rcie.de wrote:
> I observed that Kiwix is producing an "ad-hoc" type
> index. This may be useful for desktops as they have
> the power to generate an index file on the fly. On
> small footprint devices this will not reasonably be
> possible, due to lacking memory and cpu resources.
Yes
> Even on a dual core desktop with 3.5 GB of memory
> Kiwix failed to produce "ad-hoc" index of the
> openzim-edition of the German Wikipedia running
> out of memory after many hours.
Yes, this is a Kiwix-specific issue with really big ZIM files (those with a lot of text).
> Question 1:
> From the change log I see that Kiwix is using
> a prominent search engine (Xapian) instead of the
> mechanism ZimReader/Writer are using. Is there an
> easy way to reuse an index produced by Kiwix on
> a different machine?
Yes, although this is not trivial. The Xapian database is in your ~/.www.kiwix.org directory (in the [md5sum].index directory).
You can copy it to any other profile, account, or computer; Kiwix will then be able to search through the ZIM file without running the indexing process. To reduce the size of the directory, you can also use "xapian-compact".
We know there is a list of improvements to make to the usability of the current index management.
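The copy step can be sketched roughly as follows. This is only an illustration, not something Kiwix ships: the function names are made up, the index directory name has to be looked up by hand in the source profile, and the compaction step assumes the "xapian-compact" tool is installed and is called as "xapian-compact SOURCE DESTINATION".

```python
import shutil
import subprocess
from pathlib import Path


def copy_index(src_profile: Path, dst_profile: Path, index_name: str) -> Path:
    """Copy one index directory from a Kiwix profile directory to another.

    index_name is the "[md5sum].index" directory name, found by looking
    inside the source profile; it is not computed here.
    """
    src = src_profile / index_name
    if not src.is_dir():
        raise FileNotFoundError(f"no index directory at {src}")
    dst_profile.mkdir(parents=True, exist_ok=True)
    dst = dst_profile / index_name
    shutil.copytree(src, dst)  # raises if dst already exists
    return dst


def compact_index(src: Path, dst: Path) -> bool:
    """Write a compacted copy of the database at src to dst.

    Returns False without doing anything if xapian-compact is not on PATH.
    """
    tool = shutil.which("xapian-compact")
    if tool is None:
        return False
    subprocess.run([tool, str(src), str(dst)], check=True)
    return True
```

For example, copy_index(Path.home() / ".www.kiwix.org", Path("/mnt/device/.www.kiwix.org"), "0123abcd.index") would copy a (hypothetical) index directory to a mounted device; the target path and the directory name here are placeholders.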
> Question 2:
> Are there plans to enable Kiwix to read reusable
> indexes of the format released for ZimReader/
> Writer?
I have nothing against making Kiwix compatible with different search engine backends, but this is not a priority for me yet.
I think I will do it in the medium-term future, as soon as I have time for it or if for any reason a user really needs it.
> Question 3:
> Are there plans to enable Kiwix to produce such a
> reusable index.
I am not sure I understand the question. Are you talking about the ZIM indexes?
If so, see my comment on Question 2.
> Question 4:
> Wouldn't it be desirable to deliver reusable indexes
> together with zim-article-databases for all those
> people with less capable devices (mids, netbooks,
> phones) on the Kiwix site?
I do not believe that having only one type of search engine is good at all: usages are varied, and for this reason we have different search engines. I think the ZIM format should not force the user to make a choice. I also think we have to be able to distribute content without shipping the data twice (as happens with indexes). And finally, I do not think that having compatible indexes should be a priority, because 99% of users don't care about that (they simply use only one client).
> Question 5:
> The zim databases supplied on the Kiwix site [1]
> seem to use the article's title field as article id field,
> which - I'm sure - solves some problems for Kiwix,
> but results in a list of article ids as result of a search
> on zimreader instead of a list of article titles. Since
> both Kiwix and ZimReader are part of the openzim
> standardization effort, this confuses me a bit. Which
> format is supposed to be the standard?
Tommi already answered that...
IMO it is the job of the indexer to find the title... if there is an HTML page with a title, it has to use it.
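To illustrate the idea, here is a minimal sketch (not the actual Kiwix or ZimReader indexer code) of an indexer that takes the display title from the HTML <title> element and only falls back to the article URL when no title is present. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join("".join(self._parts).split())


def display_title(html: str, url: str) -> str:
    """Return the page's <title> text, or the URL if the page has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title or url
```

With this approach the URL can stay an opaque identifier, and the search result list shows whatever title the page itself declares.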
More generally, IMO forcing ZIM creators to use url=title is a bad idea; the format should never force anyone to adopt a particular way of representing or storing information. Everything that is possible with "normal" HTTP/HTML should also be possible with content in a ZIM file.
Emmanuel