New subject: [openZIM dev-l] Kiwix and ZimReader/Writer Index Format

12 Aug 2009

Hi Marc
 On Tue 11/08/09 16:26, "Marc Bantle" openmoko(a)rcie.de wrote:
...
  I observed that Kiwix is producing an "ad-hoc" type
 index. This may be useful for desktops as they have
 the power to generate an index file on the fly. On
 small footprint devices this will not reasonably be
 possible, due to a lack of memory and CPU resources.
Yes

...
  Even on a dual core desktop with 3.5 GB of memory
 Kiwix failed to produce an "ad-hoc" index of the
 openzim-edition of the German Wikipedia, running
 out of memory after many hours.
Yes, this is a Kiwix-specific issue with really big ZIM files (ones with a lot of text).

...
  Question 1:
 From the change log I see that Kiwix is using
 a prominent search engine (Xapian) instead of the
 mechanism ZimReader/Writer are using. Is there an
 easy way to reuse an index produced by Kiwix on
 a different machine?
Yes, although this is not trivial. The Xapian database is in your ~/.www.kiwix.org
directory (in a [md5sum].index directory).
You can copy it to any other profile, account or computer, and Kiwix will then be able
to search through the ZIM file without running the indexing process. To reduce the size
of the directory, you can also use "xapian-compact".
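
The copy step described above could be scripted roughly as follows. This is only a
sketch under assumptions: the function names are hypothetical, it assumes the index
directories can be found by searching the profile directory for names ending in
".index", and the optional compaction step assumes the "xapian-compact" command-line
tool is installed and on the PATH. Kiwix itself provides no such helper.

```python
import shutil
import subprocess
from pathlib import Path


def find_index_dirs(profile_root):
    """Return every directory below profile_root whose name ends in '.index'."""
    return sorted(p for p in Path(profile_root).rglob("*.index") if p.is_dir())


def export_index(index_dir, dest_root, compact=False):
    """Copy one index directory into dest_root, keeping its [md5sum].index name.

    With compact=True the copy is produced by the external xapian-compact tool
    (assumed to be installed) instead of a plain directory copy.
    """
    index_dir = Path(index_dir)
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    target = dest_root / index_dir.name
    if compact:
        # xapian-compact SOURCE DESTINATION writes a compacted copy of the database.
        subprocess.run(["xapian-compact", str(index_dir), str(target)], check=True)
    else:
        shutil.copytree(index_dir, target)
    return target
```

For example, calling find_index_dirs(Path.home() / ".www.kiwix.org") and passing each
result to export_index() with a destination on a USB stick would gather the indexes
to carry over; on the other machine they would then be copied into the matching place
in that profile directory.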

We know there is a list of improvements still to be made to the usability of the
current index management.

...
  Question 2:
 Are there plans to enable Kiwix to read reusable
 indexes of the format released for ZimReader/
 Writer? 
I have nothing against making Kiwix compatible with different search engine backends...
but this is not a priority for me yet.
I think I will do it in the medium to long term, as soon as I have time for it or if
for any reason a user really needs it.

...
  Question 3:
 Are there plans to enable Kiwix to produce such a
 reusable index?
I am not sure I understand the question. Are you talking about the ZIM indexes?
In that case, cf. my comment on Question 2.

...
  Question 4:
 Wouldn't it be desirable to deliver reusable indexes
 together with zim-article-databases for all those
 people with less capable devices (MIDs, netbooks,
 phones) on the Kiwix site? 
I do not believe having only one type of search engine is good at all: usages are
varied, and for this reason we have different search engines. I think the ZIM format
should not force the user to make a choice. I also think we have to be able to
distribute content without shipping the data twice (with indexes). And finally, I do not
think that having compatible indexes should be a priority, because 99% of users don't
care about that (they simply use only one client).

...
  Question 5:
 The ZIM databases supplied on the Kiwix site [1]
 seem to use the article's title field as the article id field,
 which - I'm sure - solves some problems for Kiwix,
 but results in a list of article ids as the result of a search
 on zimreader instead of a list of article titles. Since
 both Kiwix and ZimReader are part of the openZIM
 standardization effort, this confuses me a bit. Which
 format is supposed to be the standard?
Tommi already answered that...

IMO it is the indexer's job to find the title... if there is an HTML page with a
title, it has to use it.
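
As an illustration of that idea, an indexer could take the display title from the
page's <title> element and fall back to the article URL only when none is present.
The sketch below uses Python's standard html.parser module; the names TitleParser
and display_title are hypothetical, and this is not how Kiwix or ZimReader actually
implement it.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Join the collected chunks and normalise whitespace.
        return " ".join("".join(self._parts).split())


def display_title(html, url):
    """Return the HTML title if the page has a non-empty one, else the URL."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title or url
```

With this approach the URL stored in the ZIM file can stay whatever the creator
chose, and search results still show a readable title whenever the page has one.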

More generally, IMO forcing url=title on ZIM creators is a bad idea; the format should
never force anyone to adopt a particular way of representing or storing information.
Everything that is possible with "normal" HTTP/HTML should also be possible with content
in a ZIM file.

Emmanuel 
