-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Asaf,
I have improved the indexing code, and with the current Kiwix SVN code the
index for your ZIM file is now only 1.1G. This improvement will be available
to you in the next Kiwix release (in about 2 weeks).
There is also a dedicated tool called "xapian-compact" that can reduce
the index size by about 50%. I do not currently plan to integrate it into
Kiwix, but if you want to ship software bundled with content and a prebuilt
index, you can use it. I have tested it, and the index is now 573M.
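For anyone who wants to script the compaction step when preparing a bundle, here is a minimal Python sketch. It assumes that "xapian-compact" is installed and on the PATH, and it uses only the basic form of the tool (source database, then destination database). The function names and directory arguments are hypothetical; they are not part of Kiwix.

```python
import shutil
import subprocess
from pathlib import Path


def build_compact_command(source_index, compacted_index):
    """Return the argument list for a basic xapian-compact run."""
    # Basic form: xapian-compact SOURCE_DATABASE DESTINATION_DATABASE
    return ["xapian-compact", str(source_index), str(compacted_index)]


def directory_size_bytes(path):
    """Sum the sizes of all regular files below `path`."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def compact_index(source_index, compacted_index):
    """Compact the index and return (size_before, size_after) in bytes."""
    if shutil.which("xapian-compact") is None:
        raise RuntimeError("xapian-compact was not found on PATH")
    size_before = directory_size_bytes(source_index)
    # check=True raises CalledProcessError if the tool exits with an error
    subprocess.run(build_compact_command(source_index, compacted_index), check=True)
    return size_before, directory_size_bytes(compacted_index)
```

Calling compact_index() with the existing index directory and a new, empty destination path returns both sizes, so the saving can be checked before the compacted copy is shipped.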
So it seems that this is all I can do with Xapian for now, but it is a
lot better than the initial 2.3G :)
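To put these numbers next to the storage budget Asaf describes in the quoted message below (a 300MB ZIM file, and 3GB or 5GB reserved per machine), here is a small arithmetic sketch in Python. It assumes 1G = 1024M, which the figures above do not specify, and the helper names are made up for illustration.

```python
MB_PER_GB = 1024  # assumption: binary units; the thread does not say which


def reduction_percent(before_mb, after_mb):
    """Percentage by which the index shrank."""
    return 100.0 * (before_mb - after_mb) / before_mb


def space_left_for_images_mb(budget_gb, zim_mb, index_mb):
    """Space remaining once the ZIM file and the index are stored."""
    return budget_gb * MB_PER_GB - zim_mb - index_mb


ZIM_MB = 300
INDEX_SIZES_MB = {
    "initial index": 2.3 * MB_PER_GB,
    "improved indexer": 1.1 * MB_PER_GB,
    "after xapian-compact": 573,
}

if __name__ == "__main__":
    initial = INDEX_SIZES_MB["initial index"]
    for label, index_mb in INDEX_SIZES_MB.items():
        saved = reduction_percent(initial, index_mb)
        for budget_gb in (3, 5):
            left = space_left_for_images_mb(budget_gb, ZIM_MB, index_mb)
            print(f"{label}: {saved:.0f}% smaller than initial, "
                  f"{left:.0f}M left for images with a {budget_gb}GB budget")
```

Under these assumptions the compacted index is about a quarter of the initial one, and a 3GB budget leaves 2199M for images.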
Another point is that the upcoming Xapian release will have a new
storage backend, which should reduce the index size by roughly another 50%.
I have not tested it myself, but that is what the developers say.
Regards
Emmanuel
Asaf Bartov wrote:
> Clarification:
>
> This last message was by Rotem, a fellow WM-IL member helping me with the
> embedding of the Hebrew Wikipedia in the One Computer Per Child project.
>
> He is reporting issues with Kiwix and the ZIM file I created last week.
>
> Regarding size: Size is important, because we intend to add images (the
> 300MB ZIM file is the complete Hebrew Wikipedia text, but no pictures). We
> are hoping to have at least 5GB reserved for us in those One Computer Per
> Child machines we are to install on, but we may be forced to make do with
> 3GB. So every MB saved from the index is another MB available for
> images...
>
> Asaf Bartov
> Wikimedia Israel
>
> On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha <hidroo(a)gmail.com> wrote:
>
>> * There are some errors in links to files and special pages.
>> Examples:
>>
>> קובץ:Nuvola_apps_important.svg<http://commons.wikimedia.org/wiki/File:Nu…
>> link to ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים
>> (wikipedia:wikipedia projects\ articles without images\categories\Sports
>> people from Italy)
>> מיוחד:אקראי (Special:Random) > 15 במאי (May 15)
>> מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט (August 10)
>>
>> * size is important because we intend to add images
>>
>> 2009/7/6 <dev-l-request(a)openzim.org>
>>
>>> Today's Topics:
>>>
>>> 1. Kiwix index size (Asaf Bartov)
>>> 2. Re: Kiwix index size (Manuel Schneider)
>>> 3. Re: Kiwix index size (Emmanuel Engelhart)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Sun, 5 Jul 2009 19:18:57 +0300
>>> From: Asaf Bartov <asaf.bartov(a)gmail.com>
>>> Subject: [openZIM dev-l] Kiwix index size
>>> To: dev-l(a)openzim.org
>>> Message-ID:
>>> <50a20d900907050918r3fcff23l275c67690ed7fc20(a)mail.gmail.com>
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>> Hi, everyone.
>>>
>>> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
>>> Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
>>> totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make
>>> sense?
>>>
>>> Detailed ls output attached.
>>>
>>> Thanks in advance,
>>>
>>> Asaf Bartov
>>> --
>>> Asaf Bartov <asaf(a)forum2.org>
>>>