Hi, Emmanuel.
Thanks a lot for this information! We can live with a 0.5GB index, so now
Kiwix is a realistic option for us.
In the coming week, we will be submitting an initial localization file for
Kiwix, providing Hebrew strings for its user interface.
Asaf
On Wed, Jul 8, 2009 at 10:23 PM, Emmanuel Engelhart
<emmanuel(a)engelhart.org> wrote:
Hi Asaf,
I have improved the indexing code and now (with Kiwix SVN code) the
index is only 1.1G for your ZIM file. For you this will be available in
the next Kiwix release (~2 weeks).
There is also a special tool called "xapian-compact" that can reduce the
index size by about 50%. I do not currently plan to integrate it into
Kiwix, but if you want to ship software with the content and index
already included, you can use it. I have tested it, and the index is now 573M.
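For anyone who wants to script that compaction step, here is a minimal sketch in Python. It assumes the xapian-compact binary is installed and on the PATH, and that it is called as "xapian-compact SOURCE DESTINATION". The directory names and the helper functions are hypothetical and are not part of Kiwix.

```python
import shutil
import subprocess
from pathlib import Path


def build_compact_command(source_index: str, compacted_index: str) -> list[str]:
    """Build the xapian-compact command line (source database, then destination)."""
    return ["xapian-compact", source_index, compacted_index]


def directory_size_mb(path: str) -> float:
    """Sum the sizes of all regular files under `path`, in MB (1 MB = 1024 * 1024 bytes)."""
    total_bytes = sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())
    return total_bytes / (1024 * 1024)


def compact_index(source_index: str, compacted_index: str) -> float:
    """Run xapian-compact and return the size of the compacted index in MB."""
    if shutil.which("xapian-compact") is None:
        raise RuntimeError("xapian-compact was not found on the PATH")
    subprocess.run(build_compact_command(source_index, compacted_index), check=True)
    return directory_size_mb(compacted_index)


if __name__ == "__main__":
    # Hypothetical paths: replace them with the real index directories.
    src, dst = "kiwix-index", "kiwix-index-compact"
    if Path(src).is_dir():
        print(f"before: {directory_size_mb(src):.0f} MB")
        print(f"after:  {compact_index(src, dst):.0f} MB")
```

The source index is left untouched, so the two directory sizes can be compared before the compacted copy is shipped.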
So it seems that this is all I can do for now with Xapian, but this is a
lot better than the initial 2.3G :)
Another point is that the upcoming Xapian release will have a new
storage backend, and this backend should be able to reduce the index by
a further ~50%. I have not actually tested it, but that is what the developers say.
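To put these index sizes against the storage budget Asaf mentions further down in the thread (5GB hoped for, possibly only 3GB), here is a rough calculation in Python. It uses only figures quoted in this thread and assumes 1 GB = 1024 MB. The function and variable names are illustrative.

```python
# Rough storage budget using only figures quoted in this thread.
# Assumption: 1 GB = 1024 MB. All names here are illustrative.

MB_PER_GB = 1024


def space_left_for_images_mb(budget_mb, zim_mb, index_mb):
    """Return the MB left for images after storing the ZIM file and its index."""
    return budget_mb - zim_mb - index_mb


ZIM_MB = 300  # text-only Hebrew Wikipedia ZIM file
INDEX_SIZES_MB = {
    "first index (2.3G)": 2.3 * MB_PER_GB,
    "improved indexer (1.1G)": 1.1 * MB_PER_GB,
    "after xapian-compact (573M)": 573,
}
BUDGETS_MB = {"5GB": 5 * MB_PER_GB, "3GB": 3 * MB_PER_GB}

if __name__ == "__main__":
    for budget_label, budget_mb in BUDGETS_MB.items():
        for index_label, index_mb in INDEX_SIZES_MB.items():
            left = space_left_for_images_mb(budget_mb, ZIM_MB, index_mb)
            print(f"{budget_label} budget, {index_label}: {left:.0f} MB left for images")
```

Under these assumptions, a 3GB allocation leaves only about 400MB for images with the first index, and a little over 2GB with the compacted one.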
Regards
Emmanuel
Asaf Bartov wrote:
Clarification:
This last message was by Rotem, a fellow WM-IL member helping me with the
embedding of the Hebrew Wikipedia in the One Computer Per Child project.
He is reporting issues with Kiwix and the ZIM file I created last week.
Regarding size: Size is important, because we intend to add images (the
300MB ZIM file is the complete Hebrew Wikipedia text, but no pictures). We
are hoping to have at least 5GB reserved for us in those One Computer Per
Child machines we are to install on, but we may be forced to make do with
3GB. So every MB saved from the index is another MB available for
images...
Asaf Bartov
Wikimedia Israel
On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha <hidroo(a)gmail.com> wrote:
> * there are some errors in links to files and special pages
> examples:
> קובץ:Nuvola_apps_important.svg
> <http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg> links
> to ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים
> (wikipedia:wikipedia projects\ articles without images\categories\Sports
> people from Italy)
> מיוחד:אקראי (Special:Random) > 15 במאי (May 15)
> מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט (August 10)
>
> * size is important because we intend to add images
>
> 2009/7/6 <dev-l-request(a)openzim.org>
>
>> Today's Topics:
>>
>> 1. Kiwix index size (Asaf Bartov)
>> 2. Re: Kiwix index size (Manuel Schneider)
>> 3. Re: Kiwix index size (Emmanuel Engelhart)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 5 Jul 2009 19:18:57 +0300
>> From: Asaf Bartov <asaf.bartov(a)gmail.com>
>> Subject: [openZIM dev-l] Kiwix index size
>> To: dev-l(a)openzim.org
>> Message-ID:
>> <50a20d900907050918r3fcff23l275c67690ed7fc20(a)mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi, everyone.
>>
>> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
>> Wikipedia last week, the Kiwix data directory ran up to a total of 31
>> items, totalling 2.3 GB. The ZIM file itself is ~300MB. Does this
>> proportion make sense?
>>
>> Detailed ls output attached.
>>
>> Thanks in advance,
>>
>> Asaf Bartov
>> --
>> Asaf Bartov <asaf(a)forum2.org>
>>