Hi, Emmanuel.
Thanks a lot for this information! We can live with a 0.5GB index, so now Kiwix is a realistic option for us.
In the coming week, we will be submitting an initial localization file for Kiwix, providing Hebrew strings for its user interface.
Asaf
On Wed, Jul 8, 2009 at 10:23 PM, Emmanuel Engelhart emmanuel@engelhart.org wrote:
Hi Asaf,
I have improved the indexing code, and now (with the Kiwix SVN code) the index is only 1.1G for your ZIM file. This will be available to you in the next Kiwix release (in ~2 weeks).
There is also a special tool called "xapian-compact" that can reduce the index size by about 50%. I do not currently plan to integrate it into Kiwix, but if you want to ship software with the content and index already included, you can use it. I have tested it, and the index is now 573M.
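A minimal sketch of how the compaction step described above might be scripted, assuming xapian-compact is installed and on the PATH and is called as `xapian-compact SOURCE_DATABASE DESTINATION_DATABASE`. The helper names and any directory paths passed in are hypothetical, not part of Kiwix.

```python
import os
import subprocess


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def build_compact_command(source_dir, dest_dir):
    """Build the argument list: xapian-compact SOURCE_DATABASE DESTINATION_DATABASE."""
    return ["xapian-compact", source_dir, dest_dir]


def compact_index(source_dir, dest_dir):
    """Compact the index in source_dir into dest_dir.

    Returns (size_before, size_after) in bytes.
    Assumption: xapian-compact is installed and reachable via PATH.
    """
    before = dir_size_bytes(source_dir)
    subprocess.run(build_compact_command(source_dir, dest_dir), check=True)
    return before, dir_size_bytes(dest_dir)
```

Calling `compact_index` on the index directory produced by the Kiwix indexer and on an empty destination path would report the sizes before and after, which makes it easy to check the roughly 50% reduction on a given index.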
So it seems that this is all I can do with Xapian for now, but it is a lot better than the initial 2.3G :)
Another point is that the upcoming Xapian release will have a new storage backend, and this backend should be able to reduce the index by a further ~50%. I have not actually tested it, but that is what the developers say.
Regards Emmanuel
Asaf Bartov wrote:
Clarification:
This last message was by Rotem, a fellow WM-IL member helping me with the embedding of the Hebrew Wikipedia in the One Computer Per Child project.
He is reporting issues with Kiwix and the ZIM file I created last week.
Regarding size: Size is important, because we intend to add images (the 300MB ZIM file is the complete Hebrew Wikipedia text, but no pictures).
We are hoping to have at least 5GB reserved for us on the One Computer Per Child machines we are to install on, but we may be forced to make do with 3GB. So every MB saved from the index is another MB available for images...
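To make the trade-off concrete, here is a rough sketch of the space budget using only the figures quoted in this thread (a 300MB ZIM file, index sizes of 2.3G, 1.1G and 573M, and a 5GB or 3GB allowance). It assumes 1 GB = 1000 MB and ignores the size of Kiwix itself; the names are illustrative.

```python
MB_PER_GB = 1000  # assumption: decimal units; the thread does not say which convention is meant

ZIM_MB = 300  # text-only Hebrew Wikipedia ZIM file

# Index sizes reported in the thread, in MB.
INDEX_MB = {
    "original index (2.3G)": 2300,
    "improved indexer (1.1G)": 1100,
    "after xapian-compact (573M)": 573,
}

# Disk allowances mentioned in the thread, in MB.
BUDGETS_MB = {"5GB": 5 * MB_PER_GB, "3GB": 3 * MB_PER_GB}


def space_for_images(budget_mb, index_mb, zim_mb=ZIM_MB):
    """Return the MB left for images after the ZIM file and the index."""
    return budget_mb - zim_mb - index_mb


for budget_label, budget_mb in BUDGETS_MB.items():
    for index_label, index_mb in INDEX_MB.items():
        left = space_for_images(budget_mb, index_mb)
        print(f"{budget_label} budget, {index_label}: {left} MB left for images")
```

Under these assumptions, the 3GB case leaves about 400MB for images with the original index and about 2127MB with the compacted one.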
Asaf Bartov Wikimedia Israel
On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha hidroo@gmail.com wrote:
- there are some errors in links to files and special pages. Examples:
  קובץ:Nuvola_apps_important.svg <http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg> links to ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים (Wikipedia:Wikipedia projects/Articles without images/Categories/Sports people from Italy)
  מיוחד:אקראי (Special:Random) > 15 במאי (May 15)
  מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט (August 10)
- size is important because we intend to add images
2009/7/6 dev-l-request@openzim.org
Today's Topics:
- Kiwix index size (Asaf Bartov)
- Re: Kiwix index size (Manuel Schneider)
- Re: Kiwix index size (Emmanuel Engelhart)
Message: 1
Date: Sun, 5 Jul 2009 19:18:57 +0300
From: Asaf Bartov asaf.bartov@gmail.com
Subject: [openZIM dev-l] Kiwix index size
To: dev-l@openzim.org
Message-ID: 50a20d900907050918r3fcff23l275c67690ed7fc20@mail.gmail.com
Content-Type: text/plain; charset="iso-8859-1"
Hi, everyone.
When running Kiwix's indexer on the ZIM file I had created from the Hebrew Wikipedia last week, the Kiwix data directory ran up to a total of 31 items, totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make sense?
Detailed ls output attached.
Thanks in advance,
Asaf Bartov
Asaf Bartov asaf@forum2.org