Clarification:

The last message was from Rotem, a fellow WM-IL member who is helping me embed the Hebrew Wikipedia in the One Computer Per Child project.

He is reporting issues with Kiwix and the ZIM file I created last week.

Regarding size: size matters because we intend to add images (the 300 MB ZIM file contains the complete text of the Hebrew Wikipedia, but no pictures). We hope to have at least 5 GB reserved for us on the One Computer Per Child machines we will be installing on, but we may have to make do with 3 GB. So every MB saved on the index is another MB available for images...
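To put numbers on this trade-off, here is a rough budget sketch that uses only the figures in this thread (300 MB of text, the 2.3 GB index reported below, and 3 GB or 5 GB reserved). It assumes decimal units (1 GB = 1000 MB) and that nothing else shares the reserved space; the function name is made up for illustration.

```python
# Rough disk budget for the offline Hebrew Wikipedia (sizes in MB).
# Assumptions: decimal units (1 GB = 1000 MB), and the reserved space holds
# only the ZIM file, its search index and the images.

ZIM_MB = 300      # complete Hebrew Wikipedia text, no pictures
INDEX_MB = 2300   # Kiwix index size reported on dev-l


def image_budget_mb(reserved_gb: float, zim_mb: float = ZIM_MB,
                    index_mb: float = INDEX_MB) -> float:
    """MB left for images once the ZIM file and the index are installed."""
    return reserved_gb * 1000 - zim_mb - index_mb


for reserved in (5, 3):
    print(f"{reserved} GB reserved: {image_budget_mb(reserved):.0f} MB for images")
```

With a 2.3 GB index this leaves about 2400 MB for images in the 5 GB case and only 400 MB in the 3 GB case; every MB taken off `index_mb` raises both figures by the same amount.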

   Asaf Bartov
   Wikimedia Israel

On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha <hidroo@gmail.com> wrote:
* There are some errors in links to files and special pages.
Examples:
קובץ:Nuvola_apps_important.svg (File:Nuvola_apps_important.svg) links to ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים (Wikipedia:Wikipedia projects/Articles without images/Categories/Sportspeople from Italy)
מיוחד:אקראי (Special:Random) links to 15 במאי (May 15)
מיוחד:שינויים אחרונים (Special:RecentChanges) links to 10_באוגוסט (August 10)

* Size is important because we intend to add images.

2009/7/6 <dev-l-request@openzim.org>


Today's Topics:

  1. Kiwix index size (Asaf Bartov)
  2. Re: Kiwix index size (Manuel Schneider)
  3. Re: Kiwix index size (Emmanuel Engelhart)


----------------------------------------------------------------------

Message: 1
Date: Sun, 5 Jul 2009 19:18:57 +0300
From: Asaf Bartov <asaf.bartov@gmail.com>
Subject: [openZIM dev-l] Kiwix index size
To: dev-l@openzim.org
Message-ID:
       <50a20d900907050918r3fcff23l275c67690ed7fc20@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi, everyone.

When running Kiwix's indexer on the ZIM file I had created from the Hebrew
Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does this proportion make
sense?

Detailed ls output attached.

Thanks in advance,

  Asaf Bartov
--
Asaf Bartov <asaf@forum2.org>
-------------- next part --------------
rotem@desktop:~/.www.kiwix.org/kiwix$ ls -l -h -a -R
.:
total 16K
drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 .
drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 7680jxd5.default
-rw-r--r-- 1 rotem rotem   94 2009-07-01 16:10 profiles.ini

./7680jxd5.default:
total 1.7M
drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 .
drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 31c26198d06ad265677b450796cc09aa.index
-rw------- 1 rotem rotem  162 2009-07-05 18:19 compatibility.ini
-rw-r--r-- 1 rotem rotem 135K 2009-07-05 18:19 compreg.dat
drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 extensions
-rw-r--r-- 1 rotem rotem  169 2009-07-01 16:10 localstore.rdf
-rw-r--r-- 1 rotem rotem  304 2009-07-05 18:39 mimeTypes.rdf
-rw-r--r-- 1 rotem rotem    0 2009-07-05 18:40 .parentlock
-rw-r--r-- 1 rotem rotem 2.0K 2009-07-01 16:10 permissions.sqlite
-rw-r--r-- 1 rotem rotem 128K 2009-07-05 18:54 places.sqlite
-rw------- 1 rotem rotem  951 2009-07-05 19:00 prefs.js
-rw-r--r-- 1 rotem rotem 1.1M 2009-07-05 18:20 XPC.mfasl
-rw-r--r-- 1 rotem rotem  98K 2009-07-05 18:19 xpti.dat
-rw-r--r-- 1 rotem rotem  98K 2009-07-05 18:20 XUL.mfasl

./7680jxd5.default/31c26198d06ad265677b450796cc09aa.index:
total 2.4G
drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 .
drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
-rw-r--r-- 1 rotem rotem    0 2009-07-02 01:46 flintlock
-rw-r--r-- 1 rotem rotem   12 2009-07-02 01:46 iamflint
-rw-r--r-- 1 rotem rotem  22K 2009-07-02 05:13 position.baseA
-rw-r--r-- 1 rotem rotem  21K 2009-07-02 05:10 position.baseB
-rw-r--r-- 1 rotem rotem 1.4G 2009-07-02 05:13 position.DB
-rw-r--r-- 1 rotem rotem  12K 2009-07-02 05:13 postlist.baseA
-rw-r--r-- 1 rotem rotem  12K 2009-07-02 05:10 postlist.baseB
-rw-r--r-- 1 rotem rotem 754M 2009-07-02 05:13 postlist.DB
-rw-r--r-- 1 rotem rotem   70 2009-07-02 05:13 record.baseA
-rw-r--r-- 1 rotem rotem   70 2009-07-02 05:10 record.baseB
-rw-r--r-- 1 rotem rotem 3.3M 2009-07-02 05:13 record.DB
-rw-r--r-- 1 rotem rotem 4.4K 2009-07-02 05:13 termlist.baseA
-rw-r--r-- 1 rotem rotem 4.3K 2009-07-02 05:10 termlist.baseB
-rw-r--r-- 1 rotem rotem 278M 2009-07-02 05:13 termlist.DB
-rw-r--r-- 1 rotem rotem  232 2009-07-02 05:13 value.baseA
-rw-r--r-- 1 rotem rotem  230 2009-07-02 05:10 value.baseB
-rw-r--r-- 1 rotem rotem  14M 2009-07-02 05:13 value.DB

./7680jxd5.default/extensions:
total 8.0K
drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 .
drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
rotem@desktop:~/.www.kiwix.org/kiwix$
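The listing shows where the space goes. A small sketch, with sizes typed in by hand from the `ls -l -h` output (so already rounded the way `ls -h` rounds them), computes each Xapian table's share of the index; the helper names are illustrative only.

```python
# Break down the Xapian index directory from the `ls -l -h` listing above.
# Sizes are typed in by hand from that listing and are already rounded;
# ls -h uses powers of 1024. The tiny *.baseA/*.baseB files are left out.

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

LISTING = {
    "position.DB": "1.4G",
    "postlist.DB": "754M",
    "termlist.DB": "278M",
    "value.DB": "14M",
    "record.DB": "3.3M",
}


def to_bytes(size: str) -> float:
    """Convert an ls -h size such as '754M', '1.4G' or '70' to bytes."""
    if size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)


def breakdown(listing: dict[str, str]) -> list[tuple[str, float]]:
    """Return (file name, share of the listed total), largest first."""
    sizes = {name: to_bytes(size) for name, size in listing.items()}
    total = sum(sizes.values())
    shares = [(name, b / total) for name, b in sizes.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


total_gib = sum(to_bytes(s) for s in LISTING.values()) / UNITS["G"]
print(f"listed total: {total_gib:.2f} GiB")
for name, share in breakdown(LISTING):
    print(f"{name:<12} {share:6.1%}")
```

On these rounded figures the position table (word positions, which Xapian uses for phrase searches) is well over half of the index, with the posting lists making up most of the remainder.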

------------------------------

Message: 2
Date: Sun, 5 Jul 2009 20:57:39 +0200
From: Manuel Schneider <manuel.schneider@wikimedia.ch>
Subject: Re: [openZIM dev-l] Kiwix index size
To: asaf@forum2.org, dev-l@openzim.org
Message-ID: <200907052057.39966.manuel.schneider@wikimedia.ch>
Content-Type: text/plain;  charset="utf-8"

Hi Asaf,

On Sunday, 5 July 2009, Asaf Bartov wrote:
> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> Wikipedia last week, the Kiwix data directory ran up to a total of 31
> items, totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does this
> proportion make sense?

I am not sure about the other files that were created; you only need the ZIM
file with the index itself.

For 900,000 articles, the ZIM file containing the articles was 1.4 GB and the
index ZIM was 1.0 GB.

So I think 300 MB looks fine.
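Manuel's figures and Asaf's can be put on the same footing by comparing index size to content size. A quick check, using only the numbers quoted in this thread (in MB, decimal units assumed):

```python
# Index-to-content size ratios from the figures quoted in this thread (MB).

def index_ratio(index_mb: float, zim_mb: float) -> float:
    """Size of the search index relative to the article ZIM file."""
    return index_mb / zim_mb


manuel = index_ratio(1000, 1400)  # 900,000 articles: 1.4 GB ZIM, 1.0 GB index ZIM
kiwix = index_ratio(2300, 300)    # Hebrew Wikipedia: 300 MB ZIM, 2.3 GB Kiwix index

print(f"index ZIM: {manuel:.2f}x the article ZIM")
print(f"Kiwix/Xapian index: {kiwix:.2f}x the article ZIM")
print(f"Hebrew index at the index-ZIM ratio: about {300 * manuel:.0f} MB")
```

On these figures the Kiwix index is roughly ten times larger relative to its content than the index ZIM described above. The two indexes are built by different tools, so the comparison is only indicative.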

Greets,


Manuel
--
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch


------------------------------

Message: 3
Date: Sun, 05 Jul 2009 21:05:33 +0200
From: Emmanuel Engelhart <emmanuel@engelhart.org>
Subject: Re: [openZIM dev-l] Kiwix index size
To: asaf@forum2.org, dev-l@openzim.org
Message-ID: <4A50F97D.2030607@engelhart.org>
Content-Type: text/plain; charset=ISO-8859-1


Hi Asaf
Asaf Bartov wrote:
> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
> totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does this proportion make
> sense?

This is possible. Kiwix uses the Xapian search engine, which generates
pretty big index files.

I have two questions:
* Are the search results OK?
* Do you have a problem with the size of the index? Do you have a size
limit?

There are many open-source search/indexing tools. I chose Xapian for
many reasons, but under certain conditions it is possible to add support
for another search engine to Kiwix. It should also be possible to make a
modified version of the indexer that uses less disk space (but indexes
fewer words).
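The trade-off Emmanuel mentions (less disk space, fewer words indexed) can be illustrated with a toy inverted index. This is a hypothetical model, not Xapian's on-disk format; it shows two common levers, dropping word positions (which phrase searches need) and skipping very common words. The stop-word list and sample sentences are arbitrary examples.

```python
from collections import defaultdict
from typing import AbstractSet

# Toy inverted index: a hypothetical model, NOT Xapian's storage format.
STOP_WORDS = frozenset({"the", "of", "a", "and", "in"})  # arbitrary example list

DOCS = {
    1: "the history of the city and the river",
    2: "the river flows through the city",
}


def build_index(docs: dict[int, str], positions: bool = True,
                stop_words: AbstractSet[str] = frozenset()) -> dict:
    """Map each term to its set of postings.

    positions=True stores one (doc_id, word_position) entry per occurrence,
    which is what phrase search needs; positions=False stores each doc_id
    once per term. Words in stop_words are not indexed at all.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            if word in stop_words:
                continue
            index[word].add((doc_id, pos) if positions else doc_id)
    return dict(index)


def entry_count(index: dict) -> int:
    """Total number of stored postings, a rough proxy for disk usage."""
    return sum(len(postings) for postings in index.values())


for label, options in [
    ("with positions", {}),
    ("without positions", {"positions": False}),
    ("without positions or stop words",
     {"positions": False, "stop_words": STOP_WORDS}),
]:
    print(f"{label}: {entry_count(build_index(DOCS, **options))} postings")
```

On the two sample sentences this gives 14, 11 and 7 postings respectively. Dropping positions rules out exact-phrase queries, and skipping common words makes those words unsearchable, which is the "fewer words indexed" cost.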

OpenZIM itself provides a search solution; Tommi can tell you more
about it. Maybe it would be interesting for you to test it and give us
feedback!

Regards
Emmanuel


------------------------------

_______________________________________________
dev-l mailing list
dev-l@openzim.org
https://intern.openzim.org/mailman/listinfo/dev-l


End of dev-l Digest, Vol 5, Issue 2
***********************************


--
Rotem Simha





--
Asaf Bartov <asaf@forum2.org>