* There are some errors in links to files and special pages.
Examples:
קובץ:Nuvola_apps_important.svg (File:Nuvola_apps_important.svg) links to
ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים (Wikipedia:Wikipedia projects/Articles without images project/Categories/Italian sportspeople)
מיוחד:אקראי (Special:Random) > 15 במאי (May 15)
מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט (August 10)
* Size is important because we intend to add images.
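Every broken link in the examples above carries a localized namespace prefix (קובץ, מיוחד, ויקיפדיה). The following is a minimal sketch of how a dumper might recognize such targets before deciding how to rewrite them. It assumes those three prefixes correspond to the File, Special and Project namespaces. The helper `classify_link` and the `HEBREW_NAMESPACES` table are hypothetical and do not come from openZIM or Kiwix code:

```python
from typing import Dict, Tuple

# Hypothetical mapping of the localized prefixes seen in the examples
# to canonical MediaWiki namespace names (an assumption for illustration).
HEBREW_NAMESPACES: Dict[str, str] = {
    "קובץ": "File",
    "מיוחד": "Special",
    "ויקיפדיה": "Project",
}


def classify_link(target: str) -> Tuple[str, str]:
    """Hypothetical helper: split a link target into (namespace, page name).

    Targets without a recognized prefix are treated as main-namespace
    articles. Underscores become spaces, as in MediaWiki titles.
    """
    title = target.replace("_", " ").strip()
    prefix, sep, rest = title.partition(":")
    if sep and prefix in HEBREW_NAMESPACES:
        return HEBREW_NAMESPACES[prefix], rest
    return "Main", title


for link in ["קובץ:Nuvola_apps_important.svg", "מיוחד:אקראי", "15 במאי"]:
    print(link, "->", classify_link(link))
```

For example, a link classified as Special has no article behind it in an offline dump. Such a link could be dropped or handled separately, rather than resolving to an unrelated article such as 15 במאי.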
2009/7/6
<dev-l-request@openzim.org>
Send dev-l mailing list submissions to
dev-l@openzim.org
To subscribe or unsubscribe via the World Wide Web, visit
https://intern.openzim.org/mailman/listinfo/dev-l
or, via email, send a message with subject or body 'help' to
dev-l-request@openzim.org
You can reach the person managing the list at
dev-l-owner@openzim.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of dev-l digest..."
Today's Topics:
1. Kiwix index size (Asaf Bartov)
2. Re: Kiwix index size (Manuel Schneider)
3. Re: Kiwix index size (Emmanuel Engelhart)
----------------------------------------------------------------------
Message: 1
Date: Sun, 5 Jul 2009 19:18:57 +0300
From: Asaf Bartov <asaf.bartov@gmail.com>
Subject: [openZIM dev-l] Kiwix index size
To: dev-l@openzim.org
Message-ID:
<50a20d900907050918r3fcff23l275c67690ed7fc20@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi, everyone.
When running Kiwix's indexer on the ZIM file I had created from the Hebrew
Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make
sense?
Detailed ls output attached.
Thanks in advance,
Asaf Bartov
--
Asaf Bartov <asaf@forum2.org>
-------------- next part --------------
rotem@desktop:~/.www.kiwix.org/kiwix$ ls -l -h -a -R
.:
total 16K
drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 .
drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 7680jxd5.default
-rw-r--r-- 1 rotem rotem 94 2009-07-01 16:10 profiles.ini
./7680jxd5.default:
total 1.7M
drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 .
drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 31c26198d06ad265677b450796cc09aa.index
-rw------- 1 rotem rotem 162 2009-07-05 18:19 compatibility.ini
-rw-r--r-- 1 rotem rotem 135K 2009-07-05 18:19 compreg.dat
drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 extensions
-rw-r--r-- 1 rotem rotem 169 2009-07-01 16:10 localstore.rdf
-rw-r--r-- 1 rotem rotem 304 2009-07-05 18:39 mimeTypes.rdf
-rw-r--r-- 1 rotem rotem 0 2009-07-05 18:40 .parentlock
-rw-r--r-- 1 rotem rotem 2.0K 2009-07-01 16:10 permissions.sqlite
-rw-r--r-- 1 rotem rotem 128K 2009-07-05 18:54 places.sqlite
-rw------- 1 rotem rotem 951 2009-07-05 19:00 prefs.js
-rw-r--r-- 1 rotem rotem 1.1M 2009-07-05 18:20 XPC.mfasl
-rw-r--r-- 1 rotem rotem 98K 2009-07-05 18:19 xpti.dat
-rw-r--r-- 1 rotem rotem 98K 2009-07-05 18:20 XUL.mfasl
./7680jxd5.default/31c26198d06ad265677b450796cc09aa.index:
total 2.4G
drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 .
drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
-rw-r--r-- 1 rotem rotem 0 2009-07-02 01:46 flintlock
-rw-r--r-- 1 rotem rotem 12 2009-07-02 01:46 iamflint
-rw-r--r-- 1 rotem rotem 22K 2009-07-02 05:13 position.baseA
-rw-r--r-- 1 rotem rotem 21K 2009-07-02 05:10 position.baseB
-rw-r--r-- 1 rotem rotem 1.4G 2009-07-02 05:13 position.DB
-rw-r--r-- 1 rotem rotem 12K 2009-07-02 05:13 postlist.baseA
-rw-r--r-- 1 rotem rotem 12K 2009-07-02 05:10 postlist.baseB
-rw-r--r-- 1 rotem rotem 754M 2009-07-02 05:13 postlist.DB
-rw-r--r-- 1 rotem rotem 70 2009-07-02 05:13 record.baseA
-rw-r--r-- 1 rotem rotem 70 2009-07-02 05:10 record.baseB
-rw-r--r-- 1 rotem rotem 3.3M 2009-07-02 05:13 record.DB
-rw-r--r-- 1 rotem rotem 4.4K 2009-07-02 05:13 termlist.baseA
-rw-r--r-- 1 rotem rotem 4.3K 2009-07-02 05:10 termlist.baseB
-rw-r--r-- 1 rotem rotem 278M 2009-07-02 05:13 termlist.DB
-rw-r--r-- 1 rotem rotem 232 2009-07-02 05:13 value.baseA
-rw-r--r-- 1 rotem rotem 230 2009-07-02 05:10 value.baseB
-rw-r--r-- 1 rotem rotem 14M 2009-07-02 05:13 value.DB
./7680jxd5.default/extensions:
total 8.0K
drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 .
drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
rotem@desktop:~/.www.kiwix.org/kiwix$
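To see where the space goes, you can add up the sizes of the five .DB tables in the index directory listed above. This is a minimal sketch. The sizes are copied from the listing. It assumes that `ls -h` reports 1024-based units (GNU ls's default) and that "~300MB" means roughly 300 MiB:

```python
# Sizes copied from the .index directory in the ls -l -h listing above.
INDEX_FILES = {
    "position.DB": "1.4G",
    "postlist.DB": "754M",
    "termlist.DB": "278M",
    "value.DB": "14M",
    "record.DB": "3.3M",
}

# Assumption: ls -h uses binary (1024-based) multipliers.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert an ls -h size such as '754M' or '1.4G' to (approximate) bytes."""
    unit = text[-1] if text[-1] in "KMG" else ""
    number = text[:-1] if unit else text
    return float(number) * UNITS[unit]


def table_shares(files):
    """Return the total size in bytes and each table's fraction of it."""
    total = sum(parse_size(s) for s in files.values())
    shares = {name: parse_size(s) / total for name, s in files.items()}
    return total, shares


total, shares = table_shares(INDEX_FILES)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {INDEX_FILES[name]:>6s} {share:6.1%}")
print(f"total {total / 1024**3:.2f} GiB, "
      f"about {total / parse_size('300M'):.1f}x the ~300MB ZIM file")
```

On these numbers, position.DB alone accounts for well over half of the index.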
------------------------------
Message: 2
Date: Sun, 5 Jul 2009 20:57:39 +0200
From: Manuel Schneider <manuel.schneider@wikimedia.ch>
Subject: Re: [openZIM dev-l] Kiwix index size
To: asaf@forum2.org, dev-l@openzim.org
Message-ID: <200907052057.39966.manuel.schneider@wikimedia.ch>
Content-Type: text/plain; charset="utf-8"
Hi Asaf,
On Sunday, 5 July 2009, Asaf Bartov wrote:
> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> Wikipedia last week, the Kiwix data directory ran up to a total of 31
> items, totalling 2.3 GB. The ZIM file itself is ~300MB. Does this
> proportion make sense?
I am not sure about the other files that were created; you only need the ZIM
file with the index itself.
For 900,000 articles, the ZIM file containing the articles was 1.4 GB and the
index ZIM was 1.0 GB.
So I think 300 MB looks fine.
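The ratios behind the figures in this thread are shown below. This is rough arithmetic only: the two indexes are stored differently, and 1 GB is taken as 1000 MB. Also note that in Asaf's case, ~300MB is the size of the article ZIM, while the index is the 2.3 GB directory:

```latex
\frac{1.0\,\text{GB (index ZIM)}}{1.4\,\text{GB (article ZIM)}} \approx 0.71
\qquad
\frac{2.3\,\text{GB (Kiwix/Xapian index)}}{0.3\,\text{GB (article ZIM)}} \approx 7.7
\qquad
\frac{1.0 \times 10^{9}\,\text{B}}{900\,000\ \text{articles}} \approx 1.1\,\text{KB per article}
```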
Greets,
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
------------------------------
Message: 3
Date: Sun, 05 Jul 2009 21:05:33 +0200
From: Emmanuel Engelhart <emmanuel@engelhart.org>
Subject: Re: [openZIM dev-l] Kiwix index size
To: asaf@forum2.org, dev-l@openzim.org
Message-ID: <4A50F97D.2030607@engelhart.org>
Content-Type: text/plain; charset=ISO-8859-1
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Asaf
Asaf Bartov wrote:
> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
> totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make
> sense?
This is possible. Kiwix uses the Xapian search engine, which generates
pretty big index files.
I have two questions:
* Are the search results OK?
* Do you have a problem with the size of the index? Do you have a size
limit?
There are many open-source search/indexing tools. I chose Xapian for
many reasons, but under certain conditions it is possible to add
support for another search engine to Kiwix. It should also be
possible to make a modified version of the indexer that uses less disk
space (but with fewer words indexed).
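The two space-saving ideas mentioned above can be illustrated with a toy model. The first idea is to index less detail; the second is to index fewer words. This sketch does not use Xapian or Kiwix code. It assumes that most of the space goes to word positions, which are needed for phrase searches. That assumption is consistent with position.DB being the largest file in the listing earlier in this digest. The sample documents and stopword list are hypothetical. In Xapian itself, TermGenerator offers `index_text_without_positions` for the first idea, although this message does not say whether Kiwix's indexer uses it:

```python
from collections import defaultdict


def build_index(docs, stopwords=frozenset()):
    """Toy positional inverted index: term -> {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            if term not in stopwords:
                index[term].setdefault(doc_id, []).append(pos)
    return index


def index_stats(index):
    """Return (postings, positions): one posting per (term, document) pair,
    one position per indexed word occurrence."""
    postings = sum(len(docs) for docs in index.values())
    positions = sum(len(p) for docs in index.values() for p in docs.values())
    return postings, positions


# Hypothetical sample text standing in for article bodies.
docs = [
    "the cat sat on the mat and the cat slept",
    "the dog sat on the cat and the dog barked",
]
full = build_index(docs)
reduced = build_index(docs, stopwords={"the", "on", "and"})

# Position data grows with every word occurrence; dropping it entirely
# would leave only the postings. Skipping stopwords shrinks both.
print("full    postings=%d positions=%d" % index_stats(full))
print("reduced postings=%d positions=%d" % index_stats(reduced))
```

In this model, discarding positions saves space roughly in proportion to the total word count. Skipping common words also reduces the postings, at the cost of those words no longer being searchable.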
OpenZIM itself provides a search solution; Tommi can tell you more
about it. Maybe it would be interesting for you to test it and give us
feedback!
Regards
Emmanuel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkpQ+XcACgkQn3IpJRpNWtPm8wCfcmzwRfg6/9ttuknkURF7ct5I
JLAAoLbVJWqXUKIeh8Mpua3GD+bjI5ZD
=RH/U
-----END PGP SIGNATURE-----
------------------------------
_______________________________________________
dev-l mailing list
dev-l@openzim.org
https://intern.openzim.org/mailman/listinfo/dev-l
End of dev-l Digest, Vol 5, Issue 2
***********************************