Hi,
a few weeks ago, Frederico reported a small bug to me concerning a
particular HTML page in an Italian test ZIM file:
https://sourceforge.net/tracker/?func=detail&aid=2798771&group_id=175508&at…
The symptom is that references (<references/>) are not always rendered.
After looking through the MediaWiki code a bit, it seems that this is a
MediaWiki bug. I have reported it there:
https://bugzilla.wikimedia.org/show_bug.cgi?id=19807
I cannot fix it myself; it is too complicated for me... but maybe the
MediaWiki hackers listening here have ideas ;)
Regards
Emmanuel
Hi,
This weekend I prepared a ZIM file with the latest Wikipedia for Schools selection (from SOS Children's Villages UK).
http://tmp.kiwix.org/zim/schools-wikipedia-full-20081023-rc1.zim
Content should be exactly the same as before but the size is 30% smaller.
After a few minutes of indexing with Kiwix, the whole content is available through the search engine.
On this occasion, I implemented a feature that skips HTML content carrying the NOINDEX meta tag during indexing. This is important to avoid indexing index pages; it might be good to do the same in the zimindexer (if not already done).
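
The NOINDEX check described above can be sketched as follows. This is only an illustration, not the Kiwix code: it assumes the pages use the usual `<meta name="robots" content="noindex">` convention, and the names `RobotsMetaParser` and `should_index` are made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so normalise them.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def should_index(html: str) -> bool:
    """Return False when the page carries a NOINDEX robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives
```

An indexer would call `should_index(page_html)` before adding a page to the search index and simply skip the page when it returns `False`.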
I may have someone who is ready to pay a little to help port Kiwix and libzim to Windows... nothing is certain, but I hope it will help to find someone to do this job more quickly than we could ;)
I have also filed a feature request to add a few fields to the header; please have a look:
http://bugs.openzim.org/show_bug.cgi?id=4
Regards
Emmanuel
Clarification:
This last message was by Rotem, a fellow WM-IL member helping me with the
embedding of the Hebrew Wikipedia in the One Computer Per Child project.
He is reporting issues with Kiwix and the ZIM file I created last week.
Regarding size: Size is important, because we intend to add images (the
300MB ZIM file is the complete Hebrew Wikipedia text, but no pictures). We
are hoping to have at least 5GB reserved for us in those One Computer Per
Child machines we are to install on, but we may be forced to make do with
3GB. So every MB saved from the index is another MB available for
images...
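
To make the space budget concrete, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB) and takes the 2.3 GB reported for the Kiwix data directory elsewhere in this thread as the index size; the function name is only illustrative.

```python
def space_left_for_images_mb(reserved_mb: int, zim_mb: int, index_mb: int) -> int:
    """Space remaining for images once the ZIM file and the index are stored."""
    return reserved_mb - zim_mb - index_mb


ZIM_MB = 300     # Hebrew Wikipedia text, no pictures
INDEX_MB = 2300  # assumption: the whole 2.3 GB data directory is index data

for reserved_mb in (5000, 3000):
    left = space_left_for_images_mb(reserved_mb, ZIM_MB, INDEX_MB)
    print(f"{reserved_mb} MB reserved -> {left} MB left for images")
```

Under these assumptions a 3GB allocation leaves only about 400 MB for images, which is why the index size matters so much.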
Asaf Bartov
Wikimedia Israel
On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha <hidroo(a)gmail.com> wrote:
> * there are some errors in links of files and special pages
> examples
> קובץ:Nuvola_apps_important.svg<http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg> link
> to ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים(wikipedia:wikipedia projects\ articles without images\categories\Sports
> people from Italy)
> מיוחד:אקראי (Special:Random) > 15 במאי (may 15)
> מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט
>
> * size is important because we intend to add images
>
> 2009/7/6 <dev-l-request(a)openzim.org>
>
>> Send dev-l mailing list submissions to
>> dev-l(a)openzim.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://intern.openzim.org/mailman/listinfo/dev-l
>> or, via email, send a message with subject or body 'help' to
>> dev-l-request(a)openzim.org
>>
>> You can reach the person managing the list at
>> dev-l-owner(a)openzim.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of dev-l digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Kiwix index size (Asaf Bartov)
>> 2. Re: Kiwix index size (Manuel Schneider)
>> 3. Re: Kiwix index size (Emmanuel Engelhart)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 5 Jul 2009 19:18:57 +0300
>> From: Asaf Bartov <asaf.bartov(a)gmail.com>
>> Subject: [openZIM dev-l] Kiwix index size
>> To: dev-l(a)openzim.org
>> Message-ID:
>> <50a20d900907050918r3fcff23l275c67690ed7fc20(a)mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi, everyone.
>>
>> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
>> Wikipedia last week, the Kiwix data directory ran up to a total of 31
>> items,
>> totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion
>> make
>> sense?
>>
>> Detailed ls output attached.
>>
>> Thanks in advance,
>>
>> Asaf Bartov
>> --
>> Asaf Bartov <asaf(a)forum2.org>
>>
Hello, Pascal.
Thanks for the pointer about OkaWix!
I've downloaded it and tried it with the Hebrew Wikipedia. Initial
findings:
- excellent rendering, including directionality.
- categories don't seem to work
I'm downloading the pictures now...
I note it's using Zeno files. Is it planned to use ZIM files in the future?
A.
On Mon, Jul 6, 2009 at 7:15 PM, Pascal Martin <pmartin(a)linterweb.fr> wrote:
> So did you try Okawix with the Hebrew content?
>
>
About the dead links, a few things:
* Are you sure the problem is not at the source (the HTML files)?
* The zimwriter does not check whether all links in all HTML pages are valid.
* It seems that libzim returns bad content if the requested content does not exist (Tommi, can you confirm?). IMO it should return nothing or an error code.
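
A check for dead links in the source HTML, before the ZIM file is built, could look roughly like the sketch below. It is a standalone illustration and not part of zimwriter or libzim; it assumes internal links are plain relative hrefs that match article names, and `find_dead_links` is a made-up name.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.targets.append(href)


def is_external(href: str) -> bool:
    # Namespace prefixes such as "Special:Random" contain a colon too,
    # so only treat obvious URL forms as external.
    return "://" in href or href.startswith(("//", "mailto:"))


def find_dead_links(pages: dict) -> dict:
    """pages maps article name -> HTML; returns {article: [missing targets]}."""
    dead = {}
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.targets:
            if is_external(href):
                continue
            target = unquote(href.split("#", 1)[0])  # drop any fragment
            if target and target not in pages:
                dead.setdefault(name, []).append(target)
    return dead
```

Running this over the HTML files would show whether the broken links already exist at the source or are introduced later.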
Emmanuel
On Mon 06/07/09 14:58, "Rotem Simha" hidroo(a)gmail.com wrote:
> * there are some errors in links of files and special pages
> examples
> קובץ:Nuvola_apps_important.svg<http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg> link to
> ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים
> ללא תמונות/קטגוריות/ספורטאים איטלקים
> (wikipedia:wikipedia projects\ articles without images\categories\Sports
> people from Italy)
> מיוחד:אקראי (Special:Random) > 15 במאי (may 15)
> מיוחד:שינויים אחרונים (Special:RecentChanges) >
> 10_באוגוסט
>
> * size is important because we intend to add images
>
I have answered Rotem about the links. I have also opened a bug on the Kiwix side:
https://sourceforge.net/tracker/?func=detail&aid=2817440&group_id=175508&at…
As for the search engine index size, we have to find a solution that produces a smaller index.
Starting from the openZIM solution should be a good approach.
I will have a look this week.
Emmanuel
On Mon 06/07/09 15:03, "Asaf Bartov" asaf.bartov(a)gmail.com wrote:
> Clarification:
>
> This last message was by Rotem, a fellow WM-IL member helping me with
> the embedding of the Hebrew Wikipedia in the One Computer Per Child
> project.
>
> He is reporting issues with Kiwix and the ZIM file I created last
> week.
>
> Regarding size: Size is important, because we intend to add images
> (the 300MB ZIM file is the complete Hebrew Wikipedia text, but no
> pictures). We are hoping to have at least 5GB reserved for us in
> those One Computer Per Child machines we are to install on, but we may
> be forced to make do with 3GB. So every MB saved from the index, is
> another MB available for images...
>
> Asaf Bartov
> Wikimedia Israel
>
> On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha wrote:
> * there are some errors in links of files and special pages
> examples
> קובץ:Nuvola_apps_important.svg<http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg> link to
> ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים
> ללא תמונות/קטגוריות/ספורטאים איטלקים
> (wikipedia:wikipedia projects\ articles without images\categories\Sports
> people from Italy)
> מיוחד:אקראי (Special:Random) > 15 במאי (may 15)
> מיוחד:שינויים אחרונים (Special:RecentChanges) >
> 10_באוגוסט
>
> * size is important because we intend to add images
>
* there are some errors in links of files and special pages
examples
קובץ:Nuvola_apps_important.svg<http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg>
link
to ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים
איטלקים(wikipedia:wikipedia projects\ articles without
images\categories\Sports
people from Italy)
מיוחד:אקראי (Special:Random) > 15 במאי (may 15)
מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט
* size is important because we intend to add images
Hi, everyone.
When running Kiwix's indexer on the ZIM file I had created from the Hebrew
Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make
sense?
Detailed ls output attached.
Thanks in advance,
Asaf Bartov
--
Asaf Bartov <asaf(a)forum2.org>
Hi Asaf,
this is great! Can you provide a download link? I'm curious about the result.
I do not know whether you have included pictures in the dump, but if so, you can reduce the ZIM file size by 10-20% by using the ./optimizeContents.pl script with the --mediawikiOptim parameter.
If you manage to do what you described, it would be wonderful, and certainly the first big deployment of the ZIM format.
As for the Kiwix UI translation, this would also be great and pretty easy to do. I'm especially interested in it, because Hebrew would be the first translation of Kiwix into a language written from right to left. If you have time, you can simply come to #kiwix on freenode and I will explain how to do it and give you SVN access to upload the translated files.
Congratulations
Emmanuel
On Wed 01/07/09 15:47, Asaf Bartov asaf.bartov(a)gmail.com wrote:
> Hi, everyone.
>
> I'm pleased to announce that, with many thanks to Emmanuel and Tommi,
> I've finally managed to create a ZIM file based on a Hebrew Wikipedia
> dump. We are now finally able to begin evaluating it as a solution
> to our offline project of embedding a Hebrew Wikipedia on every
> computer distributed with the One Computer Per Child initiative of the
> Israeli government.
>
> We shall be compiling a list of bugs and/or issues and bringing them
> to this list. Once the main developers have had a look at it, we'll
> gladly help with fixing it, too. Also, we shall be contributing a
> Hebrew interface to Kiwix.
>
> More to follow in the coming week or two.
>
> Thanks again to everyone who helped!
>
> Asaf Bartov
> Wikimedia Israel
> --
> Asaf Bartov
>
>
>
Hi, everyone.
I'm pleased to announce that, with many thanks to Emmanuel and Tommi, I've
finally managed to create a ZIM file based on a Hebrew Wikipedia dump. We
are now finally able to begin evaluating it as a solution to our offline
project of embedding a Hebrew Wikipedia on every computer distributed with
the One Computer Per Child initiative of the Israeli government.
We shall be compiling a list of bugs and/or issues and bringing them to
this list. Once the main developers have had a look at it, we'll gladly help
with fixing it, too. Also, we shall be contributing a Hebrew interface to
Kiwix.
More to follow in the coming week or two.
Thanks again to everyone who helped!
Asaf Bartov
Wikimedia Israel
--
Asaf Bartov <asaf(a)forum2.org>