About the dead links, a few things:
* Are you sure the problem is not at the source (HTML files)?
* The zimwriter does not check whether all links in all HTML pages are OK.
* It seems that libzim returns bad content if the content does not exist (Tommi, can you confirm?). It should return nothing or an error code, IMO.
Emmanuel
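Since the zimwriter does not validate links, one option is to check the HTML source before building the ZIM file. The following is only a minimal sketch, assuming the pages are available as a mapping from relative path to HTML text; `LinkCollector` and `find_dead_links` are hypothetical names and not part of zimwriter or libzim.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote
import posixpath


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_dead_links(pages):
    """Return {page path: [hrefs whose target is not among the pages]}.

    `pages` maps a relative path (assumption: '/'-separated) to HTML text.
    """
    dead = {}
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            parts = urlsplit(href)
            # Skip external links (scheme or host present) and pure fragments.
            # Note: an unescaped "Special:Random" parses as scheme "special"
            # and is skipped here too; a real checker would need to handle it.
            if parts.scheme or parts.netloc or not parts.path:
                continue
            # Resolve the link relative to the directory of the current page.
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), unquote(parts.path))
            )
            if target not in pages:
                dead.setdefault(path, []).append(href)
    return dead
```

Running `find_dead_links` over the source pages before invoking the writer would show whether the broken links already exist at the source.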
On Mon 06/07/09 14:58, "Rotem Simha" hidroo@gmail.com wrote:
- there are some errors in links of files and special pages
  examples:
  קובץ:Nuvola_apps_important.svg [1] links to ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים (Wikipedia:Wikipedia projects/Articles without images/Categories/Sports people from Italy)
  מיוחד:אקראי (Special:Random) > 15 במאי (May 15)
  מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט (August 10)
- size is important because we intend to add images
2009/7/6 Send dev-l mailing list submissions to dev-l@openzim.org
Today's Topics:
1. Kiwix index size (Asaf Bartov)
2. Re: Kiwix index size (Manuel Schneider)
3. Re: Kiwix index size (Emmanuel Engelhart)
Message: 1
Date: Sun, 5 Jul 2009 19:18:57 +0300
From: Asaf Bartov
Subject: [openZIM dev-l] Kiwix index size
To: dev-l@openzim.org
Hi, everyone.
When running Kiwix's indexer on the ZIM file I had created from the Hebrew Wikipedia last week, the Kiwix data directory grew to a total of 31 items, totalling 2.3 GB. The ZIM file itself is ~300 MB. Does this proportion make sense?
Detailed ls output attached.
Thanks in advance,
Asaf Bartov
On Monday, 6 July 2009 15:09:59, emmanuel@engelhart.org wrote:
About the dead links, a few things:
- Are you sure the problem is not at the source (HTML files)?
- The zimwriter does not check whether all links in all HTML pages are OK.
- It seems that libzim returns bad content if the content does not exist (Tommi, can you confirm?). It should return nothing or an error code, IMO.
Emmanuel
Hi,
Yes, the zimwriter just writes the ZIM file and does not look into the content.
The libzim should never return bad content. It throws exceptions when something exceptional occurs. If it really returns bad content, that is a bug which I have to fix.
Tommi
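To illustrate the behaviour described above (raise an exception for missing content instead of returning bad content), here is a minimal sketch. `Archive`, `ArticleNotFound` and `render` are hypothetical stand-ins written for this example; they are not the libzim API, which is a C++ library.

```python
class ArticleNotFound(KeyError):
    """Hypothetical error raised when a requested article does not exist."""


class Archive:
    """Hypothetical in-memory stand-in for a ZIM reader (not libzim)."""

    def __init__(self, articles):
        self._articles = dict(articles)

    def get(self, url):
        # Never fall back to some other article's data: a miss is an error.
        try:
            return self._articles[url]
        except KeyError:
            raise ArticleNotFound(url) from None


def render(archive, url):
    """Caller side: map a missing article to an explicit 404-style result."""
    try:
        return 200, archive.get(url)
    except ArticleNotFound:
        return 404, ""
```

With this split, a dead link in a page leads to a clear "not found" result in the reader rather than to unrelated content being displayed.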