Hi,
Currently the Wikipedia dumps are stored in a single zim file. Their size
already exceeds 2 GB for the English Wikipedia, and 4 GB for some
versions with images included.
Many devices don't support files of that size; typically their file size
limit is 2 GB or 4 GB.
The 2 GB limit is due to the use of signed 32-bit types for file access
and is unfortunately not that uncommon.
For example Symbian2 (and earlier versions), the iostream implementation
on Windows (see also http://bugs.openzim.org//show_bug.cgi?id=19), old
Linux versions, and possibly Android [1] don't support files larger
than 2 GB.
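For illustration, the two ceilings follow directly from the offset types involved (a quick arithmetic sketch, nothing zim-specific):

```python
# Signed 32-bit file offsets (e.g. a 32-bit long used for seeking)
# cap the addressable file size at 2^31 - 1 bytes (~2 GiB).
# FAT32 stores file sizes in an unsigned 32-bit field, so its
# per-file ceiling is 2^32 - 1 bytes (~4 GiB).
signed_32bit_max = 2**31 - 1
fat32_max = 2**32 - 1

print(signed_32bit_max)  # 2147483647
print(fat32_max)         # 4294967295
```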
Other OSes (including Symbian3 or Maemo) do support larger files, but in
many cases there is still a 4 GB limit due to the FAT32 file system,
which is the standard file system for SD cards, and also for the
internal memory of most mobile phones.
Some of them, like Maemo or Android?, support other file systems
which don't have this limit, but this requires reformatting the memory
card and makes the card unreadable for many other devices. Therefore
another solution would make sense for these cases as well.
The question is how to support devices with a 2 GB or 4 GB limit.
The following options come to mind:
1. Split files at the file system level with a special naming convention
(e.g. *.0.zim, *.1.zim, etc.). The zim format is unchanged;
the zim library has to be extended to support this.
Advantage: Relatively simple change to zimlib (only replace the
iostream implementation)
End users can split files relatively easily
Disadvantage: Not a really clean solution.
2. Split files into valid zim files with separate headers. Store the
names of the related files in all zim files (e.g. in the metadata
(relation?)).
Advantage: Clean solution
Allows other features as well, e.g. separating images and text into
separate files.
Disadvantage: Larger change to the zim file format.
Possibly a larger change to zimlib (or to the application if handled in
metadata)
3. Split into valid zim files with separate headers. No changes to the
zim file format or zimlib; the application using zimlib has to
load all related files (and find out which are related) appropriately.
Advantage: No change to zimlib
Disadvantage: The application has to handle this,
Difficult for the end user to split files
No convention for detecting related files (in the worst case the user
has to open them all separately)
=> Problematic if split files are to be provided.
4. Other ideas?
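To make option 1 concrete, here is a minimal sketch (in Python for brevity, rather than the C++ a zimlib change would use) of presenting a set of parts as one logical file. The class name, the part-naming scheme, and the read interface are my own illustrative assumptions, not existing zimlib API:

```python
import glob
import os

class SplitReader:
    """Presents parts named like wikipedia.0.zim, wikipedia.1.zim, ...
    as one logical, randomly accessible byte stream (option 1 sketch)."""

    def __init__(self, pattern):
        # e.g. pattern = "wikipedia.*.zim"; order parts by numeric index
        self.parts = sorted(glob.glob(pattern),
                            key=lambda p: int(p.split(".")[-2]))
        self.sizes = [os.path.getsize(p) for p in self.parts]

    def size(self):
        # Total logical file size across all parts
        return sum(self.sizes)

    def read_at(self, offset, length):
        """Read `length` bytes starting at the logical `offset`,
        transparently crossing part boundaries."""
        out = bytearray()
        for part, size in zip(self.parts, self.sizes):
            if offset >= size:
                offset -= size  # requested range starts in a later part
                continue
            with open(part, "rb") as f:
                f.seek(offset)
                out += f.read(min(length - len(out), size - offset))
            offset = 0
            if len(out) == length:
                break
        return bytes(out)
```

Any code that currently does a plain seek/read on one file could go through such a wrapper unchanged, which is why option 1 only touches the I/O layer.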
For all options it is possible to directly provide the split files (so
that in the future MediaWiki would directly write out 2 GB zim files),
or to let the end user do this.
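If end users are to do the splitting themselves, it is just a byte-level cut; a sketch, assuming the default part size and the file.N.zim naming from option 1 (neither is an agreed standard):

```python
import os

def split_zim(path, part_size=2**31 - 1):
    """Cut a large .zim into parts below the signed-32-bit limit,
    named base.0.zim, base.1.zim, ... (option 1 naming assumption)."""
    base = path[:-len(".zim")] if path.endswith(".zim") else path
    parts = []
    with open(path, "rb") as src:
        i = 0
        while True:
            chunk = src.read(part_size)
            if not chunk:
                break
            part = "%s.%d.zim" % (base, i)
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            i += 1
    return parts
```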
I'd definitely prefer if split files are provided.
What is your opinion on this?
For WikiOnBoard I'd have to solve this soon (in particular if the
next German Wikipedia zim is larger than 2 GB ;)
and therefore I'd need to implement something on my own if there is no
agreed solution.
However, I'd strongly prefer to reach agreement on such a common
solution.
Best regards,
Christian
[1] http://osdir.com/ml/android-porting/2010-03/msg00107.html
Hi
Yesterday I released RC1 of a DVD with content from 4 WMF projects (including Wikipedia), using Kiwix.
http://tmp.kiwix.org/iso/wmf_2010_rc1.iso
This DVD provides:
* recent WP, WS, WB and WQ content in Farsi, with pictures
* Kiwix 0.9 alpha3 for Windows with the pre-indexed content, running live on the DVD
* Kiwix sources and DEB package
* Localized Windows Installer
* Localized DVD launcher
Everything is free software, and documentation is available to rebuild the whole DVD or parts of it.
DVDs and CDs are "outdated" tech in the Western world, but they are still one of the best solutions to spread
content in Iran. With this experience, Kiwix and Moulinwiki are trying to find ways to release
such DVDs ever faster. Other DVDs, especially in French and Burmese, are in preparation.
Cheers
Emmanuel