Hi,
Currently the Wikipedia dumps are stored in a single zim file. Their size is already over 2GB for the English Wikipedia, and over 4GB for some versions with images included. Many devices don't support files of that size; typically their file size limit is 2GB or 4GB. The 2GB limit is due to the use of signed 32-bit types in file access and is unfortunately not that uncommon: for example Symbian2 (and earlier versions), iostream on Windows (see also http://bugs.openzim.org//show_bug.cgi?id=19), old Linux versions, and possibly Android [1] don't support files larger than 2GB. Other OSes (including Symbian3 or Maemo) do support them, but in many cases there is still a 4GB limit due to the FAT32 file system, which is the standard file system for SD cards and also for the internal memory of most mobile phones. Some of them, like Maemo or possibly Android, support other file systems which don't have this limit, but that requires reformatting the memory card and makes the card unreadable for many other devices. So a different solution would make sense for these cases as well.
The question is how to support devices which have the 2GB or 4GB limit. The following options come to my mind:

1. Split files on the file system level with a special naming convention (e.g. *.0.zim, *.1.zim, etc.). The zim format is unchanged; the zim library has to be extended to support this.
   Advantages: relatively simple change to zimlib (only replace the iostream implementation); the end user can split files relatively easily.
   Disadvantage: not a really clean solution.

2. Split into valid zim files with separate headers. Store in every zim file (e.g. in the metadata, "relation"?) the names of the related files.
   Advantages: clean solution; allows other features as well, e.g. separating images and text into separate files.
   Disadvantages: larger change to the zim file format; possibly a larger change to zimlib (or to the application, if handled in metadata).

3. Split into valid zim files with separate headers, with no changes to the zim file format or zimlib; the application using zimlib has to load all related files (and find out which ones are related) appropriately.
   Advantage: no change to zimlib.
   Disadvantages: the application has to handle this; difficult for the end user to split a file; no convention for how to detect related files (in the worst case the user has to open them all separately), which is problematic if split files are to be provided.

4. Other ideas?
For all options it is possible to directly provide the split files (i.e. in the future mediawiki would directly write out zim files within the 2GB limit), or to let the end user do the splitting. I'd definitely prefer if split files are provided.
What is your opinion on this? For WikiOnBoard I'd have to solve this soon (in particular if the next German wiki zim is larger than 2GB ;), and therefore I'd need to implement something on my own if there is no agreed solution. However, I'd strongly prefer that we agree on a common solution.
Best regards,
Christian

[1] http://osdir.com/ml/android-porting/2010-03/msg00107.html
Hi Christian,
zimlib already has its own iostream implementation, so there is no limitation there. But I see the problem with file systems, which may limit the file size.
I am willing to implement whatever solution we prefer. Larger changes to zimlib are not an issue and should not affect our decision. I want to have the best possible solution - not the easiest to implement.
The old zeno implementation, as well as early implementations of zimlib, had a feature to support multiple files. Instead of opening a single zim (or zeno) file, it was able to open all zim files in one directory.
The last German Wikipedia DVD had separate files for text and images, as well as separate DVDs for images in higher resolution. Just by copying the files into one directory it was possible to access all content from all zeno files.
I removed that feature for better portability; there is no standard, portable way to read all files in a directory.
Your suggestions 2 and 3 imply that the creator of the zim file needs to address the problem: he has to split the content. Of course the zimwriter can help by splitting the file automatically, but handling zim files would become more difficult. The user has to know which files belong together, and if he downloads zim files, he has to download multiple files. It also implies that we must extend the specification to limit the file size to 2GB. I don't like that; I don't want to impose a limit per spec.
Solution 1 really has the advantage that the user can download a single zim file and split it himself when needed. There is even a unix/linux tool to split files into pieces, named split (isn't it nice how intuitive unix really is ;-) ).
It is quite easy to extend the iostream to support multiple files, so that it internally joins the parts into one logical zim file. We just have to think about the interface - how to tell zimlib which files to join.
As you suggested, a naming convention is one possible solution. We may even use the scheme from split: if you split foo.zim into parts, the parts are named foo.zimaa, foo.zimab, foo.zimac and so on. If you tell zimlib to open the file foo.zim and it is not found, it looks for the parts until it does not find any more.
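A minimal sketch of how such part discovery and joined reading could look. This is hypothetical illustration code, not the actual zimlib implementation; the function names findParts and readAt are made up for this example:

```cpp
// Locate the parts of a (possibly split) zim file using the split(1)
// naming scheme foo.zimaa, foo.zimab, ... and read from them as if they
// were one contiguous file.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Return the files that make up the zim file. If "foo.zim" itself exists,
// it is used as-is; otherwise we probe "foo.zimaa", "foo.zimab", ... and
// stop at the first missing part.
std::vector<std::string> findParts(const std::string& name)
{
    std::vector<std::string> parts;
    if (std::ifstream(name.c_str()).good())
    {
        parts.push_back(name);
        return parts;
    }
    for (char c1 = 'a'; c1 <= 'z'; ++c1)
        for (char c2 = 'a'; c2 <= 'z'; ++c2)
        {
            std::string part = name + c1 + c2;
            if (!std::ifstream(part.c_str()).good())
                return parts;   // first gap ends the sequence
            parts.push_back(part);
        }
    return parts;
}

// Size of a single part file.
std::uint64_t partSize(const std::string& part)
{
    std::ifstream in(part.c_str(), std::ios::binary | std::ios::ate);
    return static_cast<std::uint64_t>(in.tellg());
}

// Read `count` bytes at logical offset `off` of the joined file,
// crossing part boundaries as needed. Returns true on success.
bool readAt(const std::vector<std::string>& parts,
            std::uint64_t off, char* buf, std::size_t count)
{
    for (std::size_t i = 0; i < parts.size() && count > 0; ++i)
    {
        std::uint64_t sz = partSize(parts[i]);
        if (off >= sz)          // requested range starts in a later part
        {
            off -= sz;
            continue;
        }
        std::ifstream in(parts[i].c_str(), std::ios::binary);
        in.seekg(static_cast<std::streamoff>(off));
        std::size_t n =
            static_cast<std::size_t>(std::min<std::uint64_t>(count, sz - off));
        in.read(buf, n);
        buf += n;
        count -= n;
        off = 0;                // subsequent parts are read from the start
    }
    return count == 0;
}
```

In zimlib this logic would naturally live inside the custom streambuf, so that the rest of the library keeps seeing one contiguous file.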
The user can split the files as needed and join them back using cat. Very easy.
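For illustration, the whole round trip with split and cat (the file names and the 1M chunk size are just for this demonstration; real parts would use a chunk size under the device limit, e.g. -b 2000M):

```shell
# Create a small dummy "zim" file, split it with the split(1) suffix
# scheme discussed above, and join the parts back with cat.
dd if=/dev/zero of=foo.zim bs=1024 count=2048 2>/dev/null
split -b 1M foo.zim foo.zim            # creates foo.zimaa and foo.zimab
cat foo.zimaa foo.zimab > rejoined.zim
cmp foo.zim rejoined.zim && echo "parts join back losslessly"
```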
Tommi