Re: [openZIM dev-l] How to handle 2GB and 4 GB file size limits?


emmanuel＠engelhart.org

4 Jun 2010

12:31 p.m.

On Thu 03/06/10 18:59, Tommi Mäkitalo tommi@tntnet.org wrote:

...

Solution 1 really has the advantage that the user can download a single zim file and split it himself when needed. There is even a Unix/Linux tool named split that splits files into pieces (isn't it nice how intuitive Unix really is ;-) ).

This seems to me to be the best solution to the 4GB file size limit of FAT32. But it does not address the problem of ZIM files with optional pictures etc. It is probably better to treat these as two different topics.

...

It is quite easy to extend the iostream to support multiple files, so that it internally joins the files into one zim file. We just have to think about the interface: how to tell zimlib which files to join.

As you suggested, a naming convention is one possible solution. We may even use the scheme from split. So if you split foo.zim into parts, the parts are named foo.zimaa, foo.zimab, foo.zimac and so on. If you tell zimlib to open foo.zim and it is not found, it looks for the parts until it does not find any more.

I think a naming convention is not the best solution; I do not like the idea of zimlib depending on file names. There are a thousand reasons why ZIM file names may change.

What about:
* a zim::file constructor that also accepts a directory name and, if it is given a directory, loads all files inside and merges them (easy for the app developer, but a little bit tricky)
* a zim::file constructor that accepts a list of ZIM file chunks (less handy for the app developer, but clean)

Additional question: is NTFS not widely supported?

Emmanuel


Tommi Mäkitalo

4 Jun

4:42 p.m.

New subject: [openZIM dev-l] How to handle 2GB and 4 GB file size limits?

dev-l mailing list dev-l@openzim.org https://intern.openzim.org/mailman/listinfo/dev-l

Hi,

The directory solution has the problem that there is no standard way to access directories. A list of files is easier to handle. Zimlib may even accept a list of files in one string separated by colons, like in the PATH variable. Then you could open a zim file using e.g. "myfile.zim.part1:myfile.zim.part2:myfile.zim.part3". Zimlib would parse the list and use these file names. The advantage is that the files do not need to be in the same directory. If a file name itself contains a colon (I know one operating system which does not use mount points but partition names followed by a colon, as in C:\somefile.zim), the colon may be escaped with a backslash, so the file name would have to be passed as "C\:\somefile.zim".
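A minimal sketch of how such a colon-separated list could be parsed. The function name splitPartList is hypothetical and not part of zimlib, and the sketch assumes that only the sequence backslash-colon is treated as an escape, so that ordinary backslashes in Windows paths are left alone.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not a zimlib function: split a PATH-like string on ':'.
// Assumption: a backslash directly followed by ':' yields a literal colon;
// every other backslash is kept unchanged.
std::vector<std::string> splitPartList(const std::string& spec)
{
    std::vector<std::string> parts;
    std::string current;
    for (std::string::size_type i = 0; i < spec.size(); ++i)
    {
        const char c = spec[i];
        if (c == '\\' && i + 1 < spec.size() && spec[i + 1] == ':')
        {
            current += ':';  // escaped colon belongs to the file name
            ++i;             // skip the colon that was just consumed
        }
        else if (c == ':')
        {
            // unescaped colon ends the current file name; empty names are dropped
            if (!current.empty())
                parts.push_back(current);
            current.clear();
        }
        else
        {
            current += c;
        }
    }
    if (!current.empty())
        parts.push_back(current);
    return parts;
}
```

With this rule, the string C\:\somefile.zim:other.zim would yield the two names C:\somefile.zim and other.zim.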

Yesterday I already implemented the feature that zimfile handles a file split into multiple parts. It works fine here. Currently I use the naming convention from split, but this is easy to change.

And NTFS is widely supported, but only on one operating system and not on every device. If you buy a USB stick, it is formatted with FAT, which has this 2G or 4G (I don't really know) limitation.

Tommi

Christian Pühringer

9:31 p.m.

New subject: [openZIM dev-l] How to handle 2GB and 4 GB file size limits?

Hi,

...

Yesterday I already implemented the feature that zimfile handles a file split into multiple parts. It works fine here. Currently I use the naming convention from split, but this is easy to change.

That's good news. For users of devices limited to 2GB/4GB it would be easier if all zim files were limited to 2GB, with internal references to the other files, but I agree that this would hurt the user experience on devices without this limitation. It is a bit of a dilemma, but your solution is fine for me, particularly since most devices will support larger files in the future (e.g. SDXC cards won't have this limitation).

However, we should strongly consider defining a standard naming scheme for parts. *part.zimN is fine for me. This makes it possible to offer pre-split files for download without having to give instructions on how to rename them for the different applications. Also, if the end user has to split the files, it is easier if there is a standard way to do it. Note that this may actually be only a definition, which is implemented not in zimlib but by the applications. However, I'd prefer it to be supported in zimlib (for example as a separate open method).

Furthermore, a check (e.g. on opening) whether the file set is complete would be helpful. I am not sure whether there is a feasible solution for this. However, if you have any idea how to do this, I think it would be a very useful feature.

Best regards, Christian


Tommi Mäkitalo

10:44 p.m.

New subject: [openZIM dev-l] How to handle 2GB and 4 GB file size limits?

Hi,

I see there are different opinions on how to handle this. For me it is easy to offer different open methods for these. My current implementation is based on a list of files. How this list is obtained does not change the algorithms used for reading the files.
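To illustrate why the reading side does not depend on how the list is obtained, here is a small sketch that maps a position in the logical (joined) file to a part and an offset inside that part. The names PartLocation and locate are hypothetical and the real zimlib code may be organised quite differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical type for illustration only.
struct PartLocation
{
    std::size_t index;     // which part holds the byte
    std::uint64_t offset;  // position inside that part
};

// Given the sizes of the parts in order, find where byte `pos` of the
// logical file lives. Throws if `pos` is beyond the end of the last part.
PartLocation locate(const std::vector<std::uint64_t>& partSizes, std::uint64_t pos)
{
    for (std::size_t i = 0; i < partSizes.size(); ++i)
    {
        if (pos < partSizes[i])
            return PartLocation{i, pos};
        pos -= partSizes[i];  // skip over this part
    }
    throw std::out_of_range("offset beyond the end of the last part");
}
```

The only input is the ordered list of part sizes, so a naming convention, an explicit list or a directory scan could all feed the same reader.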

Zimlib does a rudimentary check whether the pointer to the last data cluster is inside the file. So if one file is missing, it is very probable that this is detected. But I can add some more checks. Maybe I can decompress the last chunk to check whether it is consistent.
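The rudimentary check could look roughly like the following sketch. The function name looksComplete and the parameter lastClusterOffset are assumptions made for illustration; how zimlib actually reads that pointer is not shown here.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical check: `lastClusterOffset` stands for the offset of the last
// data cluster as recorded in the ZIM file. If it does not lie inside the
// combined size of all parts, at least one part is missing or truncated.
bool looksComplete(const std::vector<std::uint64_t>& partSizes,
                   std::uint64_t lastClusterOffset)
{
    const std::uint64_t total =
        std::accumulate(partSizes.begin(), partSizes.end(), std::uint64_t{0});
    return lastClusterOffset < total;
}
```

Because the last cluster normally sits close to the end of the file, dropping any part usually shrinks the total below that offset. A missing part smaller than the data following the last cluster pointer would slip through, which is why an additional consistency check would still be useful.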

The implementation also avoids opening too many file handles. By default it keeps only 5 files open at the same time. If another file is needed, one file is closed. A special caching algorithm is used to decide which file to close.
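The caching algorithm itself is not described here. As a stand-in, the sketch below uses a plain least-recently-used policy, which is an assumption and not necessarily what zimlib does. The class OpenPartTracker is hypothetical and only does the bookkeeping; it does not open or close real files.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>

// Hypothetical bookkeeping class: tracks which parts count as "open" and
// picks the least recently used one when the limit is reached.
class OpenPartTracker
{
public:
    explicit OpenPartTracker(std::size_t maxOpen = 5) : maxOpen_(maxOpen) {}

    // Mark `part` as used. Returns the index of the part that had to be
    // closed to make room, or -1 if nothing was closed.
    long use(std::size_t part)
    {
        auto it = std::find(open_.begin(), open_.end(), part);
        if (it != open_.end())
        {
            // already open: move it to the front (most recently used)
            open_.erase(it);
            open_.push_front(part);
            return -1;
        }
        long closed = -1;
        if (!open_.empty() && open_.size() >= maxOpen_)
        {
            closed = static_cast<long>(open_.back());  // least recently used
            open_.pop_back();
        }
        open_.push_front(part);
        return closed;
    }

    bool isOpen(std::size_t part) const
    {
        return std::find(open_.begin(), open_.end(), part) != open_.end();
    }

    std::size_t openCount() const { return open_.size(); }

private:
    std::size_t maxOpen_;
    std::list<std::size_t> open_;  // front = most recently used
};
```

A reader would call use() before touching a part and close the real file handle for whatever index is returned.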

I just checked in the current implementation, which uses the naming scheme from the split command (suffixes aa, ab, ac ... zz).
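For reference, a sketch of how the two-letter suffixes could be generated and used to collect the parts of a file. Both function names are hypothetical, and the existence test is passed in as a callback so that the sketch does not assume any particular file system API.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Two-letter suffix in the style of split's default naming:
// 0 -> "aa", 1 -> "ab", ..., 675 -> "zz".
std::string splitSuffix(std::size_t n)
{
    if (n >= 26 * 26)
        throw std::out_of_range("only aa..zz are covered by this sketch");
    std::string s;
    s += static_cast<char>('a' + n / 26);
    s += static_cast<char>('a' + n % 26);
    return s;
}

// Hypothetical discovery helper: collect base+"aa", base+"ab", ... and stop
// at the first name for which `exists` returns false.
std::vector<std::string> findParts(
    const std::string& base,
    const std::function<bool(const std::string&)>& exists)
{
    std::vector<std::string> parts;
    for (std::size_t n = 0; n < 26 * 26; ++n)
    {
        const std::string name = base + splitSuffix(n);
        if (!exists(name))
            break;
        parts.push_back(name);
    }
    return parts;
}
```

Note that the search stops at the first gap, so a missing part in the middle silently shortens the list; this is one reason a completeness check on opening is worthwhile.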

Tommi
