Hi all,
What encoding is used for article, metadata, category data, ..., and for the title and URL strings in the directory? I could not find any documentation on this.
The simplest option for ZIM viewers would be to define that everything is encoded in UTF-8. This should work for all languages. The other option would be to define the encoding either for the complete ZIM file (e.g. in the metadata) or on a per-article basis (via an HTML tag in the header). It would make sense to restrict the possible encodings to a small subset, as otherwise readers would not be compatible with all ZIM files. If per-article encoding is to be supported, the encoding of the directory entries would have to be specified separately. The disadvantage of this approach is the higher complexity for the reader, in particular with per-article encoding. Furthermore, the specification itself becomes more complex (for example, it needs to define which encoding applies if none is specified in an article or in the metadata).
I'd prefer to simply define that everything is UTF-8, but I am not sure whether this has drawbacks I am not aware of. However, I think it is very important that we define something about the encoding, because otherwise we cannot reliably support ZIM files in all languages.
Best regards, Christian
Hi Christian,
you are right, this has not been specified in the ZIM File Format article, even though we discussed it at our very first Developers Meeting back in 2009:
http://openzim.org/Developer_Meetings/2009-1#Minutes
Everything should be in UTF-8.
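With a single fixed encoding, a reader can decode every directory string the same way, with no per-file or per-article detection. A minimal Python sketch, assuming (as in the ZIM directory entry layout) that the URL and title are stored as consecutive zero-terminated byte strings; the function names are hypothetical and not part of any ZIM library:

```python
def read_utf8_cstring(buf: bytes, offset: int) -> tuple[str, int]:
    """Read a zero-terminated string at `offset` and decode it as UTF-8.

    Returns the decoded string and the offset just past the NUL terminator.
    Raises ValueError if there is no terminator, and UnicodeDecodeError
    (a ValueError subclass) if the bytes are not valid UTF-8.
    """
    end = buf.find(b"\x00", offset)
    if end == -1:
        raise ValueError("unterminated string")
    # "strict" rejects invalid byte sequences instead of silently replacing them
    return buf[offset:end].decode("utf-8", errors="strict"), end + 1


def read_url_and_title(buf: bytes, offset: int) -> tuple[str, str]:
    """Read two consecutive zero-terminated UTF-8 strings (assumed: URL, then title)."""
    url, offset = read_utf8_cstring(buf, offset)
    title, _ = read_utf8_cstring(buf, offset)
    return url, title
```

For example, `read_url_and_title(b"K\xc3\xb6ln\x00K\xc3\xb6ln (Stadt)\x00", 0)` returns `("Köln", "Köln (Stadt)")`, while the Latin-1 bytes `b"K\xf6ln\x00"` are rejected rather than misread, which is the kind of ambiguity a UTF-8-only rule avoids.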
I have just added an "Encodings" section to the ZIM File Format article to document this:
http://openzim.org/ZIM_File_Format#Encodings
Thanks for asking!
/Manuel
On 26.06.2011 11:58, Christian Pühringer wrote:
What encoding is used for article, metadata, category data, ..., and for the title and URL strings in the directory? I could not find any documentation on this.