On 07/01/2013 12:48:11 PM, Petr Onderka - gsvick(a)gmail.com wrote:
>> What is the intended format of the dump files? The page makes it sound like
>> it will have a binary format, which I'm not opposed to, but is definitely
>> something you should decide on.
>
> Yes, it is a binary format, I will make that clearer on the page.
> The advantage of a binary format is that it's smaller, which I think is
> quite important.
In my experience binary formats have very little to recommend them.
They are definitely more obscure. They sometimes suffer from endian problems.
They require special code to read and write.
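To make the endian point concrete, here is a minimal sketch using Python's standard `struct` module. The `page_id` field is a hypothetical example, not part of any proposed dump format. The same four bytes decode to a different number when the reader assumes the wrong byte order, and nothing reports an error:

```python
import struct

# Hypothetical field: a 32-bit page ID stored in a binary dump.
page_id = 123456

# The writer packs it as a little-endian unsigned int ("<I").
data = struct.pack("<I", page_id)

# A reader that assumes big-endian (">I") silently gets the wrong value;
# no error is raised, because any 4 bytes are a valid unsigned int.
wrong = struct.unpack(">I", data)[0]

# A reader that uses the same explicit byte order recovers the value.
right = struct.unpack("<I", data)[0]

print(wrong, right)
```

A binary format can avoid this by fixing the byte order in its specification (as the explicit `<` prefix does here), but every reader and writer then has to get that detail right; plain-text XML has no such issue.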
I have also found that the notion that they offer an advantage by being
"smaller" is somewhat misguided.
XML in particular is highly redundant, far more so than ordinary prose,
because the same tag names are repeated for every record.
As a result, text-based XML usually compresses very, very well.
I remember one particular instance where we were generating 30-50 MB of
XML a day and needed to send it from the USA to the UK every day, over a
leased line whose bandwidth was a real bottleneck. We were surprised and
pleased to discover that zipping the files reduced them to only 1-2 MB.
Ever since, I have been skeptical of claims that binary formats are more
efficient on the wire, which is where size matters most.
I think you should run some experiments against compressed XML to justify
the claimed size benefit of a binary format.
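A minimal sketch of such an experiment, using only the Python standard library. The record fields (id, title, timestamp), the XML tags, and the binary layout are all hypothetical stand-ins, not the actual dump schema, and the synthetic data is far more uniform than real page text, so the numbers it prints are only a starting point. The meaningful comparison is `xml_zlib` against `binary_zlib`, run on real dump data:

```python
import struct
import zlib


def make_records(n):
    # Synthetic, hypothetical page records: (id, title, unix timestamp).
    return [(i, f"Page title {i}", 1372682891 + i) for i in range(n)]


def to_xml(records):
    # Plain-text XML; tag names repeat for every record.
    parts = ["<pages>\n"]
    for page_id, title, ts in records:
        parts.append(
            f"  <page>\n    <id>{page_id}</id>\n"
            f"    <title>{title}</title>\n"
            f"    <timestamp>{ts}</timestamp>\n  </page>\n"
        )
    parts.append("</pages>\n")
    return "".join(parts).encode("utf-8")


def to_binary(records):
    # Hypothetical fixed layout, little-endian, no padding:
    # id (u32), timestamp (u64), title length (u16), then the UTF-8 title bytes.
    out = bytearray()
    for page_id, title, ts in records:
        encoded = title.encode("utf-8")
        out += struct.pack("<IQH", page_id, ts, len(encoded))
        out += encoded
    return bytes(out)


def compare(n=10000):
    # Sizes in bytes, raw and after zlib compression at level 9.
    records = make_records(n)
    xml = to_xml(records)
    binary = to_binary(records)
    return {
        "xml": len(xml),
        "xml_zlib": len(zlib.compress(xml, 9)),
        "binary": len(binary),
        "binary_zlib": len(zlib.compress(binary, 9)),
    }


if __name__ == "__main__":
    for name, size in compare().items():
        print(f"{name:12} {size:>10,} bytes")
```

zlib (DEFLATE) is used here only because it ships with Python; the same comparison could be repeated with whatever compressor the dumps actually use.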
Jim
<snip>
--
Jim Laurino
wican.x.jimlaur(a)dfgh.net
Please direct any reply to the list.
Only mail from the listserver reaches this address.