In principle, I understand the need for binary formats and compression in a context with limited resources.
On the other hand, plain text formats are easy to work with, especially for third-party users and organizations.
Playing devil's advocate, I could even argue that you should keep the data dumps in plain text, keep your processing dead simple, and let distributed processing systems such as Hadoop MapReduce (or Storm, Spark, etc.) handle the scale, computing diffs whenever needed or on the fly.
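For instance, just as a sketch (file paths are hypothetical, and this assumes one record per line, so a line-level set difference is a meaningful diff), a diff job in Spark could be as simple as:

    from pyspark import SparkContext

    sc = SparkContext(appName="dump-diff")

    # Two consecutive plain-text dump snapshots, one record per line.
    old = sc.textFile("dumps/snapshot-old.txt")
    new = sc.textFile("dumps/snapshot-new.txt")

    # Records present in one snapshot but not in the other.
    added = new.subtract(old)
    removed = old.subtract(new)

    added.saveAsTextFile("diffs/added")
    removed.saveAsTextFile("diffs/removed")

    sc.stop()

The point being: the dump pipeline itself stays trivial, and consumers who need incremental updates can derive them on their own hardware.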
Reading the Wiki mentioned at the beginning of this thread, it is not clear to me what the requirements for this new incremental update format actually are, or why they exist.
That makes it hard to provide useful input and help.
- Nicolas Torzec.
PS: Anyway, thanks a lot for your great work on the data backends, behind the scenes ;)