For storing updateable indexes, Berkeley DB 4-5, GDBM, and higher-level
options like SQLite are widely used. LevelDB <https://code.google.com/p/leveldb/>
is pretty cool too.
I think that with the amount of data we're dealing with, it makes sense to
have the file format under tight control. For example, saving a single byte
on each revision means total savings of ~500 MB for enwiki.
In any case, at this point it would be more work to switch to one of those
than to keep using the format I created.
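
To make the per-byte estimate above concrete, here is a minimal back-of-the-envelope sketch. The revision count of 500 million is an assumption inferred from the "~500 MB" figure (one byte per revision), not a measured number, and `total_savings_mb` is just an illustrative helper name.

```python
# Rough arithmetic behind "one byte per revision is ~500 MB for enwiki".
# ASSUMED_REVISION_COUNT is inferred from the ~500 MB figure, not measured.
ASSUMED_REVISION_COUNT = 500_000_000


def total_savings_mb(revisions: int, bytes_saved_per_revision: int) -> float:
    """Return the total savings in megabytes (1 MB = 10**6 bytes)."""
    return revisions * bytes_saved_per_revision / 1_000_000


if __name__ == "__main__":
    print(total_savings_mb(ASSUMED_REVISION_COUNT, 1), "MB saved")
```

The same helper shows why small per-record overheads matter at this scale: under the same assumption, a 4-byte field per revision would cost about 2 GB.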
For delta coding, there's xdelta3.
(rzip <http://rzip.samba.org/> and rsync are wicked awesome, but not as easy
to just drop in as a library.)
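
As a rough illustration of what delta coding buys for revision text, here is a small sketch. It does not use xdelta3; as a stand-in it uses zlib's preset-dictionary feature from the Python standard library, with the previous revision as the dictionary. The function names `delta_encode` and `delta_decode` are made up for this example, and zlib only looks at the last 32 KB of the dictionary, so this is a toy and not a substitute for a real delta library.

```python
import zlib


def delta_encode(base: bytes, target: bytes) -> bytes:
    """Compress `target`, using `base` (the previous revision) as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=base)
    return compressor.compress(target) + compressor.flush()


def delta_decode(base: bytes, delta: bytes) -> bytes:
    """Rebuild the target revision from `base` and the output of delta_encode."""
    decompressor = zlib.decompressobj(zdict=base)
    return decompressor.decompress(delta) + decompressor.flush()


if __name__ == "__main__":
    # Hypothetical revision texts, for illustration only.
    old_revision = b"".join(
        b"Line %d of the old revision text.\n" % i for i in range(200)
    )
    new_revision = old_revision + b"One sentence added in the new revision.\n"

    delta = delta_encode(old_revision, new_revision)
    assert delta_decode(old_revision, delta) == new_revision
    print(len(new_revision), "bytes in full,", len(delta), "bytes as a delta")
```

The decoder needs the exact same base revision that the encoder used, which is the same constraint any delta format puts on how revisions are stored and ordered.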
I'm certainly going to try to use one of these libraries for delta compression,
because they seem to do pretty much exactly what's needed here. Thanks for