For storing updateable indexes, Berkeley DB 4-5, GDBM, and higher-level options like SQLite are widely used. LevelDB is pretty cool too.

I think that with the amount of data we're dealing with, it makes sense to have the file format under tight control. For example, saving a single byte on each revision adds up to a total saving of ~500 MB for enwiki.
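To make that arithmetic concrete, here is a small sketch. The revision count is an assumption back-derived from the ~500 MB figure (one byte per revision implies roughly 500 million revisions), not a measured number, and the function name is purely illustrative.

```python
def total_savings_bytes(revision_count, bytes_saved_per_revision=1):
    """Total bytes saved when every stored revision shrinks by a fixed amount."""
    return revision_count * bytes_saved_per_revision


# Assumption: about 500 million revisions, inferred from the ~500 MB estimate.
ASSUMED_ENWIKI_REVISIONS = 500_000_000

savings = total_savings_bytes(ASSUMED_ENWIKI_REVISIONS)
print(f"{savings / 10**6:.0f} MB")  # prints "500 MB"
```

The same function shows why per-revision overhead matters more than per-page or per-file overhead: anything multiplied by the revision count dominates the file size.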

In any case, at this point it would be more work to switch to one of those than to keep using the format I created.

For delta coding, there's xdelta3, open-vcdiff, and Git's delta code. (rzip/rsync are wicked awesome, but not as easy to just drop in as a library.)

I'm certainly going to try using one of these libraries for delta compression, because they seem to do pretty much exactly what's needed here. Thanks for the suggestions.
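For readers unfamiliar with the idea, here is a minimal sketch of delta compression between two revisions. It is not xdelta3, open-vcdiff, or Git's delta code: as a stand-in it uses zlib's preset-dictionary feature from the Python standard library, with the previous revision as the dictionary. The function names and sample texts are hypothetical, and zlib only looks back 32 KB, so this would not be a good fit for large revisions.

```python
import zlib


def encode_delta(base: bytes, target: bytes) -> bytes:
    """Compress `target` using `base` (the previous revision) as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(target) + comp.flush()


def decode_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild the target revision from the same `base` and the stored delta."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


# Hypothetical sample revisions: the second one only appends a sentence.
rev1 = (b"Delta coding stores each revision as a difference from an earlier "
        b"one, so unchanged text is not written to the dump file again.")
rev2 = rev1 + b" Only the appended sentence has to be stored."

delta = encode_delta(rev1, rev2)
restored = decode_delta(rev1, delta)
print(len(rev2), len(delta), restored == rev2)
```

A dedicated delta library does the same job in principle (the decoder needs the base revision plus the delta) but can match against the whole base rather than a 32 KB window.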

Petr Onderka