Gregory Maxwell wrote:
Simply stripping the bad characters on the serialized output frightens me because I worry that eventually I'll strip an adjacent delimiter and make the output unparsable.
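Due to the way UTF-8 is designed, it should in fact be safe to do this. Every byte of a multi-byte sequence has its high bit set (0x80 to 0xFF), so an ASCII delimiter can never appear inside one. A stripper that removes only invalid bytes, and that checks each continuation byte rather than blindly skipping the count announced by the lead byte, will therefore never touch an adjacent delimiter. For details, see http://en.wikipedia.org/wiki/UTF-8.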
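As a rough illustration, here is a minimal Python sketch. It assumes the serialized output is available as a byte string and that the delimiters are ASCII; the helper name strip_invalid_utf8 is made up for this example and is not from your code. It leans on Python's built-in decoder with errors="ignore", which drops only the offending bytes.

```python
def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop bytes that are not part of a valid UTF-8 sequence.

    Hypothetical helper for illustration. ASCII bytes (0x00-0x7F) are
    always valid on their own, so delimiters such as ',' or '|' survive.
    """
    # errors="ignore" discards only the invalid bytes; the decoder
    # resumes at the first byte that could not continue the sequence.
    return data.decode("utf-8", errors="ignore").encode("utf-8")


# Truncated 3-byte sequence (0xE2 0x82 without its final byte)
# directly before a comma delimiter.
truncated = b"a\xe2\x82,b"

# A stray 0xFF byte, which is never valid in UTF-8, before a pipe.
stray = b"x\xff|y"

# Well-formed input must pass through unchanged.
valid = "é,ü".encode("utf-8")

print(strip_invalid_utf8(truncated))
print(strip_invalid_utf8(stray))
print(strip_invalid_utf8(valid) == valid)
```

In both damaged inputs the delimiter directly after the bad bytes is kept, because the decoder sees that an ASCII byte cannot be a continuation byte and treats it as the start of the next character.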