I'm starting to look at some machine learning projects I've wanted to do for a while (e.g., sock-puppet detection). This quickly leads to decisions about data storage formats: CSV, JSON, protobufs, etc. Left to my own devices, I'd probably use protos, but I don't want to be swimming upstream.
Are there any standards in wiki-land for how people store data? If there's some common way that "everybody does it", that's how I want to do it too. Or does every project just do its own thing?