I quite enjoy thinking about creating data formats. Here's a hopefully quick brain dump for you:
A hot data format is one that only makes sense while the application is running, because it contains references to objects/entities/data that are not actually part of the format, or references that can only be resolved through a data structure outside the format. The most obvious case is a format that contains pointers, but indexes into arrays that live outside the data block, hashed strings, etc. count too.
A cold data format is simply one that is not hot. It can be loaded into any application and stored at any address, and it is never missing information in any context.
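To make the hot/cold distinction concrete, here is a minimal sketch in Python. The names `Node`, `hot_to_cold`, and `cold_to_hot` are made up for illustration, and it assumes every referenced object is itself part of the block being converted. The hot form holds live object references. The cold form replaces them with indexes into the block itself, so nothing points outside it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hot form: `parent` is a live object reference, valid only in this process."""
    name: str
    parent: Optional["Node"] = None


def hot_to_cold(nodes):
    """Replace each object reference with an index into the block itself.

    Assumption: every parent is also in `nodes`; otherwise this raises KeyError.
    """
    index_of = {id(node): i for i, node in enumerate(nodes)}
    return [
        {"name": n.name, "parent": -1 if n.parent is None else index_of[id(n.parent)]}
        for n in nodes
    ]


def cold_to_hot(records):
    """Rebuild live references from the indexes stored in the cold block."""
    nodes = [Node(r["name"]) for r in records]
    for node, r in zip(nodes, records):
        if r["parent"] >= 0:
            node.parent = nodes[r["parent"]]
    return nodes


root = Node("root")
arm = Node("arm", parent=root)
hand = Node("hand", parent=arm)

cold = hot_to_cold([root, arm, hand])   # safe to write to disk or send elsewhere
restored = cold_to_hot(cold)            # fresh objects with references re-linked
```

The cold list contains only strings and integers, so it can be loaded anywhere. The restored nodes are new objects whose `parent` references point at each other rather than at the originals.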
Should the data be converted or dumped to file as is?
- If your data formats are "hot", you'll either have to serialize them or keep them all cold even at run time. In my experience it works better to let formats be hot at run time if that's how they naturally turn out, and to write a pair of routines that convert hot to cold and cold to hot.
Should the data be stringized?
- Big data should pretty much never be text.
- If you're likely to have bugs that corrupt the data, text formats really help to speed up debugging.
- If your tools for producing the data are still maturing, text formats make it easier to get usable sample data.
- A complex format, i.e. one with lots of references between things, takes a lot more work to stringize and destringize. I have found that text is a big win when the data is small, has only a few "types", and does not need to contain many references. I have also found that when text does go wrong, it goes wrong quickly.
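As a rough illustration of the text-versus-binary trade-off, here is a sketch that writes the same small cold data both ways using Python's standard `json` and `struct` modules. The record shape (a name plus a parent index) and the binary layout are assumptions invented for this example, not a recommended format.

```python
import json
import struct

# Cold records: (name, parent index), with -1 meaning "no parent".
records = [("root", -1), ("arm", 0), ("hand", 1)]


def to_text(recs):
    """Stringize: one JSON document that can be read and edited by hand."""
    return json.dumps([{"name": n, "parent": p} for n, p in recs], indent=2)


def from_text(text):
    """Destringize back to the same list of tuples."""
    return [(r["name"], r["parent"]) for r in json.loads(text)]


def to_binary(recs):
    """Pack as: u32 count, then per record a u16 name length, UTF-8 name, i32 parent."""
    out = [struct.pack("<I", len(recs))]
    for name, parent in recs:
        raw = name.encode("utf-8")
        out.append(struct.pack("<H", len(raw)) + raw + struct.pack("<i", parent))
    return b"".join(out)


def from_binary(blob):
    """Walk the blob with an explicit offset, mirroring the layout in to_binary."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    recs = []
    for _ in range(count):
        (length,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        name = blob[offset:offset + length].decode("utf-8")
        offset += length
        (parent,) = struct.unpack_from("<i", blob, offset)
        offset += 4
        recs.append((name, parent))
    return recs


text = to_text(records)
blob = to_binary(records)
```

Both forms round-trip to the same records. The text version is larger but can be fixed in an editor when something is corrupted, while the binary version is compact but needs a tool (or a hex dump and patience) to inspect.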
Hope this helps you think it through!