Progress update and thoughts on cross-cutting concerns

Martin Fouilleul  —  3 weeks, 1 day ago [Edited 0 minutes later]
Hello!

This month I've been continuing to rewrite the cuelist code using POD structs, contiguous arrays and custom memory allocators. (Along the way I also started adding support for selecting/moving/deleting multiple cues in the timeline.)

It's been close to a complete rewrite, because the GUI code and the playback transport logic depended heavily on the structure of the cuelist. That's another lesson learnt the hard way: OOP doesn't really help with separation of concerns. In fact, in my case it merely led me to try to hide cross-cutting concerns that I should have acknowledged and accepted as perfectly fine. In the more C-style approach I'm trying now, these pieces of code are still coupled, but the coupling is explicit and localized, instead of being split up and hidden inside different classes.
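
To give an idea of what that looks like, here is a rough sketch of the kind of layout I'm aiming for (the names are illustrative, not the actual code):

#include <stdint.h>
#include <stddef.h>

typedef struct mem_arena
{
    uint8_t* base;
    size_t used;
    size_t capacity;
} mem_arena;

void* arena_push(mem_arena* arena, size_t size)
{
    // bump-pointer allocation: there is no per-object free, the whole
    // arena is reset or released at once
    if(arena->used + size > arena->capacity)
    {
        return(0);
    }
    void* result = arena->base + arena->used;
    arena->used += size;
    return(result);
}

typedef enum cue_type { CUE_AUDIO, CUE_GROUP, CUE_MIX, CUE_CONTROL } cue_type;

typedef struct cue
{
    cue_type type;
    int32_t parent;      // index into the cue array, -1 for the root
    int32_t firstChild;  // intrusive links instead of owning pointers
    int32_t nextSibling;
    double startTime;
} cue;

typedef struct cuelist
{
    cue* cues;
    int32_t count;
} cuelist;

Cues reference each other through plain indices into one contiguous array, so there is no hidden ownership through pointers, and the whole cuelist can be reasoned about (and, as I mention below, saved) as one block of memory.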

For example, the playback transport logic was split between virtual functions on each cue type: each cue could receive and handle transport commands in what looked like an isolated and independent way. But in fact, all cue types need to know about their surroundings: a group cue obviously needs to interact with its "children" cues to synchronize them to its timeline, a mix cue needs to access the matrices of other cues, and a control cue is associated with its target. Even an audio cue must signal that it reached the end of its file to its parent group, and to the mix cues targeting its matrix.

Splitting these behaviours across different virtual functions makes it hard to understand all the possible interactions, without reducing the (inherent) coupling. And when working on the transport logic, the code you need to pay attention to is scattered all over the place, which makes it easy to lose a lot of context (not only in terms of the "mental context" needed to modify or debug, but also actual context about the data, which could be taken advantage of in the code). Now there is one "transport.cpp" file with all the functions related to the playback transport, so aside from any other benefit, the interactions between different types of cues are made very explicit and all happen in the same location, where the context is readily available.
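
For instance, starting a cue could look something like this (again a simplified sketch reusing the hypothetical structs from above, not the actual code):

void transport_start_cue(cuelist* list, int32_t cueIndex)
{
    cue* c = &list->cues[cueIndex];
    switch(c->type)
    {
        case CUE_GROUP:
            // a group starts its children against its own timeline: the
            // interaction is spelled out here, not hidden in a subclass
            for(int32_t child = c->firstChild; child >= 0; child = list->cues[child].nextSibling)
            {
                transport_start_cue(list, child);
            }
            break;

        case CUE_AUDIO:
            // start playback; when the stream reaches end of file, the
            // notification to the parent group is handled in this same file
            break;

        case CUE_MIX:
        case CUE_CONTROL:
            // mix and control cues read other cues' state directly: the
            // coupling is explicit and local
            break;
    }
}

All the cross-cue behaviour sits in one switch in one file, instead of being spread across virtual functions.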

One last piece of code that has been broken by the changes, and that I didn't have time to rewrite, is the save/restore feature. I might take the opportunity to shift from a text-based representation to something closer to the actual runtime data (which the new memory layout makes tempting; to some extent I could just copy the entire memory arena to the file...). Maybe that's a terrible idea and I should stick to something more abstract, or even to some XML-like textual representation? Feel free to share your experience with save file formats!

Thanks for reading!

Martin

#14761 Allen Webster  —  3 weeks, 1 day ago [Edited 8 minutes later]
I quite enjoy thinking about creating data formats. Here's a hopefully quick brain dump for you:

A hot data format is one that only makes sense with the application running, because it contains references to objects/entities/data that are not actually part of the format, or references that can only be looked up using a data structure that is not internal to the format. The most obvious version of this is a format that contains pointers, but it can also mean indexes into arrays that are external to the data block, hashed strings, etc. A cold data format is basically a "not hot" format: it can be loaded into any application and stored at any address, and it is never missing any information in any context.
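
To make that concrete, here is a tiny illustration with made-up structs:

#include <stdint.h>

// hot: only meaningful inside a running process, because it holds a raw
// pointer into application memory
typedef struct hot_cue
{
    struct hot_cue* parent;  // only valid for this process, at this moment
    double startTime;
} hot_cue;

// cold: self-contained and position-independent, because the reference is
// an index into the same serialized block
typedef struct cold_cue
{
    int32_t parent;          // index into the serialized cue array, -1 for none
    double startTime;
} cold_cue;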

Should the data be converted or dumped to file as is?
  • If your data formats are "hot" you'll have to do some serialization, or you'll have to make them all cold even at runtime... I have found from experience that I prefer to let data formats be hot at run time if that's how they naturally turn out, and just write a routine that converts hot to cold and cold to hot (see the sketch after this list).
Should the data be stringized?
  • Big data should pretty much never be text.
  • If you're likely to have bugs that corrupt the data, text formats really help to speed up debugging.
  • If your tools for producing the data are still maturing, text formats make it easier to get usable sample data.
  • A complex format, i.e. one with lots of references between things, takes a lot more work to stringize and destringize. I have found that text is a big win when the data is small, has only a few "types", and does not have to contain lots of references. I have also found that when text does go wrong, it goes wrong quickly.
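
The hot <-> cold routines I mentioned can stay pretty small. With the made-up structs from above, and assuming all the cues live in one contiguous array so that pointers can become indices:

#include <stddef.h>

void cue_freeze(hot_cue* hot, hot_cue* arrayBase, cold_cue* cold)
{
    // replace the pointer by an index relative to the array base
    cold->parent = hot->parent ? (int32_t)(hot->parent - arrayBase) : -1;
    cold->startTime = hot->startTime;
}

void cue_thaw(cold_cue* cold, hot_cue* arrayBase, hot_cue* hot)
{
    // turn the index back into a pointer valid for this process
    hot->parent = (cold->parent >= 0) ? (arrayBase + cold->parent) : NULL;
    hot->startTime = cold->startTime;
}

Compare that to a stringizer and parser for the same data and you can see where the implementation work goes.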

Hope this helps you think it through!
#14762 Martin Fouilleul  —  3 weeks ago
Hi Allen!

Thanks for your feedback! Your thoughts on hot/cold data formats make a lot of sense. My first attempt was to go with a stringized format, to be able to quickly read and manually edit the save files for debugging purposes. The run-time data itself is actually hot (and it was a lot more so when I was using classes and new/delete all over the place).

Now it is organized in memory arenas, and the memory layout is directly managed by the application. There are still a lot of pointers, but they could be converted to indices inside the memory block. So it is tempting to copy the entire arena and "patch" the pointers (converting them to/from indices).
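
Something like this is what I have in mind; a very rough sketch, and it assumes every pointer that needs patching points inside the arena itself:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

typedef struct mem_arena
{
    uint8_t* base;
    size_t used;
    size_t capacity;
} mem_arena;

// offset 0 stands for a null pointer here, which assumes no patched object
// starts at the very base of the arena
uint64_t ptr_to_offset(mem_arena* arena, void* ptr)
{
    return(ptr ? (uint64_t)((uint8_t*)ptr - arena->base) : 0);
}

void* offset_to_ptr(mem_arena* arena, uint64_t offset)
{
    return(offset ? arena->base + offset : NULL);
}

int arena_save(mem_arena* arena, const char* path)
{
    // in a real version, every pointer field inside the arena would be
    // patched to an offset before this write (or written from a patched
    // scratch copy, to leave the live data intact)
    FILE* f = fopen(path, "wb");
    if(!f)
    {
        return(-1);
    }
    uint64_t size = arena->used;
    fwrite(&size, sizeof(size), 1, f);
    fwrite(arena->base, 1, arena->used, f);
    fclose(f);
    return(0);
}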

On the other hand, as the primary structure to describe in my use case is a tree, the references all go "one way", from parent to children. That kind of reference can be made implicit by the layout of the data (e.g. for a stringized format, storing children surrounded by brackets just after their parent's data). I find it easy to serialize (just do a tree traversal) and it's also intuitive to read in textual form, although a bit more involved to check/parse. But as you point out, a stringized version might seem like a win now only because I've tested relatively small/simple sample data, and it could become more cumbersome as I add more features and cue types.
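
The traversal itself is only a few lines. A toy version, with a made-up node struct:

#include <stdio.h>

typedef struct node
{
    const char* name;
    struct node* firstChild;
    struct node* nextSibling;
} node;

void node_stringize(node* n, FILE* out, int depth)
{
    // pre-order traversal: children are printed inside brackets right after
    // their parent, so the parent->child references stay implicit
    fprintf(out, "%*s%s\n", 2*depth, "", n->name);
    if(n->firstChild)
    {
        fprintf(out, "%*s{\n", 2*depth, "");
        for(node* child = n->firstChild; child; child = child->nextSibling)
        {
            node_stringize(child, out, depth+1);
        }
        fprintf(out, "%*s}\n", 2*depth, "");
    }
}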

When you say that text goes wrong quickly, are you referring to data "corruption", or to the format itself becoming hard to maintain/reason about? And do you think it's always a bad thing (one could argue that it lets you spot bugs/bad decisions early)?
#14770 Allen Webster  —  2 weeks, 6 days ago
By "goes wrong quickly" I mean that the difficulty of maintaining the text format does not just scale linearly with the complexity of the data. If your format has about six or fewer "types" of "things" in it right now, and the references form just a directed acyclic graph, or even better a tree, then the stringizer and parser won't be too bad. The "goes wrong quickly" happens when you want a new "type" that really doesn't fit into the way the stringizer and parser were already set up.

In general, conversions between "hot" <-> "text" are way more work to implement than conversions between "hot" <-> "cold". The win is all in the manual authoring and debugging. If you're unsure about the implications of doing "hot" <-> "text", perhaps you could explore how you can reclaim the manual authoring and debugging in other ways. For instance, if you are storing a "cold" file format, you can just make a utility separate from your application that stringizes those files "cold" -> "text". This has the benefit that you can afford to be less performance-focused, so you can instead optimize the code for easy maintenance. Parsing ("text" -> X) is usually the most implementation work, so if manual authoring isn't important to you, you could just cut that and save a lot of work. If you do want manual authoring, do you have other ways to achieve it? Even if you end up writing a parser, I am finding that a "text" -> "cold" parser that does not have to be used at run time is much easier to maintain than trying to do "text" -> "hot" right in the application.
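
To sketch the kind of separate utility I mean (entirely made up; the interesting part would of course be walking your actual cold format):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char** argv)
{
    if(argc < 2)
    {
        fprintf(stderr, "usage: %s <savefile>\n", argv[0]);
        return(1);
    }
    FILE* f = fopen(argv[1], "rb");
    if(!f)
    {
        fprintf(stderr, "can't open %s\n", argv[1]);
        return(1);
    }
    uint64_t size = 0;
    if(fread(&size, sizeof(size), 1, f) != 1)
    {
        fclose(f);
        return(1);
    }
    uint8_t* block = (uint8_t*)malloc(size);
    if(!block || fread(block, 1, size, f) != size)
    {
        fclose(f);
        return(1);
    }
    fclose(f);

    // walk the cold records and print them as text; this part depends
    // entirely on what the application's cold format actually contains
    printf("loaded %llu bytes of cold data\n", (unsigned long long)size);

    free(block);
    return(0);
}

Since it never runs inside the application, you can keep it dead simple.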

I was hoping to help you think this through, but at this point I think I am just listing every imaginable permutation of a data pipeline, so I'll cut myself off there :D