Description
Describe the feature
This issue discusses the delta encoding feature mentioned here. The idea is to encode only the fields that change over time in an OSI trace file. This would reduce the file size and improve the performance of any simulation tool that uses delta encoding.
This issue illustrates the basic concept and aims to start a discussion/documentation/formalization for this important feature.
Describe the solution you would like
An OSI trace file represents a temporal behavior that maps points in time to fully populated OSI messages. Below I go through examples of how to represent such behaviors, using JSON syntax for simplicity.
- Uncompressed (golden) representation
Below is a discrete-time behavior: the time field implicitly increases by 1 time unit per message, and each message is fully populated. This would be the golden representation of a temporal behavior, but it is also very inefficient when the time unit is small (e.g. nanoseconds, as currently set).
{"A": "a1", "B": "b1", "C": "c1"} //time: 0
{"A": "a2", "B": "b1", "C": "c1"} //time: 1
{"A": "a2", "B": "b1", "C": "c1"} //time: 2
{"A": "a2", "B": "b2", "C": "c1"} //time: 3
{"A": "a2", "B": "b2", "C": "c1"} //time: 4
{"A": "a2", "B": "b2", "C": "c1"} //time: 5
{"A": "a2", "B": "b2", "C": "c1"} //time: 6
{"A": "a2", "B": "b2", "C": "c1"} //time: 7
{"A": "a1", "B": "b2", "C": "c2"} //time: 8
{"A": "a1", "B": "b2", "C": "c2"} //time: 9
- Compressing in time
Often we want a small time unit when denoting time, since complex systems and their components operate on many different timescales. A small time unit creates many repetitive observations, which can be handled by compressing the behavior in time, that is, recording only the time points at which something changed, as follows:
{"time":0, "A": "a1", "B": "b1", "C": "c1"} //time: 0
{"time":1, "A": "a2", "B": "b1", "C": "c1"} //time: 1
{"time":3, "A": "a2", "B": "b2", "C": "c1"} //time: 3
{"time":4, "A": "a2", "B": "b2", "C": "c1"} //time: 4
{"time":8, "A": "a1", "B": "b2", "C": "c2"} //time: 8
{"time":9, "A": "a1", "B": "b2", "C": "c2"} //time: 9
This is what I would call a dense-time behavior: repeated messages are skipped, and we add a time field (stamp) so as not to lose track of where we are in time. Hence, we can jump an arbitrary amount of time with each new message. In practice, we simply start at this level, and this is how it is currently done in OSI and elsewhere.
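The time-compression step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: messages are plain dicts, message i is observed at time i, and the function name `compress_in_time` is hypothetical (it is not part of OSI). The final message is always kept so that the end of the trace is not lost.

```python
def compress_in_time(messages):
    """Keep only messages that differ from their predecessor, adding a time stamp.

    Assumes message i is observed at time i (one time unit per message).
    The last message is always kept to preserve the duration of the trace.
    """
    stamped = []
    previous = None
    last = len(messages) - 1
    for time, message in enumerate(messages):
        if message != previous or time == last:
            stamped.append({"time": time, **message})
        previous = message
    return stamped


golden = (
    [{"A": "a1", "B": "b1", "C": "c1"}]        # time 0
    + [{"A": "a2", "B": "b1", "C": "c1"}] * 2  # times 1-2
    + [{"A": "a2", "B": "b2", "C": "c1"}] * 5  # times 3-7
    + [{"A": "a1", "B": "b2", "C": "c2"}] * 2  # times 8-9
)

dense = compress_in_time(golden)
print([record["time"] for record in dense])  # [0, 1, 3, 8, 9]
```

This strict filter yields the time points 0, 1, 3, 8 and 9. The listing above additionally keeps the unchanged message at time 4, which is redundant but harmless.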
Up to now, this was just an introduction, but I also want to show that this is only a half-way practice, as we can compress further using delta encoding.
- Compressing in time and value (desired feature)
Finally, I show the desired delta-encoded representation, compressed both in time and in value.
{"time":0, "A": "a1", "B": "b1", "C": "c1"}
{"time":1, "A": "a2"}
{"time":3, "B": "b2"}
{"time":8, "A": "a1", "C": "c2"}
{"time":9}
It is important that the meaning of the temporal behavior does not change from representation (1) to representation (3), while the size would be significantly smaller in practice. The result is easier to read/write/copy/transmit.
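As a minimal sketch of how representation (3) could be derived from representation (1), assuming messages are flat Python dicts, message i is observed at time i, and no message field is itself named "time" (the function name `delta_encode` is hypothetical, not an OSI API):

```python
def delta_encode(messages):
    """Encode fully populated messages as records holding only changed fields.

    Assumes message i is observed at time i. A record is emitted whenever at
    least one field changed, and for the final message so that the end of the
    trace is not lost.
    """
    encoded = []
    state = {}  # last known value of every field
    last = len(messages) - 1
    for time, message in enumerate(messages):
        changes = {
            key: value
            for key, value in message.items()
            if key not in state or state[key] != value
        }
        if changes or time == last:
            encoded.append({"time": time, **changes})
        state.update(changes)
    return encoded


golden = (
    [{"A": "a1", "B": "b1", "C": "c1"}]        # time 0
    + [{"A": "a2", "B": "b1", "C": "c1"}] * 2  # times 1-2
    + [{"A": "a2", "B": "b2", "C": "c1"}] * 5  # times 3-7
    + [{"A": "a1", "B": "b2", "C": "c2"}] * 2  # times 8-9
)

for record in delta_encode(golden):
    print(record)
```

For the golden example this produces the same five records as the listing above, including the empty record at time 9.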
Describe alternatives you have considered
In the example above, we apply the concept of forward persistence: once a field is set at some point in time, it is interpreted as holding its value until it is set again. Backward persistence, in which a field value is interpreted as having held since the previous time point, is also possible, but the former is chosen because it matches the standard behavior of variable assignment in programming languages.
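Forward persistence can be illustrated with a small decoder that expands delta-encoded records back into one fully populated message per time unit. This is a sketch under assumptions of my own: records are dicts sorted by an integer "time" field, the last record marks the end of the trace, and `delta_decode` is a hypothetical name.

```python
def delta_decode(records):
    """Expand delta-encoded records into one full message per time unit.

    Forward persistence: a field keeps its last set value until set again.
    Assumes records are sorted by an integer "time" field and that the last
    record marks the final time point of the trace.
    """
    messages = []
    state = {}
    for index, record in enumerate(records):
        # Apply only the fields present in this record; the rest persist.
        state.update({k: v for k, v in record.items() if k != "time"})
        # The state holds until the next record, or for one unit at the end.
        if index + 1 < len(records):
            end = records[index + 1]["time"]
        else:
            end = record["time"] + 1
        for _ in range(record["time"], end):
            messages.append(dict(state))
    return messages


records = [
    {"time": 0, "A": "a1", "B": "b1", "C": "c1"},
    {"time": 1, "A": "a2"},
    {"time": 3, "B": "b2"},
    {"time": 8, "A": "a1", "C": "c2"},
    {"time": 9},
]

full = delta_decode(records)
print(len(full))  # 10
print(full[2])    # {'A': 'a2', 'B': 'b1', 'C': 'c1'}
```

Under backward persistence the update would instead be applied after filling the gap, so each record would describe the interval that ends at its time stamp rather than the one that starts there.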
Describe the backwards compatibility
This feature alters the meaning of missing fields in OSI messages ("missing -> same as before", previously "missing -> default value or null") for all field types. Please see the topics of Field Presence for protobufs and Nullable Scalar for flatbufs. While proto2 supports explicit field presence for scalars, we need to handle it explicitly for proto3 and flatbuffers. This may create a backward compatibility issue.
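To make the presence problem concrete, here is a small simulation in plain Python. It does not use the protobuf or flatbuffers libraries. It only mimics a serializer without explicit presence, where a scalar equal to its default value is not stored. The field names `speed` and `gear`, their defaults, and both helper functions are hypothetical.

```python
# Hypothetical scalar fields and their default values.
DEFAULTS = {"speed": 0, "gear": 0}


def strip_defaults(message):
    """Mimic implicit presence: values equal to the default are not stored."""
    return {k: v for k, v in message.items() if v != DEFAULTS.get(k)}


def apply_delta(state, delta):
    """Forward persistence: fields missing from the delta keep their value."""
    return {**state, **delta}


state = {"speed": 5, "gear": 2}
delta = {"speed": 0}                 # the field changes to its default value

on_wire = strip_defaults(delta)      # {} : the change is silently dropped
wrong = apply_delta(state, on_wire)  # speed is still 5
right = apply_delta(state, delta)    # speed is 0, as intended

print(on_wire, wrong, right)
```

Without explicit presence, "changed to the default value" and "unchanged" look identical to the reader, which is why the delta semantics need presence information for every field.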
Additional context
- Delta encoding as described above does not need a different file format, provided that explicit field presence is applied. Beyond the corner cases above, any amount of compression is allowed: repeating an unchanged field in the next message is redundant, but it is fine.
- In this issue, I skip the question of interpolation between messages in time. By default, piecewise-constant interpolation is a good choice for state-like fields (hence my choice in the examples). We have more options for event-like and numerical fields, to be discussed later. In the context of the OSI standard, we may consider a classification of message fields (state, event, numeric, array, etc.).
- In signal processing, delta encoding means transmitting the numerical difference between successive samples in time. This is related, but here we consider only whether the value of the same field has changed (value inequality).
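The piecewise-constant interpolation mentioned above can be sketched as a lookup of the full state at an arbitrary time. I assume that records are dicts sorted by a numeric "time" field and that every field is state-like. The helper name `state_at` is hypothetical.

```python
from bisect import bisect_right


def state_at(records, time):
    """Return the full state at `time` under piecewise-constant interpolation.

    Every record with a time stamp <= `time` is applied in order, so each
    field holds the value it was last set to. Returns an empty dict for
    times before the first record.
    """
    times = [record["time"] for record in records]
    upto = bisect_right(times, time)  # number of records with time <= query
    state = {}
    for record in records[:upto]:
        state.update({k: v for k, v in record.items() if k != "time"})
    return state


records = [
    {"time": 0, "A": "a1", "B": "b1", "C": "c1"},
    {"time": 1, "A": "a2"},
    {"time": 3, "B": "b2"},
    {"time": 8, "A": "a1", "C": "c2"},
    {"time": 9},
]

print(state_at(records, 2.5))  # {'A': 'a2', 'B': 'b1', 'C': 'c1'}
print(state_at(records, 5))    # {'A': 'a2', 'B': 'b2', 'C': 'c1'}
```

Event-like or numerical fields would need a different rule (for example, no persistence at all, or linear interpolation), which is outside the scope of this sketch.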