Delta-encoding and time models #439

Open
@doganulus

Describe the feature

This issue discusses the delta-encoding feature mentioned here. The idea is to encode only the fields that change over time in an OSI trace file. This would reduce the file size and improve the performance of any simulation tool that uses delta encoding.

This issue illustrates the basic concept and aims to start a discussion/documentation/formalization for this important feature.

Describe the solution you would like

An OSI trace file represents a temporal behavior that maps points in time to fully populated OSI messages. Below I walk through examples of how to represent such a behavior, using JSON syntax for simplicity.

  1. Uncompressed (golden) representation

Below is a discrete-time behavior: the time field implicitly increases by 1 time unit per message, and each message is fully populated. This would be the golden representation of a temporal behavior, but it is very inefficient when the time unit is chosen small (e.g. nanoseconds, as currently set).

{"A": "a1", "B": "b1", "C": "c1"}   //time: 0
{"A": "a2", "B": "b1", "C": "c1"}   //time: 1
{"A": "a2", "B": "b1", "C": "c1"}   //time: 2
{"A": "a2", "B": "b2", "C": "c1"}   //time: 3
{"A": "a2", "B": "b2", "C": "c1"}   //time: 4
{"A": "a2", "B": "b2", "C": "c1"}   //time: 5
{"A": "a2", "B": "b2", "C": "c1"}   //time: 6
{"A": "a2", "B": "b2", "C": "c1"}   //time: 7
{"A": "a1", "B": "b2", "C": "c2"}   //time: 8
{"A": "a1", "B": "b2", "C": "c2"}   //time: 9
  2. Compressing in time

Often we want to use a small time unit because complex systems and their components operate on many different timescales. Choosing a small time unit creates many repetitive observations, which can be handled by compressing the behavior in time, recording only the time points at which something changed, as follows:

{"time":0, "A": "a1", "B": "b1", "C": "c1"}   //time: 0
{"time":1, "A": "a2", "B": "b1", "C": "c1"}   //time: 1
{"time":3, "A": "a2", "B": "b2", "C": "c1"}   //time: 3
{"time":4, "A": "a2", "B": "b2", "C": "c1"}   //time: 4
{"time":8, "A": "a1", "B": "b2", "C": "c2"}   //time: 8
{"time":9, "A": "a1", "B": "b2", "C": "c2"}   //time: 9

This is what I would call a dense-time behavior: repeated messages are skipped, and we add a time field (stamp) so that we do not lose track of where we are in time. Hence, we can jump an arbitrary amount of time at each new message. In practice, we simply start at this level, and this is how it is currently done in OSI and elsewhere.
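The step from the golden representation to the dense-time one can be sketched in a few lines of Python. This is only an illustration under assumptions: plain dictionaries stand in for OSI messages, the function name `compress_in_time` is hypothetical, and the golden trace is assumed to start at time 0 with one message per time unit.

```python
def compress_in_time(golden):
    """Keep only the messages that differ from their predecessor.

    `golden` is a list of fully populated messages, one per time unit,
    starting at time 0 (an assumption of this sketch).
    """
    dense = []
    previous = None
    for t, message in enumerate(golden):
        if message != previous:
            # Stamp the message so the skipped time units can be recovered.
            dense.append({"time": t, **message})
            previous = message
    return dense


golden = [
    {"A": "a1", "B": "b1", "C": "c1"},  # time: 0
    {"A": "a2", "B": "b1", "C": "c1"},  # time: 1
    {"A": "a2", "B": "b1", "C": "c1"},  # time: 2
    {"A": "a2", "B": "b2", "C": "c1"},  # time: 3
    {"A": "a2", "B": "b2", "C": "c1"},  # time: 4
    {"A": "a2", "B": "b2", "C": "c1"},  # time: 5
    {"A": "a2", "B": "b2", "C": "c1"},  # time: 6
    {"A": "a2", "B": "b2", "C": "c1"},  # time: 7
    {"A": "a1", "B": "b2", "C": "c2"},  # time: 8
    {"A": "a1", "B": "b2", "C": "c2"},  # time: 9
]

dense = compress_in_time(golden)
print([m["time"] for m in dense])  # [0, 1, 3, 8]
```

This strict variant drops every repeated message, so it keeps times 0, 1, 3 and 8. A trace that additionally keeps some unchanged messages, such as the ones at times 4 and 9 in the example above, describes the same behavior.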

Up to now this was just an introduction, but I also want to show that this is a halfway practice, as we can compress further using delta encoding.

  3. Compress in time and value (desired feature)

Finally, I show the desired delta-encoded representation, compressed in both time and value.

{"time":0, "A": "a1", "B": "b1", "C": "c1"}
{"time":1, "A": "a2"}                         
{"time":3,            "B": "b2"}                                              
{"time":8, "A": "a1",            "C": "c2"}    
{"time":9} 

It is important that the meaning of the temporal behavior does not change from (1) to (3), while the size would be significantly smaller in practice. A smaller trace is easier to read, write, copy, and transmit.
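A possible encoder from representation (2) to representation (3) is sketched below in Python. It is not an OSI API: dictionaries stand in for messages, `delta_encode` is a hypothetical name, and keeping the final message even when it is empty (to mark the end of the trace) is an assumption read off the `{"time":9}` line of the example.

```python
def delta_encode(dense):
    """Emit only the fields whose value differs from the last emitted value."""
    encoded = []
    state = {}  # last known value of every field
    last_index = len(dense) - 1
    for index, message in enumerate(dense):
        delta = {"time": message["time"]}
        for key, value in message.items():
            if key == "time":
                continue
            if key not in state or state[key] != value:
                delta[key] = value
                state[key] = value
        # Drop messages that carry no change, except the last one, which is
        # kept as an end-of-trace marker (assumption of this sketch).
        if len(delta) > 1 or index == last_index:
            encoded.append(delta)
    return encoded


dense = [
    {"time": 0, "A": "a1", "B": "b1", "C": "c1"},
    {"time": 1, "A": "a2", "B": "b1", "C": "c1"},
    {"time": 3, "A": "a2", "B": "b2", "C": "c1"},
    {"time": 4, "A": "a2", "B": "b2", "C": "c1"},
    {"time": 8, "A": "a1", "B": "b2", "C": "c2"},
    {"time": 9, "A": "a1", "B": "b2", "C": "c2"},
]

for message in delta_encode(dense):
    print(message)
```

Run on the dense-time trace from (2), this prints the five delta-encoded messages shown in (3), with time stamps 0, 1, 3, 8 and 9.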

Describe alternatives you have considered

In the example above, we apply the concept of forward persistence: once a field is set at some point in time, it is interpreted as holding its value until it is set again. Backward persistence, in which a field value is interpreted as having held since the previous time point, is also possible, but the former is chosen because it is the standard behavior of variable assignment in programming languages.
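Forward persistence can be made concrete with a small decoder that expands a delta-encoded trace back into one fully populated message per time unit. The sketch below assumes integer time stamps in increasing order starting at 0, uses dictionaries in place of OSI messages, and `decode_forward` is a hypothetical name.

```python
def decode_forward(encoded):
    """Rebuild the golden representation using forward persistence."""
    if not encoded:
        return []
    golden = []
    state = {}      # current value of every field
    position = 0    # index of the next delta message to apply
    end = encoded[-1]["time"]
    for t in range(end + 1):
        # Apply every delta stamped at or before t; a field keeps its value
        # until a later message sets it again.
        while position < len(encoded) and encoded[position]["time"] <= t:
            delta = encoded[position]
            state.update({k: v for k, v in delta.items() if k != "time"})
            position += 1
        golden.append(dict(state))
    return golden


encoded = [
    {"time": 0, "A": "a1", "B": "b1", "C": "c1"},
    {"time": 1, "A": "a2"},
    {"time": 3, "B": "b2"},
    {"time": 8, "A": "a1", "C": "c2"},
    {"time": 9},
]

golden = decode_forward(encoded)
print(len(golden))  # 10
print(golden[5])    # {'A': 'a2', 'B': 'b2', 'C': 'c1'}
```

The ten decoded messages are the ones listed in representation (1), which is the sense in which the meaning of the behavior is preserved.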

Describe the backwards compatibility

This feature alters the meaning of missing fields in OSI messages for all field types ("missing -> same as before"; previously "missing -> default value or null"). Please see the topics of Field Presence for Protocol Buffers and Nullable Scalars for FlatBuffers. While proto2 supports explicit field presence for scalars, we need to handle it explicitly for proto3 and FlatBuffers. This may create a backward compatibility issue.
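Why explicit field presence matters can be illustrated without any serialization library. In the Python sketch below, dictionaries stand in for messages and a key being present in the dictionary models explicit presence; the `DEFAULTS` table and both function names are hypothetical and do not correspond to any protobuf or FlatBuffers API.

```python
# Hypothetical default values, standing in for the defaults of a schema.
DEFAULTS = {"A": "", "B": "", "C": ""}


def read_with_defaults(delta):
    """Old meaning: a missing field reads as its default value."""
    return {key: delta.get(key, default) for key, default in DEFAULTS.items()}


def apply_with_presence(state, delta):
    """New meaning: a missing field keeps the value it had before."""
    changes = {k: v for k, v in delta.items() if k != "time"}
    return {**state, **changes}


state = {"A": "a2", "B": "b1", "C": "c1"}
delta = {"time": 3, "B": "b2"}

print(read_with_defaults(delta))          # {'A': '', 'B': 'b2', 'C': ''}
print(apply_with_presence(state, delta))  # {'A': 'a2', 'B': 'b2', 'C': 'c1'}

# With explicit presence, setting a field to its default value is still
# distinguishable from leaving it out.
reset = {"time": 4, "A": ""}
print(apply_with_presence(state, reset))  # {'A': '', 'B': 'b1', 'C': 'c1'}
```

A reader that applies the old interpretation to a delta-encoded trace silently replaces A and C with defaults, which is the compatibility risk described above.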

Additional context

  • Delta encoding as described above does not need a different file format, provided explicit field presence is applied. Beyond the corner cases above, an arbitrary amount of compression is allowed. Repeating an unchanged field in the next message is redundant but fine.
  • In this issue, I skip the question of interpolation between messages in time. By default, piecewise-constant interpolation is a good choice for state-like fields (hence my choice in the examples). There are more options for event-like and numerical fields, to be discussed later. In the context of the OSI standard, we may consider a classification of message fields (state, event, numeric, array, etc.).
  • In signal processing, delta encoding means transmitting the numerical difference between successive samples in time. This is related, but here we consider only value inequality of the same field.
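To make the distinction in the last point concrete, the following Python sketch contrasts the two notions on a made-up integer signal. The sample values and function names are illustrative assumptions only.

```python
from itertools import accumulate


def difference_encode(samples):
    """Signal-processing delta encoding: first sample, then differences."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def difference_decode(deltas):
    """Invert difference_encode with a running sum."""
    return list(accumulate(deltas))


def change_encode(samples):
    """Delta encoding in the sense of this issue: keep (index, value) pairs
    only where the value differs from the previous sample."""
    changes = []
    for index, value in enumerate(samples):
        if index == 0 or value != samples[index - 1]:
            changes.append((index, value))
    return changes


samples = [10, 12, 12, 15, 11]
print(difference_encode(samples))  # [10, 2, 0, 3, -4]
print(change_encode(samples))      # [(0, 10), (1, 12), (3, 15), (4, 11)]
```

The difference-based variant needs arithmetic on the field and still emits one entry per sample, while the inequality-based variant works for any comparable field type and omits unchanged samples.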

    Labels

    Analysis: An issue or MR that needs expert analysis to determine what to do next
    Concept: An issue that is being detailed out through expert discussion and offline work