
Update OSITrace to be used in osi-validation #762


Closed · wants to merge 17 commits
23 changes: 21 additions & 2 deletions osi3trace/osi_trace.py
@@ -32,6 +32,7 @@ def __init__(self, path=None, type_name="SensorView", cache_messages=False):
self.read_complete = False
self.message_cache = {} if cache_messages else None
self._header_length = 4
self.timestep_count = 0
Contributor:

This is redundant with the length of message_offsets, and only valid once a file has been fully read in, so it is not an appropriate API to expose to the outside.
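
A minimal sketch of the alternative implied here (not part of the diff; assumed usage, with a hypothetical trace path, and the count is only meaningful once all offsets have been read):

   trace = OSITrace("example_trace.osi", "SensorView")  # assumed path and type
   offsets = trace.retrieve_offsets()   # reads through the whole file once
   timestep_count = len(offsets)        # same information, no extra attribute needed
   trace.close()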

if path:
self.from_file(path, type_name, cache_messages)

@@ -49,13 +50,16 @@ def from_file(self, path, type_name="SensorView", cache_messages=False):
self.message_offsets = [0]
self.message_cache = {} if cache_messages else None

self.timestep_count = len(self.retrieve_offsets())
Contributor:

This turns a simple open operation into a very expensive one, as it must read through the whole file to extract all the offsets. Sometimes this is needed, but in most instances you either want to read and process the file sequentially, step by step, or you need all messages anyway, in which case you do not want to retrieve the offsets separately but rather read the messages at the same time. So this is a big no-no.
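
As a hedged illustration of the sequential pattern described above (not part of the diff; assumes the existing get_messages() generator and a hypothetical process() handler):

   trace = OSITrace("example_trace.osi", "SensorView")  # plain open, no offset scan
   for message in trace.get_messages():  # read step by step, one message at a time
       process(message)                  # hypothetical per-message handler
   trace.close()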

self.type = self.map_message_type(type_name)
Contributor:

This is a duplication of line 40.


def retrieve_offsets(self, limit=None):
"""Retrieve the offsets of the messages from the file."""
if not self.read_complete:
self.current_index = len(self.message_offsets) - 1
self.file.seek(self.message_offsets[-1], 0)
while (
not self.read_complete and not limit or len(self.message_offsets) <= limit
while not self.read_complete and (
not limit or len(self.message_offsets) <= limit
Comment on lines -57 to +62
Contributor:

The bugfix is important.

It just makes me regret that we added the black reformatting checks, since the fix is convoluted by the arbitrary re-indentation they require. We probably should increase the allowed line length to avoid stupidity like this.

):
self.retrieve_message(skip=True)
return self.message_offsets
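
To make the precedence fix discussed in the comment above concrete, a small standalone sketch (not part of the diff; illustrative values only). Since `and` binds tighter than `or`, the original condition ignores read_complete whenever a limit is set:

   read_complete, limit, count = True, 10, 5

   # Old condition: parsed as (not read_complete and not limit) or (count <= limit),
   # so the loop keeps running past the end of the file while count <= limit.
   old = not read_complete and not limit or count <= limit    # True
   # Fixed condition: read_complete always terminates the loop.
   new = not read_complete and (not limit or count <= limit)  # False
   print(old, new)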
@@ -156,6 +160,21 @@ def get_messages_in_index_range(self, begin, end):
yield message
current += 1

def cache_messages_in_index_range(self, begin, end):
Contributor:

This whole method is a Bad Idea(tm): a cache is something that stores things that have already been accessed.
If you just want to read everything into memory, this is not a cache, and it should not reside inside OSI Trace, but rather inside the recipient that wants to work with the data, i.e. osi-validation. I actually hesitated to re-implement the caching behavior because of the high potential for misuse, as shown here.

In any case, the iterator returned from get_messages_in_index_range will already place all read messages into the cache, so the implementation is also needlessly redundant.

If one just wants to have all messages available in memory, then this really is not caching, it is just buffered reading and should be handled as such; i.e. if I want all messages anyway, then the simple one-liner suffices:

   trace = OSITrace(path, type)  # Note: No caching in OSITrace
   messages = [*trace.get_messages()]
   trace.close()

And then access the messages by index via messages[i]. This is both the most efficient and the most straightforward approach. If one wants to read in batches, something very similar can be done:

   from itertools import islice
   from functools import partial

   trace = OSITrace(path, type)  # Note: No caching in OSITrace
   batch_size = 100  # assumed batch size for this sketch
   for batch in iter(partial(lambda n, i: list(islice(i, n)), batch_size, iter(trace.get_messages())), []):
       pass  # Process batch here
   trace.close()

In Python 3.12 and later, itertools.batched could be used directly.
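
For reference, a minimal sketch of that Python 3.12+ variant, under the same assumptions as the batch example above (path, type, and batch_size are placeholders):

   from itertools import batched  # Python 3.12+

   trace = OSITrace(path, type)  # Note: No caching in OSITrace
   for batch in batched(trace.get_messages(), batch_size):
       pass  # Process batch (a tuple of up to batch_size messages) here
   trace.close()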

In summary, I would say the osi-validation code that handles messages is currently a mess and should be rewritten to do the batch processing it needs more clearly, without multiple reads/re-reads of the file and without deep interaction with the internal mechanisms of the reader.

"""
Put all messages from index begin to index end in the cache. Then the
method ``get_message_by_index`` can access to it in a faster way.

Using this method again clear the last cache and replace it with a new
one.
"""
self.message_cache = {
index + begin: message
for index, message in enumerate(
self.get_messages_in_index_range(begin, end)
)
}

def close(self):
if self.file:
self.file.close()
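
For context, a hypothetical usage sketch of the method added in this PR (not part of the diff; assumed trace path and index range, and note the review comment above recommends reading the messages directly instead):

   trace = OSITrace("example_trace.osi", "SensorView")  # assumed path and type
   trace.cache_messages_in_index_range(0, 100)  # pre-load this index range into the cache
   message = trace.get_message_by_index(42)     # now served from the cache
   trace.close()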