Feature Description
While inspecting the SDDS code, I stumbled upon several possible bugs.
a) Entries with duplicate names overwrite each other, since they are stored by name in a dict. This should at least raise an error somewhere (e.g. assert len(names) == len(set(names))).
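A minimal guard along those lines could look like the following sketch (the function name is illustrative, not part of the sdds API):

```python
def validate_unique_names(names):
    """Raise if any definition name appears more than once."""
    duplicates = {name for name in names if names.count(name) > 1}
    if duplicates:
        raise ValueError(f"Duplicate definition names: {sorted(duplicates)}")

validate_unique_names(["TUNEX", "TUNEY"])  # passes silently
# validate_unique_names(["TUNEX", "TUNEX"])  # raises ValueError
```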
b) Upon reading, the definitions in the header are sorted (parameter, array, column) before the data is read, but the data itself is not reordered. So the data must already be stored in that order: if the sorting actually changed anything, the file could no longer be read correctly.
The same applies to writing: there the data is always written in that order, but the header keeps whatever order was passed to the SDDSFile init. Here sorting the header would actually make sense, but it is not done!
We never ran into problems, since we read LHC data, which is already sorted, and mostly write via the turn_by_turn package, which sorts the entries in the same way.
If it did not, the order in the header would differ from the order in the data, and matching order is actually required.
My suggestion is to remove all of that sorting and have the writer simply loop through the lists, writing each parameter, array, and column one at a time instead of gathering them first.
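The unsorted writer loop could look roughly like this (Definition and write_header are stand-ins here, not the real sdds classes; the header line format follows the SDDS namelist convention):

```python
from dataclasses import dataclass
from io import StringIO

@dataclass
class Definition:
    kind: str  # "parameter", "array" or "column"
    name: str
    type: str

def write_header(definitions, stream):
    # One header line per definition, in the order given -- no sorting,
    # so the header order always matches the order the data is written in.
    for d in definitions:
        stream.write(f"&{d.kind} name={d.name}, type={d.type}, &end\n")

defs = [
    Definition("parameter", "acqStamp", "long"),
    Definition("array", "bpmNames", "string"),
    Definition("column", "TURN", "long"),
]
buf = StringIO()
write_header(defs, buf)
```

The data section would then be written by iterating over the same list again, which makes it impossible for header and data to get out of sync.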
Possible Implementation
We could add a random hash to the definitions, maybe built from the name plus some random characters.
Then definitions and values could be combined into a single dict with the definitions as keys and the corresponding values as values. (We could also accept this dict as an alternative input to the init, so either two lists or this kind of dict.)
The only problem is that we would then have to rewrite the getter a bit, so that if a name string is given, we return a tuple in case multiple entries are found.
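A rough sketch of that idea, with all class and attribute names being illustrative rather than the real sdds internals:

```python
import secrets
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Definition:
    name: str
    # random suffix makes each definition a unique, hashable dict key,
    # so two definitions with the same name can coexist
    uid: str = field(default_factory=lambda: secrets.token_hex(4))

class DefinitionStore:
    def __init__(self, definitions, values):
        # single dict: definition -> value
        self._data = dict(zip(definitions, values))

    def __getitem__(self, name):
        matches = tuple(v for d, v in self._data.items() if d.name == name)
        if not matches:
            raise KeyError(name)
        # a single match comes back bare, multiple matches as a tuple
        return matches[0] if len(matches) == 1 else matches

store = DefinitionStore(
    [Definition("Q1"), Definition("Q1"), Definition("Q2")], [1, 2, 3]
)
store["Q1"]  # -> (1, 2)
store["Q2"]  # -> 3
```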
Something like that.