@jmosbacher jmosbacher commented Aug 28, 2021

This implements issue #431. It adds an option to split a chunk's data by overlap with the two target chunks instead of requiring full containment. Overlapping data is trimmed automatically on concatenation. This reduces the complexity of chunk alignment for plugins with multiple dependencies and allows subclasses of OverlapWindowPlugin to be processed in parallel.

Can you briefly describe how it works?

  • Added an optional allow_overlap argument to the Chunk.split method, which enables splitting on overlap.
  • Added a strict_bounds property to Chunk that marks whether the chunk bounds (start, end) fully contain all of its data.
  • Chunk overlaps are trimmed on concatenation.
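
To make the difference between the two splitting modes concrete, here is a toy sketch in plain NumPy. It is not the code in this PR: `interval_dtype`, `split_contained` and `split_overlapping` are hypothetical names used only for illustration, and the sketch assumes each row carries explicit `time` and `endtime` fields.

```python
import numpy as np

# Toy interval dtype (an assumption for this sketch, not strax's record dtype).
interval_dtype = np.dtype([("time", np.int64), ("endtime", np.int64)])


def split_contained(data, t):
    """Split at t; refuse if any interval straddles t (hypothetical helper)."""
    straddles = (data["time"] < t) & (data["endtime"] > t)
    if straddles.any():
        raise ValueError(f"{straddles.sum()} interval(s) straddle t={t}")
    return data[data["endtime"] <= t], data[data["time"] >= t]


def split_overlapping(data, t):
    """Split at t; straddling intervals end up in both halves (hypothetical helper)."""
    return data[data["time"] < t], data[data["endtime"] > t]


data = np.array([(0, 10), (8, 20), (20, 30)], dtype=interval_dtype)

# The interval (8, 20) straddles t=15, so a containment-based split fails.
try:
    split_contained(data, 15)
except ValueError as err:
    print("containment split failed:", err)

# The overlap-based split succeeds; (8, 20) is present in both halves.
left, right = split_overlapping(data, 15)
print(left["time"], right["time"])
```

In this toy model the halves no longer respect the split point strictly, which is the situation a flag such as strict_bounds is meant to record.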

Can you give a minimal working example (or illustrate with a figure)?

import strax
import straxen

st = straxen.contexts.demo()
# Take the first raw_records chunk of the demo run
c = next(st.get_iter('180423_1021', 'raw_records'))
idx = len(c.data) // 2  # not important, but let's split approximately at the center
row = c.data[idx]
t = row['time'] + row['dt'] // 2  # select a time that falls within the record interval
try:
    c1, c2 = c.split(t)
except strax.CannotSplit:
    print("Previous splitting logic fails.")

# With allow_overlap=True the split succeeds
c1, c2 = c.split(t, allow_overlap=True)
assert c1.end == c2.start == t  # the split is done exactly at the requested point in time
assert c1.data['time'][-1] > c2.data['time'][0]  # the two resulting chunks overlap each other
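
For the concatenation side, the following toy sketch shows one plausible trimming rule: keep the first half whole and drop from the second half every row that starts before the split point, since those rows are assumed to be present in the first half already. This is an assumption made for illustration; `concat_trimmed` and `interval_dtype` are hypothetical names and the actual rule in this PR may differ.

```python
import numpy as np

# Toy interval dtype (an assumption for this sketch, not strax's record dtype).
interval_dtype = np.dtype([("time", np.int64), ("endtime", np.int64)])


def concat_trimmed(first, second, boundary):
    """Concatenate two overlapping halves without duplicating rows.

    Rows of `second` that start before `boundary` are dropped, on the
    assumption that `first` already contains them (hypothetical helper).
    """
    return np.concatenate([first, second[second["time"] >= boundary]])


# Two halves of an overlap split at t=15: the row (8, 20) appears in both.
first = np.array([(0, 10), (8, 20)], dtype=interval_dtype)
second = np.array([(8, 20), (20, 30)], dtype=interval_dtype)

merged = concat_trimmed(first, second, boundary=15)
print(merged["time"])  # the straddling row appears only once
```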

@coveralls

Coverage decreased (-0.003%) to 85.889% when pulling c5a96b1 on jmosbacher:arbitrary_chunk_splitting into d3608ef on AxFoundation:master.
