Skip to content

Add shift start time function. #3509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 20, 2024

Conversation

JoeZiminski
Copy link
Collaborator

@JoeZiminski JoeZiminski commented Oct 28, 2024

This PR adds a 'shift start time' function that a user can use to set a new start time for their recording.

This follows from discussions on the time API #3117 and #3157. Briefly, there are two ways to handle time in SI, a t_start recording attribute that is used to set the start time. Under this regime, the (regularly) spaced timeseries is regenerated on the fly from t_start, num samples and sampling frequency. Alternatively, a (more memory heavy) time_vector can be set to explicitly set the time array for a recording. As I understand it this is useful in the case there are gaps in the recording or there is some clock drift. From the user perspective, they always get a time vector back from get_times().

This PR implements a shift_start_time as suggested by @h-mayorquin as a convenient way for users to shift the start time of their recording, abstracting away the technical details of the implementation under the hood (t_start vs time_vector).

I have updated the PR #3072 containing documentation on times, the relevant proposed section can be found here.

@alejoe91 alejoe91 added the core Changes to core module label Oct 29, 2024
Copy link
Member

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if somewhere we should indicate that if t_start it shifts that, but if time_vector then it shifts all times within the vector.

@samuelgarcia
Copy link
Member

HI Joe.
I am not sure this approach is persistent to json dump/load cycle. If this is a need then we should have a dedicated class for that handle explicitly the shbift in the kwargs

@JoeZiminski
Copy link
Collaborator Author

Hey @zm711 this is a good point, we (@h-mayorquin) decided that from a user perspective the implementation under the hood should not leak through the API. So as far as the user is concerned the user just as 'times' and there is no distinction between the t_start implementation and time_vector. But maybe this is useful to have somewhere in the docstring for the more interested user, maybe notes?

Thanks @samuelgarcia I'm not 100% sure I understand. This PR is editing the existing segment attributes, will these not be propagated to a newly saved version of the recording? This would be an important test.

e.g. (pseudocode, I can't remember the exact commands)

recording.save("some_path")  # save with original times

recording.shift_start_time(10)  # shift times

recording.save("some_path")  # the saved data will not be updated?
load_recording - si.load("some_path")  # this will not have the updated times?

Copy link
Collaborator

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this forward despite all the noise that we have created.

Things to test for:

  1. recording with times vs recording with t_start implementation, see that the shift works.
  2. Json and pickling of base recording object.
  3. I think this is not properly propagated to preprocessing (vague memory), in my view, it is fine if we deal with that into another PR but is a good thing to keep on mind.

@h-mayorquin
Copy link
Collaborator

Also, @JoeZiminski @zm711 I am against adding notes about internal implementation details on the docstring (for the reason that Joel gave) but I think a top level comment on the function can work.

My reasoning is that if you change the internal implementation then you need to update the docstring which should be user facing and unaffected by this.

@zm711
Copy link
Member

zm711 commented Nov 1, 2024

I'm fine seeing myself off of this thread. I honestly never use this machinery for my purposes so it is really hard for me to imagine the point of this and how I as a user would interact with these functions. So what is implementation vs what is user facing is hard for me to parse. Maybe a call at some point (dedicated just to time machinery) between those who care would be helpful (at least for me). I agree with Heberto not to leak internal stuff, but so I would like to understand internal better.

@JoeZiminski
Copy link
Collaborator Author

Thanks @h-mayorquin I agree a top level comment is nice for this. Cheers for the feedback @zm711! Yes I think a call would be nice, @samuelgarcia also mentioned at some point he will do some time-alignment stuff so that might be a good opportunity.

@JoeZiminski JoeZiminski changed the title Add 'shift start time' function. Add shift start time function. Nov 13, 2024
@JoeZiminski JoeZiminski force-pushed the add_shift_time_function branch from 6dd2a31 to 4d7246a Compare November 13, 2024 18:33
@JoeZiminski JoeZiminski force-pushed the add_shift_time_function branch from 9379f53 to a1cf336 Compare November 13, 2024 18:36
Copy link
Collaborator

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

"""
_, times_recording, all_times = self._get_fixture_data(request, fixture_name)

num_segments, orig_seg_data = self._store_all_times(times_recording)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could just do recording.copy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice this will allow to remove a lot of boiler plate and make the assert more explicit. However check this out!:

        import copy

        _, times_recording, all_times = self._get_fixture_data(request, fixture_name)

        orig_times_recording = copy.deepcopy(times_recording)

        # num_segments, orig_seg_data = self._store_all_times(times_recording)

        times_recording.shift_times(shift)  # use default `segment_index=None`

        for idx in range(num_segments):
            assert np.allclose(
                orig_times_recording.get_times(segment_index=idx),
                times_recording.get_times(segment_index=idx) - shift,
                rtol=0, atol=1e-8
            )

This fails, orig_times_recording.get_times(segment_index=idx) gives

# array([0.00000000e+00, 3.33333333e-05, 6.66666667e-05, ...,
 #      9.99990000e+00, 9.99993333e+00, 9.99996667e+00])

It seems the time vector info is not being propagated in the copy, though it seemed to work okay for the save/ load 🤔 will have to come back to this, do you have ideas why this might be happening?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm yeah, I think that that's probably by design. I think cloning uses to do_dict mechanism and that coupled with saving to disk and serializing. Because timestamps can be large we probably don't want that there.

The solution is to change copy (which should be a in memory thing) to copy the time vector but I think that should be done in another PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little out of scope, but do you remember when we were looking up copy vs clone at Edinburgh. I forget what it said (but basically one is deep and one is suppose to be shallow). It would probably be worth us making clear in our docstrings if our copy's are shallow or not (also for another PR).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused sorry, what is clone vs copy? BTW I forgot to say based on the suggested command, I got an error:

(Pdb) times_recording.copy()
*** AttributeError: 'NoiseGeneratorRecording' object has no attribute 'copy'

should this have worked?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused sorry, what is clone vs copy? BTW I forgot to say based on the suggested command, I got an error:

(Pdb) times_recording.copy()
*** AttributeError: 'NoiseGeneratorRecording' object has no attribute 'copy'

should this have worked?

No I don't think this would work. I guess @h-mayorquin was suggesting to do a:

from copy import copy

copy(times_recording)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused sorry, what is clone vs copy?

Some projects use copy(deep=True/False) to indicate whether to just do metadata vs actual array data whereas other projects use clone vs copy to make the deep vs shallow distinction (again deep being everything and shallow just being metadata). Because there is no requirement for clone and copy being shallow or deep it would probably be better for us as a project to either make it clear in the docstring or for us to use the clone/copy names to make clear which functions are deep and which are shallow. Is that clearer? Again not crucial to this PR, but related to the fact that the array is not being copied (ie our copy is shallow).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zm711 that makes sense! I opened an issue to discuss this #3546.

@JoeZiminski
Copy link
Collaborator Author

@alejoe91 this is good to go from my end

@alejoe91 alejoe91 merged commit 3fd3d97 into SpikeInterface:main Nov 20, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants