compute_synchrony_metrics update #2605
Conversation
Thanks @chrishalcrow Can you check why tests are failing?
Yes, it's to do with me originally changing the output from …
src/spikeinterface/core/generate.py (Outdated)

```diff
-    return (times, labels)
+    return structured_insertions
```
Same comment here about the output. Input and output are your contracts with your users. We should be careful with changing them (and careful to define them in the first place).
Indeed, I'll add some documentation. I think these functions (this one and add_insertions) don't have much use outside of testing. Should I "hide" them (i.e. make them _add_insertions etc.)?
If you think that they will only be used for internal testing, then the best is to have them in the corresponding testing module, I feel.
I think hiding by default is a good option and then you make it public once you feel confident about the interface and purpose. That minimizes the risk if you want to change the function later. But this is my opinion, I don't think that anyone else would find any problems if you want to add some functions.
I'd prefer to keep them close to the synthesize_stuff functions, since they are closely related. I'm a fan of the "related code should be close together" mindset. Maybe they should all be in the testing module? But maybe for another time...
Great, I've hidden it and also added some documentation. Best of all worlds.
I think this is a good decision. If we ever use the function more and we have decided on the signature, then we can make it public.
```python
----------
spikes : memmap
    The spike train
synchrony_sizes : numpy array
```
Is this given in seconds, samples, milliseconds?
I've made the description more accurate (since it's actually a structured numpy array). One of the fields is "sample_index", which clarifies the unit.
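For reference, a sketch of what such a structured spike-vector dtype looks like (the exact field list is an assumption; only "sample_index" is confirmed above):

```python
import numpy as np

# Assumed structured dtype for the spike vector; the "sample_index" field
# makes the time unit (samples) explicit.
spike_dtype = np.dtype(
    [("sample_index", "int64"), ("unit_index", "int64"), ("segment_index", "int64")]
)
```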
I was talking about the synchrony_sizes
Oh, sorry. These are the numbers of synchronous events you want to count. So if you want to see when two or four spikes fire at the same time, you use synchrony_sizes = (2,4). So it's an integer, and I think it will be clear to anyone who knows enough about the metric that they're using it.
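For illustration, given the signature shown later in the diff, a call counting pairs and quadruplets might look like this (the synthetic-analyzer setup is an assumption, not part of the PR):

```python
from spikeinterface.core import generate_ground_truth_recording, create_sorting_analyzer
from spikeinterface.qualitymetrics import compute_synchrony_metrics

# Build a small synthetic analyzer (assumed generator defaults).
recording, sorting = generate_ground_truth_recording(durations=[10.0], num_units=5, seed=0)
sorting_analyzer = create_sorting_analyzer(sorting, recording)

# synchrony_sizes are the event sizes to count: here, how often exactly 2
# and exactly 4 units fire at the same sample.
synchrony = compute_synchrony_metrics(sorting_analyzer, synchrony_sizes=(2, 4))
```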
Thanks. I guess my lack of familiarity with the metric is evident then. I somehow imagined that it was the window in which an event would be counted as synchronous. Thanks for explaining it.
Benchmark comments:
Oooh, thanks for the logspace function. Maybe it would be helpful to say: the new algorithm is expected to be faster. I was able to replace the use of the complexity calculation (via np.unique), np.isin and np.count_nonzero (I think the complexity calculation is the heavy function of these) with some simple selections. Spike_length is just the number of spikes in the spike vector, so for generic data this increases linearly with time.

I've been using a 30 minute, 16 channel recording from my lab, which contains 70,000 spikes after sorting. Is this realistic? I've also been using this to make sure my algorithm is consistent with the old results. My tests go up to 100,000 spikes. What would be a good number to go up to? I'm testing with unit number now, going up to 1000 units. Is that high enough?

The complexity function changes its method at some point, and I'm trying to figure out how and when, and will report back soon! But the new code is still substantially faster at e.g. 1k units with 100k spikes.
On second thought, this seems more amenable to control and more relevant than duration. It is probably just me who does not have a good intuitive grasp of this metric, whereas duration I understand better. How are you controlling this, though, given that you are passing this parameter as duration divided by 1000?
I have been told that 1000 units is a very high number. Bear in mind that the comments I made about the benchmark are just me being curious, plus suggestions. As long as Sam and Alessio are OK with the rest, I would not bother if you don't care about it. The thing that I really feel strongly about is not changing the signature and output of synthesize_poisson_spike_vector.
Hello, I learned a few things from your comments and from trying to recreate my previous benchmarking. Thanks!

Before: I refactored the code, creating a new function compute_synchrony_count. Then I did a speed test on the new function, using the old algorithm against an optimised version of it. This is hard for others to reproduce, since you can't easily run compute_synchrony_count. So now I'm benchmarking against the full compute_synchrony_metrics function. This should be closer to "real world" use than before, and easier for anyone to check. And I've hopefully made the wording/details of the plots more obvious. The old benchmarking results are basically unchanged.

And I've now done more benchmarking, with changing units. First, here's a plot of timings with increasing unit number N but fixed spike train length (of about 50k: in our lab, this is the number of spikes in a typical 20 minute recording). The new algorithm grows linearly with the number of spikes. The old algorithm, which uses more complex functions, changes tactic at ~600 units (I think it's np.unique that changes its tactic…). Both its tactics look like O(N) algorithms. But even when it changes, it is still about 10x slower than the new one.

I also tried keeping the duration and firing rates fixed. So here, as the number of units N increases, the length of the spike train increases at the same rate, but the density of spikes per time unit stays constant. Both algorithms grow as N^2, as expected (since units and train length are both increasing with N). Again, the old algorithm changes tactic, but the new code is still significantly faster. The code for both is below:
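The collapsed code block didn't survive the page export; a minimal sketch of what a sweep over unit count against the full function might look like (generator parameters and setup are assumptions, not the actual script):

```python
import time
from spikeinterface.core import generate_ground_truth_recording, create_sorting_analyzer
from spikeinterface.qualitymetrics import compute_synchrony_metrics

def time_synchrony_metrics(num_units, duration=1200.0, seed=0):
    """Time the full compute_synchrony_metrics call on a synthetic recording."""
    recording, sorting = generate_ground_truth_recording(
        durations=[duration], num_units=num_units, seed=seed
    )
    analyzer = create_sorting_analyzer(sorting, recording)
    start = time.perf_counter()
    compute_synchrony_metrics(analyzer, synchrony_sizes=(2, 4, 8))
    return time.perf_counter() - start

# Sweep over unit count; for the fixed-spike-count scenario one would also
# scale the firing rates down as the unit count grows.
timings = [time_synchrony_metrics(n) for n in (10, 100, 300, 600, 1000)]
```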
Hey, thanks for removing the change of the output and for doing the benchmark with units.
I still would like to keep the function for generating spikes as simple as possible, so I request a removal of the arguments. As I mentioned before:
The thing that I really feel strongly about is not changing the signature and output of synthesize_poisson_spike_vector.
Unfortunately, I can't review the synchrony metrics changes as I am not really well acquainted with the method. Maybe @DradeAW or @zm711 can take a look?
@alejoe91 actually ported this from Elephant so he would be the best to review, unless @DradeAW knows this super well. I never actually use this metric :)
An important part of the update is the four …
Thanks for bearing with me and for adding thorough testing.
checking if this removes the changes requested tag
from int64 to int16 (Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>)
src/spikeinterface/core/generate.py (Outdated)

```python
def _add_spikes_to_spiketrain(
    spike_indices,
    spike_labels,
    segment_indices=[],
```
Suggested change:

```diff
-    segment_indices=[],
+    segment_indices=None,
```
See #2345
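For context, the linked issue is presumably about Python's mutable-default-argument pitfall, which a quick illustration makes clear (hypothetical function names, not from this PR):

```python
def append_bad(item, items=[]):
    # The default list is created once, at function definition time,
    # so it is shared across every call that relies on the default.
    items.append(item)
    return items

append_bad(1)  # [1]
append_bad(2)  # [1, 2]  <- state leaked from the previous call

def append_good(item, items=None):
    # Idiomatic fix: use None as a sentinel and build a fresh list per call.
    if items is None:
        items = []
    items.append(item)
    return items
```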
To quote Joe: "Thanks I did not know about that, I am learning so much working on the SI codebase 😍!"
src/spikeinterface/core/generate.py (Outdated)

```python
    segment_indices=[],
    added_spikes_indices=None,
    added_spikes_labels=None,
    added_segment_indices=[],
```
Suggested change:

```diff
-    added_segment_indices=[],
+    added_segment_indices=None,
```
Hello, realised that the …
```python
synchrony_metrics_dict = {}
for sync_idx, synchrony_size in enumerate(synchrony_sizes_np):
    synch_id_metrics_dict = {}
```
Suggested change:

```diff
-    synch_id_metrics_dict = {}
+    sync_id_metrics_dict = {}
```
Can we stick to the same naming? :)
Sorry for being overly annoying @chrishalcrow !!!
You can name it anything you want if I can stop working on this ;) Just you wait: my next quality_metrics update is gonna be so careful!
Hi. …
```diff
@@ -496,7 +496,51 @@ def compute_sliding_rp_violations(
     )


 def compute_synchrony_metrics(sorting_analyzer, synchrony_sizes=(2, 4, 8), unit_ids=None, **kwargs):
 def get_synchrony_counts(spikes, synchrony_sizes, all_unit_ids):
```
I would expect a delta in the signature, no?
```python
    This code was adapted from `Elephant - Electrophysiology Analysis Toolkit <https://github.com/NeuralEnsemble/elephant/blob/master/elephant/spike_train_synchrony.py#L245>`_
    """

    synchrony_counts = np.zeros((np.size(synchrony_sizes), len(all_unit_ids)), dtype=np.int64)
```
With this, the synchrony is highly dependent on the sampling rate, no?
Hello, indeed the metric is not robust to lags at all. This PR was just optimising what already existed. I'm happy to add a delta, which would check if there were spikes within delta samples of the first one?
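A hypothetical sketch of that idea, grouping spikes that fall within `delta` samples of the first spike in a group (illustrative only; not part of this PR):

```python
import numpy as np

def synchronous_group_sizes(sample_indices, delta=0):
    """Return the size of each group of spikes lying within `delta` samples
    of the first spike in the group. delta=0 reproduces exact-sample synchrony."""
    sample_indices = np.sort(sample_indices)
    sizes = []
    group_start = 0
    for i in range(1, len(sample_indices) + 1):
        if i == len(sample_indices) or sample_indices[i] - sample_indices[group_start] > delta:
            sizes.append(i - group_start)  # close the current group
            group_start = i
    return np.array(sizes)
```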
Let's keep it like this for now, since this was the original design. :)
(Another go of #2528, after the SortingAnalyzer refactor)

Planning to standardise, correctness-test and optimise the quality metrics module. First one: the compute_synchrony_metrics function. Two main things in this:

- synthesize_random_firings and synthesize_poisson_spike_vectors were adjusted so that you can insert specific spikes, useful for testing. These are used in the new tests.
- compute_synchrony_counts is introduced, which computes the synchrony counts.

One thing I'm not sure about: the documentation said that compute_synchrony_metrics returns a dict, but I think we should export a namedtuple, which matches most other quality metrics.

I adjusted the algorithm in compute_synchrony_counts, which should be easier to understand and is significantly faster. Here are some benchmarking results, with timings and a log-log plot to check the algorithm's complexity. About 30x faster!

Benchmarked using the following code:
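The benchmark script itself was in a collapsed block that didn't survive export. A rough sketch of this kind of log-log sweep over spike count might look like the following (the dtype, import path and parameter values are assumptions, not the PR's actual script):

```python
import time
import numpy as np
from spikeinterface.qualitymetrics.misc_metrics import get_synchrony_counts  # assumed import path

rng = np.random.default_rng(seed=0)
num_units = 16
thirty_minutes = 30_000 * 60 * 30  # samples at an assumed 30 kHz

spike_counts = np.logspace(3, 5, num=10).astype(int)  # 1k to 100k spikes
timings = []
for num_spikes in spike_counts:
    # Random structured spike vector with sorted sample indices.
    spikes = np.zeros(
        num_spikes,
        dtype=[("sample_index", "int64"), ("unit_index", "int64"), ("segment_index", "int64")],
    )
    spikes["sample_index"] = np.sort(rng.integers(0, thirty_minutes, num_spikes))
    spikes["unit_index"] = rng.integers(0, num_units, num_spikes)

    start = time.perf_counter()
    get_synchrony_counts(spikes, np.array([2, 4, 8]), np.arange(num_units))
    timings.append(time.perf_counter() - start)

# Plotting timings against spike_counts on log-log axes then exposes the complexity.
```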