MultiStepParametricLIFNode cupy backend bug #151
A warning should be added to avoid using MultiStepParametricLIFNode in previous versions.
I will add a "bug list" in the readme to show previous bugs.
In the current version (https://github.com/fangwei123456/spikingjelly/tree/4381767d0a09c2dc6f66537b68f461222b6a795e):

```python
from spikingjelly.clock_driven import neuron_kernel, neuron

device = 'cuda:0'
neuron_kernel.check_multi_step_neuron_output_and_grad(device, neuron.MultiStepParametricLIFNode)
```

The gradients of fp16 are wrong.
The first problem is that the second 'stride' should be renamed.
I find that this problem is caused by too many neurons: the accumulated gradients exceed the range of half precision.
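To illustrate the half-precision failure mode (a minimal sketch with numpy, not the actual kernel code): the largest finite float16 value is 65504, so a gradient sum that grows past it overflows to inf, and even below that threshold a long running sum stalls once the spacing between representable float16 values exceeds the increment being added.

```python
import numpy as np

# 1) Overflow: the largest finite float16 value is 65504, so a sum of
#    gradients that exceeds it becomes inf.
a = np.float16(40000)
print(a + a)  # inf

# 2) Precision loss: even below the overflow threshold, a running sum in
#    float16 stalls once the ulp (spacing between representable values)
#    exceeds the increment; 2048 + 1 rounds back to 2048 in float16.
sum16 = np.float16(0)
for _ in range(100_000):
    sum16 = np.float16(sum16 + np.float16(1))
print(sum16)  # 2048.0, not 100000.0

# Accumulating in float32 avoids both problems.
sum32 = np.sum(np.ones(100_000, dtype=np.float16), dtype=np.float32)
print(sum32)  # 100000.0
```

This is why the error only shows up with many neurons: small test shapes never push the accumulated gradient into the range where float16 breaks down.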
See spikingjelly/spikingjelly/clock_driven/neuron_kernel.cu, line 2845 (commit ee2b22f).
Shared memory is allocated per thread block, so all threads in a block have access to the same shared memory.
This reduction only sums across the threads within each block and ignores the reduction over blocks.
We did not find this bug earlier because we checked gradients between the cupy and torch backends with a small number of neurons, so the number of blocks was only 1.
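The failure mode can be simulated on the CPU (a hypothetical sketch mirroring a typical CUDA shared-memory reduction, not the real neuron_kernel.cu code; the 1024 threads-per-block figure is an assumption): each block produces a correct partial sum, but without a cross-block combine step only one block's contribution survives, which is harmless exactly when there is a single block.

```python
import numpy as np

def buggy_block_reduce(values, threads_per_block=1024):
    """Sketch of the suspected bug: each block reduces its own chunk in
    shared memory, but the per-block partial sums are never combined
    across blocks (hypothetical illustration, not the actual kernel)."""
    n_blocks = -(-len(values) // threads_per_block)  # ceil division
    partials = [values[b * threads_per_block:(b + 1) * threads_per_block]
                .sum(dtype=np.float32) for b in range(n_blocks)]
    # Missing step: combining the partials (e.g. atomicAdd into the output
    # or a second reduction pass). One block's partial is returned instead.
    return partials[0]

small = np.ones(127, dtype=np.float32)   # N = 127  -> 1 block
large = np.ones(4097, dtype=np.float32)  # N = 4097 -> 5 blocks

# With a single block the buggy reduction happens to be correct:
print(buggy_block_reduce(small), small.sum())  # 127.0 127.0
# With multiple blocks the cross-block sum is lost:
print(buggy_block_reduce(large), large.sum())  # 1024.0 4097.0
```

This matches the observation above: a small test (one block) masks the bug, and it only surfaces once the neuron count spans multiple blocks.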
When we change `shape = [63, 127]` to `shape = [63, 4097]`, we can find the gradient error.
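A corrected reduction would accumulate every block's partial into the output, as `atomicAdd` does on the GPU. A sketch of that fix (hypothetical, under the same 1024 threads-per-block assumption as above; accumulating in float32 also sidesteps the float16 overflow):

```python
import numpy as np

def fixed_block_reduce(values, threads_per_block=1024):
    """Correct cross-block reduction sketch: every block's partial sum is
    accumulated into the output, mimicking atomicAdd(&out, partial) on
    the GPU. Accumulation is done in float32 to avoid fp16 overflow."""
    n_blocks = -(-len(values) // threads_per_block)  # ceil division
    total = np.float32(0)
    for b in range(n_blocks):
        partial = values[b * threads_per_block:(b + 1) * threads_per_block] \
            .sum(dtype=np.float32)
        total += partial  # the step the buggy kernel is missing
    return total

# All 63 * 4097 = 258111 fp16 gradients are now summed correctly:
grads = np.ones(63 * 4097, dtype=np.float16)
print(fixed_block_reduce(grads))  # 258111.0
```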