
Fix pytorch gradients #1450

Merged: 50 commits into master from fix_autodiff, Oct 10, 2024
Conversation

@Simone-Bordoni (Contributor) commented Sep 16, 2024

Fix #1449

Checklist:

  • Reviewers confirm new code works as expected.
  • Tests are passing.
  • Coverage does not decrease.
  • Documentation is updated.

@Simone-Bordoni Simone-Bordoni added the bug Something isn't working label Sep 16, 2024
@Simone-Bordoni Simone-Bordoni self-assigned this Sep 16, 2024
@renatomello renatomello added this to the Qibo 0.2.13 milestone Sep 17, 2024
@BrunoLiegiBastonLiegi (Contributor) commented Sep 17, 2024

I found a potential problem when you cast the parameters of parametrized gates:

return self.np.tensor(x, dtype=self.dtype, requires_grad=self.requires_grad)

here you cast to self.dtype, which is usually torch.complex128, but it should actually be torch.float32/64 since this is a parameter. I don't know whether this is the origin of the problem with the gradients, though.

EDIT: never mind, you necessarily have to cast to complex, otherwise pytorch complains about mismatching dtypes when you build the matrix elements...
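As a quick standalone sanity check of the point above (plain torch, not qibo code): casting a real parameter to complex128 does not by itself break autograd, so the gradient still reaches the original float parameter.

import torch

theta = torch.tensor(0.5, dtype=torch.float64, requires_grad=True)
theta_c = theta.to(torch.complex128)   # differentiable dtype cast
phase = torch.exp(-0.5j * theta_c)     # a typical complex matrix element, exp(-i*theta/2)
loss = phase.real                      # cos(theta/2), a real-valued loss
loss.backward()
print(theta.grad)                      # ≈ -0.5 * sin(0.25) ≈ -0.124, i.e. non-zero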

@Simone-Bordoni (Contributor, Author) commented Sep 17, 2024

I found a potential problem when you cast the parameters of parametrized gates:

return self.np.tensor(x, dtype=self.dtype, requires_grad=self.requires_grad)

here you cast to self.dtype, which is usually torch.complex128, but it should actually be torch.float32/64 since this is a parameter. I don't know whether this is the origin of the problem with the gradients, though.
EDIT: never mind, you necessarily have to cast to complex, otherwise pytorch complains about mismatching dtypes when you build the matrix elements...

Actually, this may be part of the problem. I am rewriting the whole casting function in a better way, differentiating the matrix_dtype from the parameters_dtype.

@Simone-Bordoni (Contributor, Author) commented

Now the gradients are passing; however, their final value is always zero. I have tried visualizing the computational graph (shown below) and all the operations seem to be performed correctly.
I am currently testing a simple circuit with just one rotation gate (test_torch_gradients.py).
Over the next few days I will not be able to work further on this issue; I will resume working on the gradients next week. In the meantime, if you find any possible reason for this problem, let me know, it would be very helpful.
[image: computational graph of the loss]

@Simone-Bordoni (Contributor, Author) commented

I found out the problem was with the target state. Now the gradients are passing correctly;
as soon as possible I will clean up the code and it should be ready for a review.

@Simone-Bordoni Simone-Bordoni marked this pull request as ready for review September 24, 2024 13:47
@Simone-Bordoni (Contributor, Author) commented

I think that now everything has been fixed.
The changes required were bigger than expected: I had to add the backend in the gate decompositions and in other parts.
I have added a test to check the correct backpropagation in test_torch_gradients.py, so that it can be easily moved to qiboml.
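For reference, the kind of check being described can be sketched in plain torch (illustrative only; the actual test in test_torch_gradients.py goes through the qibo pytorch backend): train a single rotation angle so the circuit output matches a target state, and verify that the gradient is non-zero and the loss converges.

import torch

target = torch.tensor([0.0, 1.0], dtype=torch.complex128)          # target state |1>
theta = torch.tensor(0.1, dtype=torch.float64, requires_grad=True)
optimizer = torch.optim.SGD([theta], lr=0.5)

for _ in range(100):
    optimizer.zero_grad()
    t = theta.to(torch.complex128)                                  # differentiable cast
    cos, sin = torch.cos(t / 2), torch.sin(t / 2)
    rx = torch.stack([cos, -1j * sin, -1j * sin, cos]).reshape(2, 2)     # RX(theta) matrix
    state = rx @ torch.tensor([1.0, 0.0], dtype=torch.complex128)        # act on |0>
    loss = 1.0 - torch.abs(torch.vdot(target, state)) ** 2               # infidelity
    loss.backward()
    optimizer.step()

assert theta.grad is not None and loss.item() < 1e-3   # gradients flow and the loss converges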

codecov bot commented Sep 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.93%. Comparing base (36ed3fe) to head (3f22d8d).
Report is 51 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1450      +/-   ##
==========================================
- Coverage   99.93%   99.93%   -0.01%     
==========================================
  Files          81       81              
  Lines       11820    11790      -30     
==========================================
- Hits        11812    11782      -30     
  Misses          8        8              
Flag Coverage Δ
unittests 99.93% <100.00%> (-0.01%) ⬇️


@BrunoLiegiBastonLiegi (Contributor) left a comment

Thanks @Simone-Bordoni, looks good. The only two things that I am not a fan of are:

  • The new requires_grad argument added to cast. I would remove that and, when possible, move the functions that need it inside the backend as methods (like the _curve_fit case discussed above). For the cases where that is not possible, I would create a specific backend.cast_parameter():
# numpy
def cast_parameter(self, x, **kwargs):
    return self.cast(x, **kwargs)
# torch
def cast_parameter(self, x, **kwargs):
    return self.cast(x, **kwargs).requires_grad_()
  • The fact that the decomposition needs the backend now. Ideally, I don't think this should be the case, but I don't know whether it is possible to avoid in practice.
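To make the proposal concrete, here is a minimal self-contained sketch with stand-in backend classes (these are not qibo's actual classes, just an illustration of the behavioural difference the suggestion would introduce):

import numpy as np
import torch

class NumpyBackendSketch:
    def cast(self, x, dtype=np.float64):
        return np.asarray(x, dtype=dtype)
    def cast_parameter(self, x, **kwargs):
        return self.cast(x, **kwargs)                     # no autograd on numpy

class PyTorchBackendSketch:
    def cast(self, x, dtype=torch.float64):
        return torch.as_tensor(x, dtype=dtype)
    def cast_parameter(self, x, **kwargs):
        return self.cast(x, **kwargs).requires_grad_()    # leaf tensor tracked by autograd

theta_np = NumpyBackendSketch().cast_parameter(0.3)
theta_pt = PyTorchBackendSketch().cast_parameter(0.3)
assert theta_pt.requires_grad and not hasattr(theta_np, "requires_grad")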

Resolved review threads on: tests/test_backends_torch_gradients.py (4), tests/test_gates_gates.py (2), tests/test_models_hep.py, tests/test_quantum_info_entanglement.py
@Simone-Bordoni (Contributor, Author) commented

Thanks @Simone-Bordoni, looks good. The only two things that I am not a fan of are:

  • The new requires_grad argument added to cast. I would remove that and, when possible, move the functions that need it inside the backend as methods (like the _curve_fit case discussed above). For the cases where that is not possible, I would create a specific backend.cast_parameter():
# numpy
def cast_parameter(self, x, **kwargs):
    return self.cast(x, **kwargs)
# torch
def cast_parameter(self, x, **kwargs):
    return self.cast(x, **kwargs).requires_grad_()
  • The fact that the decomposition needs the backend now. Ideally, I don't think this should be the case, but I don't know whether it is possible to avoid in practice.

Regarding the decomposition: as it is now, you need the backend.
Regarding cast_parameter, I can add it to the other backends; however, the correction you are suggesting is not possible, as cast_parameter differs a lot from the cast function:

  • cast_parameter tries to convert integers to floats in torch;
  • the default dtype for cast_parameter is torch.float64 (self.parameter_dtype), as opposed to cast, whose default is torch.complex128 (self.dtype) and is used on matrices;
  • cast performs different operations on torch to match the tensorflow behaviour, for example when handling lists of arrays.

So I agree on adding a cast_parameter function to all backends, but it should not be tied to the standard cast().
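A compressed sketch of that distinction (class name and default dtypes here are illustrative, not the actual qibo backend implementation):

import torch

class PyTorchCastingSketch:
    dtype = torch.complex128            # default for matrices/states, used by cast
    parameter_dtype = torch.float64     # default for gate parameters, used by cast_parameter

    def cast(self, x, dtype=None):
        return torch.as_tensor(x, dtype=dtype or self.dtype)

    def cast_parameter(self, x, dtype=None):
        if isinstance(x, int):          # integers are promoted to float so they can carry gradients
            x = float(x)
        return torch.as_tensor(x, dtype=dtype or self.parameter_dtype).requires_grad_()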


@Simone-Bordoni (Contributor, Author) commented

Actually, cast_parameter() was made to cast a single parameter for matrix_parametrized().
An idea could be to make it a helper method in the torch backend and to create a new public method like cast_parameters() to be used in these situations.
Anyway, I don't know whether @renatomello agrees on adding this new function to the numpy backend to avoid different backend behaviours.
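Purely as an illustration of that idea (the names below are hypothetical, not the actual qibo API): a private single-parameter helper plus a public plural method.

import torch

class PyTorchBackendSketch:
    parameter_dtype = torch.float64

    def _cast_parameter(self, x):
        # private helper: cast one gate parameter (the matrix_parametrized use case)
        return torch.as_tensor(x, dtype=self.parameter_dtype).requires_grad_()

    def cast_parameters(self, parameters):
        # public entry point: cast an iterable of gate parameters
        return [self._cast_parameter(p) for p in parameters]

params = PyTorchBackendSketch().cast_parameters([0.1, 0.2, 0.3])
assert all(p.requires_grad for p in params)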

@renatomello renatomello changed the title Fix pytorch gradients Fix pytorch gradients Oct 7, 2024
@BrunoLiegiBastonLiegi BrunoLiegiBastonLiegi added this pull request to the merge queue Oct 10, 2024
Merged via the queue into master with commit d290601 Oct 10, 2024
27 checks passed
@renatomello renatomello deleted the fix_autodiff branch October 17, 2024 05:29
Labels: bug (Something isn't working)
Projects: none yet
3 participants