Have you had a chance to try this?

- @custom_fwd(device_type='cuda', cast_inputs=torch.float32)
+ @custom_fwd(device_type='cuda' if torch.cuda.is_available() else 'cpu', cast_inputs=torch.float32)
from torch_geometric.nn.attention import PerformerAttention
My model uses this attention mechanism, and when I train with mixed precision via torch.amp, the loss becomes NaN. After some investigation I pinpointed the issue to PerformerAttention: after only a few dozen batches, the attention computation itself produces NaN.
After consulting the torch.amp documentation, I adopted a simple solution in torch_geometric/nn/attention/performer.py:
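In essence, the change is to decorate the module's forward method with torch.amp's custom_fwd. A rough sketch of the edit (the surrounding code in performer.py is abridged here, and the exact forward signature may differ from the real file):

```python
# torch_geometric/nn/attention/performer.py (abridged sketch)
import torch
from torch.amp import custom_fwd  # recent PyTorch; older releases expose torch.cuda.amp.custom_fwd


class PerformerAttention(torch.nn.Module):
    ...

    # Under torch.amp.autocast, cast incoming floating-point tensors to
    # float32 and disable autocast for the duration of this call, so the
    # attention math always runs in full precision.
    @custom_fwd(device_type='cuda', cast_inputs=torch.float32)
    def forward(self, x, mask=None):
        ...
```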
This prevents torch.amp.autocast from running PerformerAttention in half precision. Although the attention mechanism itself no longer gets the mixed-precision speedup, torch.amp still accelerates the rest of the model, so overall training time is reduced.
One drawback is that the device type has to be specified manually. Does anyone know how to determine it automatically?
I tried using device_type=next(self.parameters()).device, but decorator arguments are evaluated while the class body is being defined, before any instance exists, so self is not in scope and it fails with 'NameError: name 'self' is not defined'.
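A minimal illustration of the failure (Demo is just a made-up module name; note also that device_type expects a string such as 'cuda', so even with access to a parameter it would need .device.type rather than .device):

```python
import torch
from torch.amp import custom_fwd


class Demo(torch.nn.Module):
    # The decorator below runs while the class body is being executed,
    # i.e. before any Demo instance exists, so referencing `self` raises
    # NameError: name 'self' is not defined:
    #
    # @custom_fwd(device_type=next(self.parameters()).device, cast_inputs=torch.float32)

    # A literal device-type string, as in my current solution, works:
    @custom_fwd(device_type='cuda', cast_inputs=torch.float32)
    def forward(self, x):
        return x
```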
My environment: