fix: fix CPU offloading in FSDP grad clipping and weight updates
Updates the gradient clipping implementation to correctly handle parameters offloaded to CPU, bypassing CUDA-specific optimizations when necessary to prevent runtime errors. Refactors the FSDP engine's weight broadcasting logic to properly materialize and batch DTensors in offloaded scenarios. Additionally, introduces a new test suite to verify gradient normalization and clipping behavior across different device configurations.
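For illustration, the clipping behavior described above can be sketched without any framework. This is not the code from this commit: `Grad`, `total_grad_norm`, and `clip_grads_` are hypothetical names, plain Python lists stand in for tensors, and the `eps` default is an assumption that follows the common `max_norm / (norm + eps)` convention. The sketch accumulates squared sums per device, so a device-specific fast path would only ever see gradients living on that device, then combines them into one global norm and applies a single scaling coefficient.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Grad:
    """Hypothetical stand-in for one parameter's gradient shard."""
    device: str          # e.g. "cpu" (offloaded) or "cuda:0"
    values: List[float]  # flattened gradient values


def total_grad_norm(grads: List[Grad]) -> float:
    """Global L2 norm over gradients that may live on different devices."""
    # Accumulate squared sums per device first; only the scalar partial
    # results are combined across devices.
    sq_by_device: Dict[str, float] = {}
    for g in grads:
        partial = sum(v * v for v in g.values)
        sq_by_device[g.device] = sq_by_device.get(g.device, 0.0) + partial
    return math.sqrt(sum(sq_by_device.values()))


def clip_grads_(grads: List[Grad], max_norm: float, eps: float = 1e-6) -> float:
    """Scale gradients in place so their global norm is at most max_norm.

    Returns the norm measured before clipping.
    """
    norm = total_grad_norm(grads)
    # Never scale up: the coefficient is clamped to 1.0.
    coef = min(1.0, max_norm / (norm + eps))
    for g in grads:
        g.values = [v * coef for v in g.values]
    return norm


if __name__ == "__main__":
    mixed = [Grad("cpu", [3.0]), Grad("cuda:0", [4.0])]
    print(clip_grads_(mixed, max_norm=1.0))  # norm before clipping
    print(total_grad_norm(mixed))            # norm after clipping
```

A test for the real implementation would presumably make the same two checks on actual tensors: the returned norm matches a reference value, and the post-clipping norm does not exceed `max_norm`, for both offloaded and non-offloaded configurations.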