Closed
Description
Although Adagrad uses less memory than Adam, it can still benefit from DeepSpeed's cpu_offload.
Following DeepSpeedCPUAdam, I have implemented an equivalent for Adagrad.
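
For reviewers less familiar with the memory argument, here is a minimal pure-Python sketch of the standard Adagrad update. It is not the code in this PR and not the DeepSpeed API; `adagrad_step` and `optimizer_state_floats` are hypothetical names used only for illustration. It shows that Adagrad keeps one accumulator per parameter (Adam keeps two), which is the state that offloading would hold in CPU memory.

```python
import math


def adagrad_step(params, grads, sq_sum, lr=0.01, eps=1e-10):
    """Apply one in-place Adagrad update to flat lists of floats.

    sq_sum is the only optimizer state: the running sum of squared
    gradients, one value per parameter.
    """
    for i, g in enumerate(grads):
        sq_sum[i] += g * g
        params[i] -= lr * g / (math.sqrt(sq_sum[i]) + eps)


def optimizer_state_floats(num_params, optimizer):
    """Count optimizer-state values (illustrative; ignores step counters)."""
    # Adagrad: squared-gradient sum. Adam: first and second moments.
    slots = {"adagrad": 1, "adam": 2}
    return slots[optimizer] * num_params


params = [1.0, -2.0]
grads = [0.5, -0.5]
sq_sum = [0.0, 0.0]
adagrad_step(params, grads, sq_sum, lr=0.1)
```

Even with half the optimizer state of Adam, the accumulator still scales with model size, so for large models keeping it on the CPU frees a proportional amount of GPU memory.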