🚀 Feature
PyTorch Lightning recently added native support for Microsoft's DeepSpeed.
I believe it would also be helpful for users if Ignite incorporated the DeepSpeed pipeline for memory-efficient distributed training.
1. `idist.auto_model`
To initialize the DeepSpeed engine:

```python
model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args,
                                                     model=model,
                                                     model_parameters=params)
```
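As a rough sketch, an `auto_model`-style helper could dispatch to `deepspeed.initialize` when a DeepSpeed config is passed; the `ds_config` parameter and the fallback behavior here are assumptions, not an existing Ignite API:

```python
# Hypothetical sketch: an auto_model-style helper that returns a DeepSpeed
# engine when a config is given. The `ds_config` argument is an assumption.
def auto_model(model, optimizer=None, ds_config=None):
    """Return (model, optimizer), wrapped by DeepSpeed when a config is given."""
    if ds_config is None:
        # no DeepSpeed requested: hand back the plain objects
        return model, optimizer
    import deepspeed
    engine, ds_optimizer, _, _ = deepspeed.initialize(
        model=model,
        optimizer=optimizer,
        # let DeepSpeed build the optimizer only if none was supplied
        model_parameters=None if optimizer else list(model.parameters()),
        config=ds_config,
    )
    return engine, ds_optimizer
```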
And for the distributed environment setup, we need to replace `torch.distributed.init_process_group(...)`
with `deepspeed.init_distributed()`.
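The swap above could be sketched as a small dispatcher; the `use_deepspeed` flag is an assumption here, not an existing Ignite option:

```python
# Sketch: choosing the process-group bootstrap. The `use_deepspeed`
# flag is hypothetical, not an existing Ignite option.
def init_distributed(backend="nccl", use_deepspeed=False):
    if use_deepspeed:
        import deepspeed
        # replaces torch.distributed.init_process_group(...)
        deepspeed.init_distributed(dist_backend=backend)
    else:
        import torch.distributed as dist
        dist.init_process_group(backend=backend)
```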
2. Checkpoint handler

Checkpointing works slightly differently as well: the model engine saves through its own API:

```python
model_engine.save_checkpoint(args.save_dir, ckpt_id, client_sd=client_sd)
```
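A checkpoint handler could delegate to this call instead of `torch.save`; the class name `DeepSpeedSaver` below is hypothetical, a sketch of the shape such a handler might take:

```python
# Hypothetical sketch: a saver that delegates to the DeepSpeed engine's
# checkpoint API instead of torch.save. `DeepSpeedSaver` is an assumed name.
class DeepSpeedSaver:
    def __init__(self, save_dir):
        self.save_dir = save_dir

    def __call__(self, model_engine, ckpt_id, client_sd=None):
        # DeepSpeed engines must save via their own API so that
        # partitioned optimizer/model states are handled on every rank.
        model_engine.save_checkpoint(self.save_dir, ckpt_id,
                                     client_sd=client_sd or {})
```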