📚 Describe the documentation issue
Currently, training_benchmark_xpu.py only supports training with multiple XPU devices on a single node. If users want to run it on multiple nodes, each with multiple XPU devices, the script needs some modification, and making it work is non-trivial. I would like to submit a PR to improve the user experience when launching multi-node, multi-XPU training.
Suggest a potential alternative/fix
To my knowledge, the following files need modification:
- training_benchmark_xpu.py: the get_dist_params() function, which initializes the DDP process group, needs to be updated to support multi-node rendezvous (see the sketch after this list).
- README.md: needs a detailed guide on how to set up the environment and launch multi-node training.
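For reference, here is a minimal sketch of what a multi-node-aware get_dist_params() could look like. The environment variable names (RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, MASTER_PORT) follow common torch.distributed launcher conventions, and the "ccl" backend is the oneCCL collective backend typically used for XPU devices; all of this is an assumption about the eventual PR, not the benchmark's current code.

```python
# Hypothetical multi-node-aware get_dist_params(); variable names and
# defaults are assumptions based on standard torch.distributed launchers.
import os

import torch.distributed as dist


def get_dist_params():
    # Launchers such as torchrun (or mpirun wrappers) export these
    # variables; the defaults fall back to a single-process setup.
    rank = int(os.environ.get('RANK', 0))
    world_size = int(os.environ.get('WORLD_SIZE', 1))
    local_rank = int(os.environ.get('LOCAL_RANK', 0))

    # For multi-node training, MASTER_ADDR must be a host reachable from
    # every node rather than being hard-coded to localhost.
    master_addr = os.environ.get('MASTER_ADDR', '127.0.0.1')
    master_port = os.environ.get('MASTER_PORT', '29500')
    init_method = f'tcp://{master_addr}:{master_port}'

    return rank, world_size, local_rank, init_method


rank, world_size, local_rank, init_method = get_dist_params()
# 'ccl' is the oneCCL collective backend commonly used for XPU training
# (requires oneccl_bindings_for_pytorch to be installed and imported).
dist.init_process_group(backend='ccl', init_method=init_method,
                        rank=rank, world_size=world_size)
```

With this shape, the same script works both on a single node and across nodes, and the README guide then mainly needs to document how these variables get set, e.g. which launcher (torchrun, mpirun, etc.) exports them automatically.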