v0.4.4
The system is successfully evaluated on a multi-node system.
The benchmark scripts are integrated with memory-centric tiling borrowed from DeepSpeed.
It trains an 18B model on WeChat Yard.
The system is successfully evaluated on a multi-node system.
The benchmark scripts are integrated with memory-centric tiling borrowed from DeepSpeed.
It trains an 18B model on WeChat Yard.