mayank31398 pushed a commit to mayank31398/BigCode-Megatron-LM that referenced this issue on Jun 21, 2023:

* add direct meg-ds to hf format script (NVIDIA#110)
* add direct meg-ds to hf format script (part2) (NVIDIA#111)
* add direct meg-ds to hf format script
* split into 2 functions
* update the usage doc
* make scripts executable
* add shebang

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
When launching very long training runs, building the index mappings can take more than 1 minute. The consequence is that the other ranks time out here: https://github.com/bigcode-project/Megatron-LM/blob/multi-query-attention/megatron/training.py#L962

However, the timeout passed to `torch.distributed.init_process_group` is 10 minutes. Why isn't this value used in `torch.distributed.broadcast`?
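For context, here is a minimal, self-contained sketch of the failure mode, not the Megatron-LM code itself: two local processes on the `gloo` backend, where rank 0 simulates the slow index-mapping build and the other rank waits in a collective. The 10-minute figure is the `timeout` argument to `torch.distributed.init_process_group`, which is the timeout collectives on that group are meant to honor (with the NCCL backend this additionally depends on `NCCL_BLOCKING_WAIT`/`NCCL_ASYNC_ERROR_HANDLING` being set).

```python
import os
import time
from datetime import timedelta

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(
        "gloo",
        rank=rank,
        world_size=world_size,
        timeout=timedelta(minutes=10),  # must exceed the index-build time
    )
    if rank == 0:
        time.sleep(90)  # stand-in for the >1 minute index-mapping build
    # The other ranks block here; with a timeout shorter than the build,
    # they abort instead of waiting for rank 0.
    dist.barrier()
    payload = torch.zeros(1)
    if rank == 0:
        payload.fill_(42.0)
    dist.broadcast(payload, src=0)  # governed by the same group timeout
    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```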
The workaround for now is to first create the index mappings on a single worker, as a preliminary run.
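A sketch of the idea behind that workaround, independent of Megatron-LM (the cache layout and file naming below are illustrative, not the real ones): the expensive shuffle index is only computed when its cache file is missing, so a preliminary single-process run leaves the file behind, and later multi-rank runs just memory-map it without any rank sitting in a collective.

```python
import os

import numpy as np


def get_shuffle_index(num_samples: int, seed: int, cache_dir: str) -> np.ndarray:
    path = os.path.join(cache_dir, f"shuffle_{num_samples}_{seed}.npy")
    if not os.path.exists(path):  # only true during the preliminary run
        rng = np.random.RandomState(seed)
        index = np.arange(num_samples, dtype=np.int64)
        rng.shuffle(index)  # the slow part for very long training runs
        os.makedirs(cache_dir, exist_ok=True)
        np.save(path, index)
    # Later runs (all ranks) take this fast path and never block each other.
    return np.load(path, mmap_mode="r")


if __name__ == "__main__":
    idx = get_shuffle_index(1_000_000, seed=1234, cache_dir="./index_cache")
    print(idx[:5])
```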