Timeout on creating the index mappings #15

Open
RaymondLi0 opened this issue Jan 4, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@RaymondLi0
Collaborator

When launching very long training runs, building the index mappings can take more than 1 minute.
As a result, the other ranks time out: https://github.com/bigcode-project/Megatron-LM/blob/multi-query-attention/megatron/training.py#L962
However, the timeout passed to torch.distributed.init_process_group is 10 minutes. Why isn't this value used in torch.distributed.broadcast?
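For reference, here is a minimal sketch of where a process-group timeout is usually set. The `timeout` keyword of `torch.distributed.init_process_group` is real. The helper names and the `DIST_TIMEOUT_MINUTES` environment variable are hypothetical and not taken from Megatron-LM. The sketch does not settle whether a given collective honors this value, which can depend on the backend and the PyTorch version.

```python
import os
from datetime import timedelta


def process_group_timeout(default_minutes=10):
    """Read the timeout from a (hypothetical) env var, defaulting to 10 minutes."""
    minutes = float(os.environ.get("DIST_TIMEOUT_MINUTES", default_minutes))
    return timedelta(minutes=minutes)


def init_distributed(backend="nccl"):
    """Initialize the default process group with an explicit timeout.

    torch is imported lazily so the helper above can be used without it.
    Rank, world size and master address are read from the environment by
    the default env:// init method.
    """
    import torch.distributed as dist

    dist.init_process_group(backend=backend, timeout=process_group_timeout())
```

Setting `DIST_TIMEOUT_MINUTES=30` before launch would then request a 30-minute timeout, assuming the launcher calls `init_distributed()`.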

The workaround for now is to first create the index mappings on a single worker, as a preliminary run.
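The workaround relies on the mappings being cached, so that the real run only has to load them. Here is a minimal sketch of that build-once, load-later pattern. The function names, the JSON cache format and the trivial mapping are all stand-ins for illustration, not Megatron-LM's actual index-building code.

```python
import json
import os
import tempfile


def build_index_mapping(num_samples):
    """Stand-in for the slow index-mapping build (hypothetical)."""
    return list(range(num_samples))


def load_or_build_index(cache_path, num_samples):
    """Return (mapping, was_cached), building and caching on the first call."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f), True
    mapping = build_index_mapping(num_samples)
    # Write to a temporary file and rename it, so a concurrent reader
    # never sees a partially written cache.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(mapping, f)
    os.replace(tmp_path, cache_path)
    return mapping, False


if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "index.json")
    # Preliminary single-worker run: pays the build cost once.
    _, cached = load_or_build_index(cache, 5)
    print("preliminary run used cache:", cached)
    # Later multi-rank run: every rank finds the cache and loads it.
    _, cached = load_or_build_index(cache, 5)
    print("training run used cache:", cached)
```

In this pattern the preliminary run is the only one that pays the build cost. The training job then reaches its first collective without one rank spending minutes on the build.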

RaymondLi0 added the bug label Jan 4, 2023
mayank31398 pushed a commit to mayank31398/BigCode-Megatron-LM that referenced this issue Jun 18, 2023
* add direct meg-ds to hf format script (NVIDIA#110)

* add direct meg-ds to hf format script (part2) (NVIDIA#111)

* add direct meg-ds to hf format script

* split into 2 function

* update the usage doc

* make scripts executable

* add shebang

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
mayank31398 pushed a commit to mayank31398/BigCode-Megatron-LM that referenced this issue Jun 21, 2023