
re-merge from NVIDIA main #68

Open. Wants to merge 28 commits into base: multi-query-attention.

Conversation
Conversation

RaymondLi0 (Collaborator) commented Jun 27, 2023

Among other things, this fixes a backward-compatibility issue in the checkpoint merging tools that was introduced by the previous merge.

jaredcasper and others added 28 commits May 19, 2023 14:05
Switches the cache to using md5 hashes of a text description instead
of crafted filenames to determine a "cache hit".

Changes the default location of these files to be an "index-cache"
directory inside the data root. Should leave the data directories a
bit cleaner, especially with these filenames being a bit "uglier".

For GPT the code will first look in this default location before
building a new index and caching it in the specified data cache path
(or in this default location if none is given).

For the blendable dataset, it will look for and save the indices only
if a data cache path is provided; otherwise it will just rebuild them
every time.
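The hashing scheme described above can be summarized in a few lines. This is a minimal sketch, assuming hypothetical names (index_cache_path and the exact fields in the description); it is not Megatron-LM's actual API:

```python
import hashlib
import os

def index_cache_path(data_prefix, seq_length, seed, splits_string):
    # A text description of everything that determines the index: changing
    # any of these inputs changes the md5 hash, i.e. forces a cache miss.
    desc = (
        f"dataset={data_prefix}\n"
        f"seq_length={seq_length}\n"
        f"seed={seed}\n"
        f"splits={splits_string}\n"
    )
    key = hashlib.md5(desc.encode("utf-8")).hexdigest()
    # Default location: an "index-cache" directory inside the data root.
    cache_dir = os.path.join(os.path.dirname(data_prefix), "index-cache")
    return os.path.join(cache_dir, f"{key}.npy")
```

On a lookup, the code would check this default location first and fall back to building the index and writing it to the configured data cache path.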
Add option to overlap p2p communication.

See merge request ADLR/megatron-lm!621
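For context, the usual way to overlap point-to-point communication with computation in PyTorch is to post the sends and receives asynchronously and wait only when the result is needed. A sketch of that general pattern, not the actual Megatron-LM implementation (the function and its arguments are made up):

```python
import torch.distributed as dist

def exchange_with_overlap(send_buf, recv_buf, peer, compute_fn, x):
    # Post both transfers without blocking.
    reqs = dist.batch_isend_irecv([
        dist.P2POp(dist.isend, send_buf, peer),
        dist.P2POp(dist.irecv, recv_buf, peer),
    ])
    # Do independent work while the transfers are in flight.
    y = compute_fn(x)
    # Block only once the communicated data is actually needed.
    for req in reqs:
        req.wait()
    return y, recv_buf
```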
Add option to specify a data cache path separate from data directory.

See merge request ADLR/megatron-lm!608
Fix GPTDataset assert.

See merge request ADLR/megatron-lm!624
Fixed rotary_pos_emb's position in layer's forward args.

See merge request ADLR/megatron-lm!625
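The bug class here is positional-argument drift: if rotary_pos_emb sits in the wrong slot of the forward signature, positional call sites silently bind it to another parameter. A tiny illustration with a hypothetical signature, not Megatron-LM's real one:

```python
import torch

class Layer(torch.nn.Module):
    # Hypothetical signature: reordering these parameters breaks any
    # call site that passes rotary_pos_emb positionally.
    def forward(self, hidden_states, attention_mask,
                encoder_output=None, rotary_pos_emb=None):
        return hidden_states  # placeholder body

layer = Layer()
h = torch.zeros(4, 8)
# Binding by keyword keeps the call robust if the signature is reordered.
out = layer(h, None, rotary_pos_emb=None)
```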
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Fix indexing of the output tensor after the grad scaler call

See merge request ADLR/megatron-lm!627
Perform grad sync at correct place in interleaved pipeline parallelism

See merge request ADLR/megatron-lm!628
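With gradient accumulation over microbatches, the standard PyTorch placement is to suppress the gradient all-reduce until the last microbatch. A sketch of that placement, assuming a torch.nn.parallel.DistributedDataParallel-wrapped model (Megatron-LM's own DDP wrapper and interleaved schedule differ in detail):

```python
import contextlib

def train_step(model, microbatches, optimizer):
    optimizer.zero_grad()
    for i, batch in enumerate(microbatches):
        is_last = (i == len(microbatches) - 1)
        # Skip the gradient all-reduce for all but the final microbatch;
        # syncing in the wrong place either wastes bandwidth or steps the
        # optimizer with incompletely reduced gradients.
        ctx = contextlib.nullcontext() if is_last else model.no_sync()
        with ctx:
            loss = model(batch).mean()
            loss.backward()
    optimizer.step()
```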
Support loading checkpoints without the add_position_embedding arg.

See merge request ADLR/megatron-lm!623
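A common pattern for this kind of backward compatibility is to fall back to a default when an old checkpoint predates the new argument. A sketch with a hypothetical helper; the default value True is an assumption, chosen to reproduce the old behavior:

```python
from types import SimpleNamespace

# Stand-in for the args namespace stored in an old checkpoint, which
# predates the add_position_embedding argument.
checkpoint_args = SimpleNamespace(hidden_size=1024)

def get_checkpoint_arg(args, name, default):
    # Fall back to a default instead of raising AttributeError on load.
    return getattr(args, name, default)

add_position_embedding = get_checkpoint_arg(
    checkpoint_args, "add_position_embedding", True)
```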
Add workarounds for non-determinism in Megatron training

See merge request ADLR/megatron-lm!607
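Typical workarounds for non-deterministic PyTorch training look like the following; this is a generic sketch, not necessarily the exact set of changes in this merge:

```python
import os
import random

import numpy as np
import torch

def enable_determinism(seed=1234):
    # Seed every RNG that feeds into training.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuBLAS needs this set (before its first use) for deterministic GEMMs.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Raise an error on any op without a deterministic implementation.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
```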
Update GitLab CI to catch pytest errors

See merge request ADLR/megatron-lm!635
Remove use of deprecated np.float in indexed_dataset.py

See merge request ADLR/megatron-lm!634
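np.float was a deprecated alias for the builtin float (it warns from NumPy 1.20 and is removed in 1.24), so the fix is mechanical. The dtype below is illustrative, not necessarily the one used in indexed_dataset.py:

```python
import numpy as np

lengths = [3, 5, 8]

# Before (DeprecationWarning on NumPy >= 1.20, AttributeError on >= 1.24):
# sizes = np.array(lengths, dtype=np.float)

# After: use the builtin float or an explicit width such as np.float64.
sizes = np.array(lengths, dtype=np.float64)
```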
Retro fix for tensor parallelism.

See merge request ADLR/megatron-lm!632