
Conversation

@RaymondLi0 (Collaborator)

No description provided.

shanmugamr1992 and others added 30 commits October 6, 2022 17:02
different encoder/decoder num-layers support

See merge request ADLR/megatron-lm!453
Validation dataset update 1

See merge request ADLR/megatron-lm!455
Adding proper test cases

See merge request ADLR/megatron-lm!460
Core merge main

See merge request ADLR/megatron-lm!464
Remove noop used to try to force scheduling and check for environment variable instead.

See merge request ADLR/megatron-lm!463
inverse_square_root learning param schedule

See merge request ADLR/megatron-lm!466
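
For context, an inverse-square-root schedule typically ramps the learning rate up linearly during warmup and then decays it proportionally to 1/sqrt(step). A minimal sketch of that shape, with illustrative constants and names (this is not Megatron-LM's actual scheduler, which also handles min-LR clamping and other decay styles):

```python
import math

def inverse_square_root_lr(step: int, base_lr: float = 3e-4, warmup_steps: int = 1000) -> float:
    """Linear warmup, then decay proportional to 1/sqrt(step).

    Illustrative sketch only; the constants are assumptions.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Scale by sqrt(warmup_steps) so the curve is continuous at the handoff.
    return base_lr * math.sqrt(warmup_steps / step)
```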
Disable newline after colon

See merge request ADLR/megatron-lm!469
Sending in prompts with the wrong type hangs the server. This adds a check to make sure the prompt payload is a list

See merge request ADLR/megatron-lm!473
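
The guard described above is conceptually simple: validate the request payload before it reaches the generation loop. A hedged sketch of such a check, assuming a Flask-style endpoint (the route, key names, and response format are illustrative, not the server's actual code):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api", methods=["PUT"])
def generate():
    data = request.get_json(force=True) or {}
    prompts = data.get("prompts")
    # Reject non-list payloads up front; a bare string or dict passed
    # into the generation loop is what hangs the server.
    if not isinstance(prompts, list):
        return jsonify({"message": "prompts must be a list of strings"}), 400
    return jsonify({"text": ["generated text" for _ in prompts]})  # placeholder
```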
Move most of mpu functionality into a new "Megatron core"

See merge request ADLR/megatron-lm!462
Fix merge error.

See merge request ADLR/megatron-lm!478
ViT Backbone Tensor Shape Fix

See merge request ADLR/megatron-lm!479
Support for variable sequence lengths across micro-batches

See merge request ADLR/megatron-lm!472
ksivaman and others added 18 commits May 27, 2023 00:17
Signed-off-by: Kirthi Shankar Sivamani <smkirthishankar@gmail.com>
Check if RoPE support is available in TE

See merge request ADLR/megatron-lm!614
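
A common way to implement this kind of guard is to compare the installed Transformer Engine version against the first release that shipped the feature. A minimal sketch, assuming the PyPI distribution name "transformer-engine" and an illustrative version threshold (check TE's changelog for the real one):

```python
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

def te_supports_rope(min_version: str = "0.10.0") -> bool:
    """Return True if the installed Transformer Engine is new enough for RoPE.

    The "0.10.0" threshold is an assumption for illustration.
    """
    try:
        return Version(version("transformer-engine")) >= Version(min_version)
    except PackageNotFoundError:
        # TE is not installed at all, so there is no RoPE support either way.
        return False
```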
Signed-off-by: Kirthi Shankar Sivamani <smkirthishankar@gmail.com>
Bug fixes for full activation recompute using TransformerEngine

See merge request ADLR/megatron-lm!615
Signed-off-by: Kirthi Shankar Sivamani <smkirthishankar@gmail.com>
Check TE version for rope during recompute

See merge request ADLR/megatron-lm!619
@mayank31398

Hey @RaymondLi0, thanks a lot for this effort.
Any ETA on this?

@RaymondLi0 (Collaborator, Author)

Hopefully early next week :)

RaymondLi0 changed the title from "WIP: merge from Nvidia main" to "merge from Nvidia main" on Jun 19, 2023
@RaymondLi0 (Collaborator, Author)

Some things are broken (I created a branch named before-merge in case someone needs to do one of these):

  • loading previous checkpoints that used the distributed optimizer will not work
  • the checkpoint merging tools will not work on previous checkpoints (may be fixed later with another merge from NVIDIA's repo)

@jlamypoirier (Collaborator) left a comment

Let's just merge and hope for the best 🤷‍♂️

RaymondLi0 merged commit 3e22c9f into multi-query-attention on Jun 19, 2023
RaymondLi0 deleted the NVIDIA-main branch on June 19, 2023 at 18:52