forked from NVIDIA/Megatron-LM
From NVIDIA Megatron-LM for visibility #18
Open: RaymondLi0 wants to merge 4,810 commits into bigcode-project:multi-query-attention from NVIDIA:main.
Commits
Inference functional test: 580M Minitron. See merge request ADLR/megatron-lm!2812. (Co-authored-by: oliver könig <okoenig@nvidia.com>, Mcore Bot <mcore-bot@nvidia.com>)
…ron" This reverts commit f8c8c9c.
Invalidate cached SSM tensors if batch size changes during inference. See merge request ADLR/megatron-lm!3277. (Co-authored-by: oliver könig <okoenig@nvidia.com>, Mcore Bot <mcore-bot@nvidia.com>)
ci: Move unit test logic to file. See merge request ADLR/megatron-lm!3291.
Adapt _write_item call to new signature with 'serialization_format'. See merge request ADLR/megatron-lm!3243.
Add in-process restart. See merge request ADLR/megatron-lm!2711. (Co-authored-by: Russell Hewett <rhewett@nvidia.com>)
This reverts commit d87ba91.
ci: Run on multiple clusters. See merge request ADLR/megatron-lm!3292.
ci: Allow specific TE-ref. See merge request ADLR/megatron-lm!3302.
ci(fix): Write logs to log_dir. See merge request ADLR/megatron-lm!3299.
Address dist checkpointing PyT 24.08 failure. See merge request ADLR/megatron-lm!3253.
ci(hotfix): Downstream pipeline. See merge request ADLR/megatron-lm!3307.
MR feedback: added units for arguments, optional argparse flag to clear GPU... See merge request ADLR/megatron-lm!3308. (Co-authored-by: Szymon Migacz <smigacz@nvidia.com>)
Allow process group as optional argument for mamba class constructor. See merge request ADLR/megatron-lm!2966. (Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>)
Revert `fork` to `spawn` based on stability issues in checkpointing. See merge request ADLR/megatron-lm!3450.
Add kitchen extension with per-layer configurable quantization configuration. See merge request ADLR/megatron-lm!3301. (Co-authored-by: Simon Layton <slayton@nvidia.com>)
Add deprecation warning for legacy inference. See merge request ADLR/megatron-lm!3474.
Change naming of original_max_position_embeddings to avoid conflicts. See merge request ADLR/megatron-lm!3181.
Make cudagraph replay check more descriptive when it fails arg checks. See merge request ADLR/megatron-lm!3472.
M4 Taskforce: Disable T5 and encoder_and_decoder tests in CI for MCore Encoder Refactoring. See merge request ADLR/megatron-lm!3414. (Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>, Mcore Bot <mcore-bot@nvidia.com>)
Quick fix for NeMo: handle alternate key names like 'pre_wd_mult' instead of 'wd_mult'. See merge request ADLR/megatron-lm!3444.
chore: Bump version 0.14.0. See merge request ADLR/megatron-lm!3477.
Added offloading support for MCore layers. See merge request ADLR/megatron-lm!3071. (Co-authored-by: Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster>, Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com>)
Bug fix to reset kv chunks assigned to -1 and avoid shuffling of new tokens. See merge request ADLR/megatron-lm!3437. (Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-vscode-01.cm.cluster>, Mcore Bot <mcore-bot@nvidia.com>, Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>)
chore: Add init to tools. See merge request ADLR/megatron-lm!3483.
Fix unit test test_fp8_param.py blockwise scaling. See merge request ADLR/megatron-lm!3480.
chore: Add init to examples. See merge request ADLR/megatron-lm!3492.
build: Force pin down setuptools. See merge request ADLR/megatron-lm!3493.
Pad input tensors and enable fp8 weights for fp8 inference. See merge request ADLR/megatron-lm!3341.
No description was provided for this pull request.