
Add fp8-fused gemm kernel #5764

Merged · 16 commits into microsoft:master · Jul 29, 2024
Conversation

@sfc-gh-reyazda (Contributor)

This PR adds the new fused kernel for the Dense GeMM using fp8-quantized weight.
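The actual kernel in this PR is written in Triton and needs a GPU; purely to illustrate the math such a kernel fuses (weights stored on an fp8-like grid with per-channel scales, dequantized and multiplied in one pass), here is a minimal NumPy sketch. The function names and the e4m3 simulation are illustrative assumptions, not code from this PR.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3

def quantize_fp8_sim(w):
    """Simulate per-output-channel fp8 (e4m3-like) quantization of a weight
    matrix. Returns weights rounded to a 3-mantissa-bit grid (still stored
    as float here, for illustration only) plus the per-column scales."""
    scale = np.abs(w).max(axis=0, keepdims=True) / FP8_E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero columns
    w_scaled = w / scale
    # e4m3 keeps 3 explicit mantissa bits: round the frexp mantissa
    # (in [0.5, 1)) to steps of 1/16.
    m, e = np.frexp(w_scaled)
    m = np.round(m * 16) / 16
    return np.ldexp(m, e), scale

def fp8_gemm_sim(x, wq, scale):
    """Dequantize-and-matmul: the math a fused fp8 GEMM performs in one kernel
    instead of materializing the dequantized weight in memory."""
    return x @ (wq * scale)
```

The point of fusing is that the full-precision weight never exists in global memory; the sketch above materializes it only because NumPy has no fused path.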

@jeffra (Collaborator) commented Jul 12, 2024

One thing needs to be resolved before merging: this kernel requires triton==2.3.0. This should be checked at runtime and communicated to users somehow.
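One way such a runtime check could look (a hypothetical helper, not the code this PR ships), using only the standard library so it fails with an actionable message instead of a cryptic Triton error:

```python
import importlib.metadata

REQUIRED_TRITON = "2.3.0"

def triton_version_ok(installed, required=REQUIRED_TRITON):
    """Pure comparison, split out so it is testable without triton installed."""
    return installed == required

def check_triton_version(required=REQUIRED_TRITON):
    """Raise a clear error if the pinned triton version is not installed."""
    try:
        installed = importlib.metadata.version("triton")
    except importlib.metadata.PackageNotFoundError:
        raise RuntimeError(
            f"The fp8-fused gemm kernel requires triton=={required}, but "
            f"triton is not installed. Try `pip install triton=={required}`."
        )
    if not triton_version_ok(installed, required):
        raise RuntimeError(
            f"The fp8-fused gemm kernel requires triton=={required}, "
            f"but triton=={installed} is installed."
        )
    return installed
```

Calling `check_triton_version()` once at kernel-import time would surface the pin before any Triton compilation is attempted.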

@HeyangQin (Contributor)

Hi @jeffra. To clarify, does this kernel require exactly triton==2.3 or triton>=2.3?

@jeffra (Collaborator) commented Jul 12, 2024

> Hi @jeffra. To clarify, does this kernel require exactly triton==2.3 or triton>=2.3?

@sfc-gh-reyazda would know better, I am not sure if we've tested with newer triton than 2.3.0. I have not personally tested this at least.

@sfc-gh-reyazda (Contributor, Author)

> Hi @jeffra. To clarify, does this kernel require exactly triton==2.3 or triton>=2.3?

It needs that specific version. Unfortunately, Triton keeps changing and improving, and its APIs change too, so it is hard to track properly. That is another motivation to move to CUTLASS soon and have a more solid implementation that works independently of other libraries. On the other hand, Triton gives us the flexibility to run on various hardware, so it is always a tradeoff. I think we need some more discussion on such dependencies later, in a separate thread.
Best,
Reza

@jeffra jeffra mentioned this pull request Jul 23, 2024
@tjruwase tjruwase added this pull request to the merge queue Jul 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 26, 2024
@tjruwase tjruwase added this pull request to the merge queue Jul 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 26, 2024
@loadams loadams added this pull request to the merge queue Jul 29, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 29, 2024
@loadams loadams enabled auto-merge July 29, 2024 15:55
@loadams loadams added this pull request to the merge queue Jul 29, 2024
@loadams loadams disabled auto-merge July 29, 2024 15:57
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 29, 2024
@loadams loadams added this pull request to the merge queue Jul 29, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 29, 2024
@loadams loadams merged commit 4f95067 into microsoft:master Jul 29, 2024
11 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Aug 14, 2024
This is a refresh of `OptimizedLinear` with the following features to
improve performance and usability:
* More efficient sharing of base weights using `all_gather_into_tensor`
* Flattened sharded weights
* Selective offload of frozen weights to CPU
* `deepspeed.linear.Init`, which allows injecting OptimizedLinear during
model construction (similar to zero.Init)
* Support for loading a state dict directly in OptimizedLinear, which
allows loading HF model weights correctly into sharded params
* Various bug fixes for the LoRA implementation introduced previously
* Several new unit tests
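The flattened-shard scheme in the first two bullets can be sketched without `torch.distributed`: each rank holds one equal-sized 1-D slice of the flattened base weight, and the full weight is rebuilt by concatenating the slices, which is what a single `all_gather_into_tensor` collective does on device. A NumPy sketch of just the bookkeeping (function names are hypothetical, not DeepSpeed's API):

```python
import numpy as np

def shard_flat(weight, world_size):
    """Flatten a weight tensor and split it into equal 1-D shards,
    zero-padding so every rank holds the same number of elements."""
    flat = weight.reshape(-1)
    pad = (-flat.size) % world_size
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    return np.split(flat, world_size), weight.shape, pad

def gather_full(shards, shape, pad):
    """Reassemble the full weight from the per-rank shards; in practice this
    concatenation is one all_gather_into_tensor into a preallocated buffer."""
    flat = np.concatenate(shards)
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)
```

Flattening first means the collective moves one contiguous buffer per rank, regardless of the original 2-D layout.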
 
Builds on top of @RezaYazdaniAminabadi's previous FP8 updates (#5764) to
support dense-model fp8 quantization.

Example usage of this to fine-tune llama-3.1-405B on a single node:
https://github.com/Snowflake-Labs/snowflake-arctic/tree/main/training/llama3.1
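The zero.Init-style injection mentioned above can be modeled as a context manager that temporarily swaps which class a layer factory constructs, so modules built inside the context come out optimized. This is a hypothetical sketch of the pattern only, not DeepSpeed's implementation; all names below are stand-ins:

```python
import contextlib

class Linear:
    """Stand-in for a framework Linear layer (hypothetical)."""
    def __init__(self, in_features, out_features):
        self.in_features, self.out_features = in_features, out_features

class OptimizedLinear(Linear):
    """Stand-in for the sharded/quantized replacement (hypothetical)."""

_registry = {"linear_cls": Linear}

@contextlib.contextmanager
def init_context():
    """While active, factories consulting the registry build OptimizedLinear
    instead of Linear; the swap is undone on exit even if an error occurs."""
    prev = _registry["linear_cls"]
    _registry["linear_cls"] = OptimizedLinear
    try:
        yield
    finally:
        _registry["linear_cls"] = prev

def make_linear(in_features, out_features):
    """Layer factory used during model construction."""
    return _registry["linear_cls"](in_features, out_features)
```

The appeal of the pattern is that model-definition code stays unchanged; only the construction site is wrapped in `with init_context():`.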

---------

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
Co-authored-by: Reza Yazdani <152926435+sfc-gh-reyazda@users.noreply.github.com>