Add test util for basic tensor subclass functionalities #839

Merged
jerryzh168 merged 3 commits into pytorch:main from the test_utils branch
Sep 9, 2024

Conversation

jerryzh168
Contributor


Summary:
This is a small starting point for testing low-precision tensor subclass functionality.
We can add more test cases for training, tensor parallel, and FSDP in the future.

Right now it tests:
- tensor flatten/unflatten
- constructing a low-precision tensor with different devices/dtypes
- moving a tensor subclass from device1 to device2
- transpose
- linear (weight-only quantization with the low-precision tensor)

It can be extended to new tensor subclasses or test cases by overriding the class variables,
e.g.
```
class MyTensorSubclassTest(TorchAOBasicTestCase):
    COMMON_DEVICES = ["cpu", "cuda"]
    COMMON_DTYPES = [torch.float32, torch.float16, torch.bfloat16]

    TENSOR_SUBCLASS = LUTQuantizedTensor
    FACTORY_FN = to_lut_quantized_intx
    kwargs = {
        "target_dtype": torch.uint8,
    }
    # minimum sqnr for linear operation when the weight is quantized to low precision
    # with the above setting
    LINEAR_MIN_SQNR = 40
```
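
The override pattern and the `LINEAR_MIN_SQNR` threshold can be illustrated with a torch-free toy. This is only a sketch under assumptions: `ToyQuantized`, `to_toy_quantized`, `ToyBasicTestCase`, and the `num_levels` parameter are hypothetical stand-ins rather than torchao APIs, and SQNR is assumed here to mean 20 * log10(||ref|| / ||ref - out||) in dB.

```python
import math
import unittest


def sqnr_db(reference, actual):
    """Signal-to-quantization-noise ratio in dB (assumed definition)."""
    signal = math.sqrt(sum(r * r for r in reference))
    noise = math.sqrt(sum((r - a) ** 2 for r, a in zip(reference, actual)))
    return math.inf if noise == 0 else 20 * math.log10(signal / noise)


class ToyQuantized:
    """Hypothetical stand-in for a low precision tensor: integer codes plus a scale."""

    def __init__(self, codes, scale):
        self.codes = codes
        self.scale = scale

    def dequantize(self):
        return [c * self.scale for c in self.codes]


def to_toy_quantized(values, num_levels=255):
    """Hypothetical factory: symmetric rounding onto integer steps."""
    half = num_levels // 2
    scale = max(abs(v) for v in values) / half
    return ToyQuantized([round(v / scale) for v in values], scale)


class ToyBasicTestCase(unittest.TestCase):
    # Subclasses override these class variables, mirroring the pattern above.
    TENSOR_SUBCLASS = ToyQuantized
    FACTORY_FN = staticmethod(to_toy_quantized)
    kwargs = {"num_levels": 255}
    LINEAR_MIN_SQNR = 40

    def test_linear(self):
        # Positive inputs and weights in (0.5, 1] keep the worst-case SQNR
        # above the thresholds used in this toy.
        weight = [0.5 + j / 128 for j in range(1, 65)]
        inputs = [[1.0 + ((i * j) % 7) / 7 for j in range(64)] for i in range(1, 9)]

        lp_weight = self.FACTORY_FN(weight, **self.kwargs)
        self.assertIsInstance(lp_weight, self.TENSOR_SUBCLASS)

        # Weight-only quantization: compare a "linear" with the original
        # weight against the same op with the dequantized weight.
        dq_weight = lp_weight.dequantize()
        ref = [sum(x * w for x, w in zip(row, weight)) for row in inputs]
        out = [sum(x * w for x, w in zip(row, dq_weight)) for row in inputs]
        self.assertGreaterEqual(sqnr_db(ref, out), self.LINEAR_MIN_SQNR)


class CoarseToyTest(ToyBasicTestCase):
    # A coarser setting carries its own, more conservative threshold.
    kwargs = {"num_levels": 15}
    LINEAR_MIN_SQNR = 15
```

Saved to a file, both test cases can be run with `python -m unittest <file>`. The subclass reuses `test_linear` unchanged and only swaps the class variables, which is the extension mechanism described above.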

Test Plan:
python test/utils.py


pytorch-bot bot commented Sep 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/839

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 75d9c90 with merge base e05635e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Sep 7, 2024
tensor_data_dict = {name: getattr(lp_tensor, name) for name in tensor_data_name_dict}
outer_size = lp_tensor.size()
outer_stride = lp_tensor.stride()
reconstructed = self.TENSOR_SUBCLASS.__tensor_unflatten__(tensor_data_dict, tensor_attributes, outer_size, outer_stride)
Contributor

One kind of annoying thing that makes __tensor_flatten__ and friends difficult to test in isolation is that we want their implementations to be side-effect-free / idempotent. This is difficult to test by calling the function directly, although you will usually find out pretty quickly if your implementation is wrong, because uses of torch.compile with your subclass will break.

This test suite seems like a good start (if the goal is "a test suite for all AO-specific quantization subclasses that conform to a specific API"). Do you want / plan to add more compile-related tests?
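
A direct-call check can still catch the most obvious violations. The following is a minimal sketch under assumptions: `ToyWrapper` and `check_flatten_roundtrip` are hypothetical, torch-free stand-ins that only mirror the shape of the `__tensor_flatten__` / `__tensor_unflatten__` protocol (attribute names plus context out, a dict of inner values plus context in). It cannot show how the hooks behave under torch.compile.

```python
import copy


class ToyWrapper:
    """Hypothetical stand-in for a wrapper tensor subclass (no torch needed)."""

    def __init__(self, data, scale, target_dtype):
        self.data = data                  # plays the role of an inner tensor
        self.scale = scale                # plays the role of an inner tensor
        self.target_dtype = target_dtype  # non-tensor attribute

    def __tensor_flatten__(self):
        # Names of the inner "tensor" attributes, plus non-tensor context.
        return ["data", "scale"], [self.target_dtype]

    @classmethod
    def __tensor_unflatten__(cls, inner, ctx, outer_size, outer_stride):
        (target_dtype,) = ctx
        return cls(inner["data"], inner["scale"], target_dtype)

    def size(self):
        return (len(self.data),)

    def stride(self):
        return (1,)


def check_flatten_roundtrip(obj):
    """Flatten twice, rebuild once, and verify that nothing was mutated."""
    before = copy.deepcopy(vars(obj))

    first = obj.__tensor_flatten__()
    second = obj.__tensor_flatten__()
    assert first == second, "flatten is not repeatable"
    assert vars(obj) == before, "flatten mutated the object"

    names, ctx = first
    inner = {name: getattr(obj, name) for name in names}
    rebuilt = type(obj).__tensor_unflatten__(inner, ctx, obj.size(), obj.stride())

    assert type(rebuilt) is type(obj)
    assert vars(rebuilt) == vars(obj), "round trip lost information"
    assert vars(obj) == before, "unflatten mutated the source object"
    return rebuilt
```

Comparing a deep copy of the attributes before and after each call is what turns "side-effect-free" into something assertable. It only covers state stored on the object itself, so hidden global state would still go unnoticed.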

Contributor Author

I see. Yeah, I can add more compile-related tests, although I'm not exactly sure which scenarios I should be testing. I can start with simple cases, like compiling the quantized weights and running them through a linear op, but I'd like to hear your thoughts on rough categories of compile-related tests we could add.

Contributor

A few that come to mind are:

(1) simple test where inputs + outputs of the function f are your subclass and you do some basic compute (return inp + inp)
(2) subclass constructor in the graph (e.g. inputs to f are plain tensors, output is a subclass)

Contributor Author

sg, I can add these in a separate PR

Contributor

@bdhirsh bdhirsh left a comment

seems fine!

@jerryzh168 jerryzh168 merged commit 00d18ec into pytorch:main Sep 9, 2024
17 checks passed
@jerryzh168 jerryzh168 deleted the test_utils branch September 9, 2024 23:27
jainapurva pushed a commit that referenced this pull request Sep 10, 2024
* Add test util for basic tensor subclass functionalities

* minor fix

* don't use inductor TestCase
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Sep 19, 2024
Summary:
This is a follow up PR addressing pytorch#839 (comment)
We can add more compiler related tests in the future.

Next
* refactor a bit to use quantize_ API directly
* use the test suite in existing API tests

Test Plan:
python torchao/testing/utils.py

jerryzh168 added a commit that referenced this pull request Sep 26, 2024
* Add compile tests to test suite

* rename

* add result check
weifengpy pushed a commit to weifengpy/ao that referenced this pull request Sep 26, 2024
weifengpy added a commit that referenced this pull request Oct 1, 2024
…th torch.compile (#904)

* [float8] improve eager numerics for dynamic scales

* leave torch.linalg.vector_norm for another PR

* cuda

* remove _data and investigate

* remove _data comment

* upcast to float32 is enough

* explain why float32

* _data parity

* handle sm8.9

* fix transformer unit test

* print if error

* Add tutorial for trainable tensor subclass (#908)

Summary: The new tutorial provides an example of how to implement
a trainable tensor subclass that wraps quantized data. This extends
the existing `MyDTypeTensor` with a few necessary steps to ensure
proper gradient updates, namely:

1. Define a differentiable constructor
2. Define backward pass for ops of interest (e.g. torch.nn.functional.linear)
3. Handle special ops used by the optimizer (e.g. aten.add, aten.add_)

Test Plan:
python tutorials/developer_api_guide/my_trainable_tensor_subclass.py

* Introducing 1-bit quantization for Llama in torchchat (#910)

Differential Revision: D63052325

Pull Request resolved: #911

* Rename Floating point to fp8 (#909)

* [float8] fix typo in bitwise_identical unit test (#918)

* Adding example for quantized tensor + tensor parallelism (#785)

* [WIP] Adding example for quantized tensor + tensor parallelism

Summary:
This PR adds an example of how quantized tensor subclass can work with DTensor: https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md

The end goal is to rewrite https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/llama2.py with a normal llama2 implementation and showcase that, with DTensor + AffineQuantizedTensor + torch.compile, we can get on-par performance with the custom tensor parallel implementation

Test Plan:
torchrun --standalone --nnodes=1 --nproc-per-node=4 tutorials/developer_api_guide/tensor_parallel.py

* tensor parallel file

* Use DTensor.from instead of distribute_tensor

* implementing aten.slice.Tensor (WIP)

* working

* some shape fix and use more quant primitive ops

* Add rowwise test

* make rowwise sharding work

* compile still not working yet

* fake tensor didn't pick up shape changes from transpose

* backend='eager'

* change transpose to non-inplace op

* add error message

* works now with torch nightly

* remove print

* ruff

* Clean up

* Fix device id

---------

Co-authored-by: Ke Wen <kw2501@meta.com>

* rename cuda mode -> gpu mode (#925)

* Add workaround to recover the perf for quantized vit in torch.compile (#926)

Add temporary workaround to recover the perf for quantized vit under torch.compile

Summary:
Recently we found a perf drop in quantized vit due to #898 (comment)
This PR adds a temporary fix until we figure out the longer-term fix.

I think ideally we should figure out why the tensor subclass check failed in torch.compile (https://github.com/pytorch/pytorch/blob/e4d294221b140fdbb49a64f297bc60c9fcc2f80e/torch/nn/modules/activation.py#L1286) and fix that

Test Plan:
python tutorials/quantize_vit/run_vit_b_quant.py

* clean up device checks in float8 unit test files (#923)

Summary:

While working on rowwise scaling I noticed that some of the CUDA
device capability checks we had in the test files did not make sense,
cleaning this up.

Test Plan:

tests pass on my H100

CI, it should skip fewer tests now since CI only has CUDA capability 8, 9

* [low-bit optim] Change 8-bit and FP8 optim block size from 2048 to 256 to match new bnb v0.44 (#927)

* Float8 autoquant weight only (#866)

* Fix failing FP6 benchmark (#931)

* Remove two if statements in fp8 padding (#935)

Reviewed By: vkuzo

Differential Revision: D63051205

Pull Request resolved: #935
Approved by: https://github.com/vkuzo

* [Distributed] Improve sharding example (#937)

* [Distributed] Improve sharding example

* Add comment

* Add composable QAT quantizer (#938)

Summary: This is a utility for users who wish to apply multiple
QAT quantizers to their models. In the near future, we expect
to add an embedding QAT quantizer that composes with the
existing linear QAT quantizers.

Test Plan:
python test/quantization/test_qat.py -k test_composable_qat_quantizer

* resolve conflict with latest main

Differential Revision: D63048850

Pull Request resolved: #912

* Add torchchat quantizer

Differential Revision: D62394341

Pull Request resolved: #897

* Add compile tests to test suite (#906)

* Add compile tests to test suite

* rename

* add result check

* Fix up CMakeLists and reorganize some code locations

Differential Revision: D62711903

Pull Request resolved: #948

* [float8] all-reduce amax on dp mesh instead of global pg (#933)

* [float8] all-reduce amax on dp mesh instead of global pg

* liner

* improve comments

* move hp tensor inside if

* linter

* linter

* linter

* linter

* linter

* int8 dynamic quant + bsr support (#821)

This PR adds int8 dynamic quant + BSR support.

Changes:
* Use i8i8 -> bf16 matmul to maintain accuracy
* Added a block sparse layout type to AffineQuantizedTensor + check/impl.  
* Cleaned up benchmark.py script and add a single line `benchmark.sh` file for acceleration numbers
* Updated eval.py and added a single line `evaluate.sh` file for accuracy numbers
* Lots of lint formatting and README updates
* torch.compile now working and is correct

* fixing some issues with our support for 70/405B models (#941)

Summary: download and convert scripts needed to be updated alongside
model.py config files

Test Plan: python generate.py --checkpoint_path ../../../checkpoints/meta-llama/Meta-Llama-3.1-70B/model.pth

* Update INT8 mixed-precision training test to be less flaky (#950)

* Add executorch parallel

Differential Revision: D62711909

Pull Request resolved: #953

* test CI

* better comment on why upcasting

* control seed

* move unit test to test_compile

* fix typo

* float64 upcasting after allreduce

* use LinearMMConfig

---------

Co-authored-by: andrewor14 <andrewor14@gmail.com>
Co-authored-by: Vaishnavi Gupta <vaishnavi10367@gmail.com>
Co-authored-by: Apurva Jain <apurvajain.kota@gmail.com>
Co-authored-by: Jerry Zhang <jerryzh168@gmail.com>
Co-authored-by: Ke Wen <kw2501@meta.com>
Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
Co-authored-by: Vasiliy Kuznetsov <vkuzo@users.noreply.github.com>
Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>
Co-authored-by: Tobias van der Werff <33268192+tobiasvanderwerff@users.noreply.github.com>
Co-authored-by: Shuqi Yang <shuqiyang@meta.com>
Co-authored-by: Scott Roy <161522778+metascroy@users.noreply.github.com>
Co-authored-by: Jesse Cai <jessecai@meta.com>
Co-authored-by: HDCharles <39544797+HDCharles@users.noreply.github.com>
melvinebenezer pushed a commit to melvinebenezer/ao that referenced this pull request Oct 3, 2024
melvinebenezer pushed a commit to melvinebenezer/ao that referenced this pull request Oct 7, 2024
…th torch.compile (pytorch#904)
