[Frontend] [Core] feat: Add model loading using tensorizer #3476

Merged on Apr 14, 2024 (102 commits)
Commits
dfe2f2f
feat: Support loading model tensors using `tensorizer`
sangstar Feb 1, 2024
097f297
fix: Remove unnecessary files
sangstar Feb 2, 2024
24e8657
fix(vllm-tensorizer): Allow providing S3 credentials
sangstar Feb 6, 2024
6192ff3
fix: Fix passing S3 auth vars through stream
sangstar Feb 7, 2024
fbc847b
fix: Disallowing `plaid_mode = False` and updating `tensorizer` version
sangstar Feb 13, 2024
f4d57d8
refactor: Retire use of `download_dir` as `TensorizerArgs` param
sangstar Feb 13, 2024
cf42149
fix: Remove `store_true` action for `--tensorizer-uri`
sangstar Feb 13, 2024
c1839f4
refactor: No 2x copying for `tensorizer` (WIP)
sangstar Feb 28, 2024
b28b26e
chore: Omit commandeering weight loaders for merging layers (WIP)
sangstar Feb 29, 2024
fad72a4
feat: Re-add deserializing vLLM models
sangstar Mar 1, 2024
8d421b4
chore: Harmonize CPU and GPU deserializing
sangstar Mar 1, 2024
8225c32
perf: Add `force_http=True` for faster loading speeds
sangstar Mar 5, 2024
f7c9cc7
chore: Reformat code with `format.sh`, cleanup debugging code
sangstar Mar 8, 2024
44b05ba
chore: Fix formatting, some misc. changes
sangstar Mar 11, 2024
17977b0
fix: Correct logging for loading tensorizer with cpu
sangstar Mar 11, 2024
68f2a51
chore: Implement changes from feedback
sangstar Mar 11, 2024
0c72c2c
fix: Correctly instantiate vLLM-formatted models
sangstar Mar 12, 2024
af10594
chore: Reformat and delete deprecated comment from `.ipynb`
sangstar Mar 12, 2024
550983a
perf: Allow passing of deserializer args from `TensorizerArgs`
sangstar Mar 12, 2024
6273266
style: Reformat with new formatting changes
sangstar Mar 12, 2024
f6a695b
Run yapf and ruff
sangstar Mar 12, 2024
f30f4e0
fix: Fix incorrect `TensorizerArgs` import in `config.py`
sangstar Mar 12, 2024
c539880
perf: Multiple misc. improvements from code review
sangstar Mar 14, 2024
1632381
pref: More misc. fixes to complete initial code review
sangstar Mar 15, 2024
4085cb5
fix: Remove `print(tensorizer_args)`
sangstar Mar 15, 2024
81a752a
Run yapf and ruff
sangstar Mar 15, 2024
aa8d8b4
fix: Add specific category for warnings with `PerformanceWarning`
sangstar Mar 15, 2024
d8e71df
chore: Multiple fixes from final code review
sangstar Mar 18, 2024
5132dd7
fix: Add `s3_endpoint` as attr for `TensorizerArgs`
sangstar Mar 18, 2024
7dd43f5
chore: Remove `filter_func` from CLI args, some doc fixes
sangstar Mar 18, 2024
ad68ff5
chore: Allow env var or CLI arg specification for S3 credentials
sangstar Mar 18, 2024
f965730
fix: Disallow using `force_http`
sangstar Mar 18, 2024
2605a33
chore: Remove unnecessary print statement in example script
sangstar Mar 18, 2024
71c2cb0
Run yapf and ruff
sangstar Mar 18, 2024
35b29e8
Run yapf and ruff
sangstar Mar 18, 2024
117feec
Run yapf and ruff
sangstar Mar 18, 2024
6192e9d
docs: Update `tensorizer` as a `--load-format` in `engine_args.rst`
sangstar Mar 18, 2024
407b32e
fix: Restore `tensorizer_args` as instance attr to `EngineArgs`
sangstar Mar 18, 2024
88e209d
Run yapf and ruff
sangstar Mar 18, 2024
6e23dcd
chore: Move testing out of own test folder
sangstar Mar 19, 2024
05c0bbe
fix: Add `tensorizer >= 2.8.1` to `requirements-rocm.txt` for CI
sangstar Mar 20, 2024
af11a53
fix: Add version of `tensorizer` that will pass testing suite
sangstar Mar 21, 2024
8ece4f8
chore: Add notice that `requirements-dev` dep can be removed `>2.8.1`
sangstar Mar 21, 2024
d4a46a5
fix: Resolve double `HfFileSystem` import
sangstar Mar 25, 2024
12b1f12
style: Run `isort`
sangstar Mar 25, 2024
445ab28
Run yapf and ruff
sangstar Apr 1, 2024
6c286ed
fix: Add `tensorizer` to mock imports
sangstar Apr 2, 2024
37348f9
perf: Add newest `tensorizer` version that will not init CUDA
sangstar Apr 3, 2024
82da7a5
fix: Adjust `tensorizer` version for `requirements-dev.txt`
sangstar Apr 3, 2024
310dd68
chore: Rebase and fix carrying over changes to `arg_utils` typing
sangstar Apr 3, 2024
cf56513
fix: Add `tensorizer` to `requirements-cpu.txt`
sangstar Apr 3, 2024
9c8db87
perf: Add concurrent reading to `TensorDeserializer`
sangstar Apr 3, 2024
8ca0cb1
docs: Add `num_readers` docstring
sangstar Apr 3, 2024
21bca06
chore: Replace `PerformanceWarning` after rebase
sangstar Apr 4, 2024
0c82446
Run yapf and ruff
sangstar Apr 4, 2024
06cd26d
fix: Fix model output on deserialization and add e2e output test
sangstar Apr 10, 2024
f19ee64
fix: Properly ensure test outputs are deterministic, add HF model test
sangstar Apr 10, 2024
f1f2e16
fix: Make vLLM tensorizing specification less hacky
sangstar Apr 10, 2024
71a9f79
docs: Add tensorizer link in `engine_args.rst`, docstring to example
sangstar Apr 10, 2024
9e5456a
chore: Resolve comments
sangstar Apr 10, 2024
74a8642
fix: Affirm mandatory `vllm_tensorized` argument change
sangstar Apr 10, 2024
f82b25a
perf: Allow preliminary support deserializing with LoRA adapters
sangstar Apr 10, 2024
3ec85e0
fix: Fix requirements.txt passing import tensorizer only if installed
sangstar Apr 10, 2024
dfb7a11
fix: Properly ensure import fail if tensorizer not used nor installed
sangstar Apr 10, 2024
81196ed
perf: Move test location and add testing for LoRA
sangstar Apr 10, 2024
5ecf4ee
perf: Add some testing changes, introduce `TensorizerConfig`
sangstar Apr 11, 2024
b267cbd
chore: Add `__init__.py` for `tests/tensorizer`
sangstar Apr 11, 2024
6bff0c7
tests: Fix `test_tensorizer.py` to account for new changes
sangstar Apr 11, 2024
e0b7184
tests: Remove `test_tensorizer_api_server.py`
sangstar Apr 11, 2024
3ec105d
Run yapf and ruff; fix tests
sangstar Apr 11, 2024
9d568fc
fix: Revert change to `examples/multilora_inference.py`
sangstar Apr 11, 2024
55d2a41
Merge remote-tracking branch 'upstream/main' into sangstar/integrate-…
sangstar Apr 11, 2024
65bc7bb
Merge remote-tracking branch 'upstream/main' into sangstar/integrate-…
sangstar Apr 11, 2024
b1b5653
perf: Update code to reflect change in #3977
sangstar Apr 11, 2024
5f27722
chore: Remove accidental syntax error
sangstar Apr 11, 2024
8240af9
docs: Elaborate on S3 credentialing
sangstar Apr 11, 2024
7f5eada
fix: Properly passing `tensorizer_config` to hf weight loader
sangstar Apr 12, 2024
e0d9cc7
fix: Fix, test tensorizer uri passing without tensorizer load format
sangstar Apr 12, 2024
de54538
docs: Note example script in docs for more information
sangstar Apr 12, 2024
1feab4e
chore: Run yapf and ruff, as well as doc edits
sangstar Apr 12, 2024
a9b0241
fix: Fix `initialize_model_parallel` import
sangstar Apr 12, 2024
a297a62
tests: Add test for `examples/tensorize_vllm_model.py`
sangstar Apr 12, 2024
2d07568
tests: Fix lora test
sangstar Apr 12, 2024
1bddfe6
Run yapf and ruff
sangstar Apr 12, 2024
852f0ad
fix: Move `tensorize_loader` imports to pass CPU test
sangstar Apr 12, 2024
aef7442
refactor: Pass `TensorizerArgs` direct to `EngineArgs.add_cli_args`
sangstar Apr 12, 2024
ff0a528
tests: Add api_server test using tensorizer
sangstar Apr 12, 2024
4551b84
fix: Add `tensorizer_config` to `RayGPUExecutor`
sangstar Apr 12, 2024
d51b0bc
tests: Formatting and add test to ensure `tensorizer` load format
sangstar Apr 12, 2024
64178e4
style: Run yapf on `examples/tensorize_vllm_model.py`
sangstar Apr 12, 2024
2f4dcb3
style: Run isort on `examples/tensorize_vllm_model.py`
sangstar Apr 12, 2024
3df1945
style: Fix yapf and isort conflict
sangstar Apr 12, 2024
eb925f0
fix: Remove `tensorizer_args` from `ModelConfig`
sangstar Apr 12, 2024
ba6927d
fix: Add error for device scattering and initial handling for quant
sangstar Apr 13, 2024
bd461cc
perf: Multiple changes in response to comments
sangstar Apr 13, 2024
ca2a3fb
perf: Final changes to resolve comments
sangstar Apr 13, 2024
428f53d
fix: Skip tests if cURL not installed, add example script for testing
sangstar Apr 13, 2024
88f1a67
Run yapf and ruff
sangstar Apr 13, 2024
d2491ac
tests: Install cURL for tensorizer tests for testing suite
sangstar Apr 13, 2024
d77215f
tests: Install libsodium23 for CI tensorizer tests
sangstar Apr 13, 2024
9de338c
fix: Fix testing import path
sangstar Apr 13, 2024
95251d7
Run yapf and ruff
sangstar Apr 13, 2024
chore: Reformat and delete deprecated comment from .ipynb
sangstar committed Apr 4, 2024
commit af10594e10dd159ca023481b2f992b204c40e308
5 changes: 3 additions & 2 deletions examples/tensorize_vllm_model.py
@@ -88,13 +88,14 @@ def serialize():
         s3_access_key_id=s3_access_key_id,
         s3_secret_access_key=s3_secret_access_key)
     serializer = TensorSerializer(stream)
+
     print(
         f"Writing serialized tensors for model {MODEL_REF} to {S3_URI}. "
         "Type given as {next(model.parameters()).dtype}")
+
     serializer.write_module(model)
     serializer.close()
-    print("Serialization complete. It is recommended you restart the kernel "
-          "if deserializing.")
+    print("Serialization complete.")


 def deserialize():
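
Note on the `print(...)` context lines in this hunk: as shown here, the second string literal has no `f` prefix. Python concatenates adjacent literals, but only the literals marked `f` are interpolated, so the dtype expression would be printed as literal text. A minimal sketch of that behavior follows. `model_ref`, `s3_uri` and `dtype` are hypothetical stand-ins, not the script's real objects.

```python
# Hypothetical stand-ins for the script's MODEL_REF, S3_URI and model dtype.
model_ref = "example/model"
s3_uri = "s3://example-bucket/model.tensors"
dtype = "torch.float16"

# Only the first literal is an f-string; the second is emitted verbatim.
as_written = (f"Writing serialized tensors for model {model_ref} to {s3_uri}. "
              "Type given as {dtype}")

# Prefixing each literal with `f` makes both parts interpolate.
with_prefix = (f"Writing serialized tensors for model {model_ref} to {s3_uri}. "
               f"Type given as {dtype}")

print(as_written)
print(with_prefix)
```

If the dtype is meant to appear in the log line, adding the `f` prefix to the second literal would be one way to get it; this commit leaves that line unchanged.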