
[TPU][Quantization] TPU W8A8 #11785
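For context on the title: W8A8 means both weights and activations are quantized to 8-bit integers, with matmuls accumulated in int32 and rescaled back to the original dtype (several commits below also touch AZP, the asymmetric zero-point adjustment needed when activation quantization is asymmetric). Below is a minimal per-tensor symmetric sketch of the general scheme; the function and argument names are illustrative, and this is not the kernel the PR adds:

import torch

def w8a8_linear(x: torch.Tensor, w_q: torch.Tensor, w_scale: float) -> torch.Tensor:
    # Dynamically quantize activations to int8 (symmetric, per-tensor).
    x_scale = max(x.abs().amax().item() / 127.0, 1e-8)
    x_q = torch.clamp((x / x_scale).round(), -128, 127).to(torch.int8)
    # int8 x int8 matmul, accumulated in int32 to avoid overflow.
    acc = x_q.to(torch.int32) @ w_q.to(torch.int32).t()
    # Rescale the int32 accumulator back to floating point.
    return acc.to(x.dtype) * (x_scale * w_scale)

For example, w8a8_linear(torch.randn(4, 64), torch.randint(-128, 128, (32, 64), dtype=torch.int8), 0.01) returns a (4, 32) float tensor.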

Merged: 73 commits, Jan 8, 2025

Changes from 1 commit (of 73)

Commits
3b0c8a6
w8a8 working
robertgshaw2-redhat Oct 11, 2024
36fc1db
format
robertgshaw2-redhat Oct 11, 2024
d83c04c
added all kernels
robertgshaw2-redhat Oct 11, 2024
af9d0f4
format
robertgshaw2-redhat Oct 11, 2024
0f9fd21
working on cuda
robertgshaw2-redhat Oct 12, 2024
7b3203f
added mixed precision directory
robertgshaw2-redhat Oct 12, 2024
bf50fa4
formatting
robertgshaw2-redhat Oct 12, 2024
226ef52
cache current state - w8a16 running oom
robertgshaw2-redhat Oct 12, 2024
bb7c741
[TPU] Ensure torch._sync(param) is called after param.data.copy_()
WoosukKwon Oct 16, 2024
cf842bd
yapf
WoosukKwon Oct 17, 2024
67039bc
[TPU] Correctly profile peak memory usage
WoosukKwon Oct 17, 2024
0695f77
Upgrade PyTorch XLA
WoosukKwon Oct 17, 2024
11cf82f
Merge branch 'main' into tpu-peak-mem
WoosukKwon Oct 17, 2024
e016e38
stash
robertgshaw2-redhat Oct 20, 2024
717b859
Merge branch 'main' into compressed-tensors-tpu
robertgshaw2-redhat Oct 20, 2024
c848735
proper merge
robertgshaw2-redhat Oct 20, 2024
1539915
add mixed precision
robertgshaw2-redhat Oct 20, 2024
f00412a
format
robertgshaw2-redhat Oct 20, 2024
b0a6b70
stash
robertgshaw2-redhat Oct 20, 2024
e812d7e
Merge branch 'tpu-peak-mem' into compressed-tensors-tpu
robertgshaw2-redhat Oct 20, 2024
764dda1
stash
robertgshaw2-redhat Oct 20, 2024
87b2ae6
remove name
robertgshaw2-redhat Oct 20, 2024
e813ff8
revert woosuk change
robertgshaw2-redhat Oct 20, 2024
8cfaa1b
format
robertgshaw2-redhat Oct 20, 2024
bbc9741
update
robertgshaw2-redhat Oct 21, 2024
eb3f39e
fix nit
robertgshaw2-redhat Oct 21, 2024
bb2fbe1
update
robertgshaw2-redhat Oct 21, 2024
14ccb90
fix spurious
robertgshaw2-redhat Oct 21, 2024
4092be2
stash branch for brittany
robertgshaw2-redhat Oct 23, 2024
1aaa628
Merge branch 'main' into tpu-w8a8
robertgshaw2-redhat Jan 6, 2025
48aa54b
revert
robertgshaw2-redhat Jan 7, 2025
4efe915
fix
robertgshaw2-redhat Jan 7, 2025
e98b79c
updated
robertgshaw2-redhat Jan 7, 2025
5a89668
reduce cruft
robertgshaw2-redhat Jan 7, 2025
57cbf5c
reduce cruft
robertgshaw2-redhat Jan 7, 2025
3451c4d
updated
robertgshaw2-redhat Jan 7, 2025
0c2e62a
update comment
robertgshaw2-redhat Jan 7, 2025
172c9ca
revert spurious change
robertgshaw2-redhat Jan 7, 2025
938ca81
remove cruft
robertgshaw2-redhat Jan 7, 2025
9e18911
cruft reduction
robertgshaw2-redhat Jan 7, 2025
5f58ec7
update docs
robertgshaw2-redhat Jan 7, 2025
af9f298
added integration test
robertgshaw2-redhat Jan 7, 2025
6fe2f62
updated
robertgshaw2-redhat Jan 7, 2025
f2c0beb
Add bias back
robertgshaw2-redhat Jan 7, 2025
8b29718
add bias support
robertgshaw2-redhat Jan 7, 2025
1e2a373
updated
robertgshaw2-redhat Jan 7, 2025
2a359ef
stash
robertgshaw2-redhat Jan 7, 2025
f7e8975
Merge branch 'main' into remove-async-stream
robertgshaw2-redhat Jan 7, 2025
0d4c3fd
fix
robertgshaw2-redhat Jan 7, 2025
57340d2
update
robertgshaw2-redhat Jan 7, 2025
38291d5
trigger test in CI
robertgshaw2-redhat Jan 7, 2025
ead1e94
fix AZP
robertgshaw2-redhat Jan 7, 2025
cea5e54
fixed!
robertgshaw2-redhat Jan 7, 2025
940ddde
Merge branch 'tpu-w8a8' of https://github.com/neuralmagic/vllm into t…
robertgshaw2-redhat Jan 7, 2025
84a5b29
fix azp adju
robertgshaw2-redhat Jan 7, 2025
a1d7b4a
make docker command look better on gh
robertgshaw2-redhat Jan 7, 2025
2b4ecfd
remove torch warnings
robertgshaw2-redhat Jan 7, 2025
186c108
stash
robertgshaw2-redhat Jan 7, 2025
7e8598a
Merge branch 'tpu-w8a8' of https://github.com/neuralmagic/vllm into t…
robertgshaw2-redhat Jan 7, 2025
de773cd
fix AZP
robertgshaw2-redhat Jan 7, 2025
3a53d7d
merged
robertgshaw2-redhat Jan 7, 2025
0be5f69
added
robertgshaw2-redhat Jan 7, 2025
cb69ba7
fix formatting
robertgshaw2-redhat Jan 7, 2025
3896f6c
remove comment
robertgshaw2-redhat Jan 7, 2025
33e1e13
formatted
robertgshaw2-redhat Jan 7, 2025
dde72d6
add llama to ci
robertgshaw2-redhat Jan 7, 2025
d7a9c93
Merge branch 'main' into tpu-w8a8
robertgshaw2-redhat Jan 7, 2025
db9f795
Update supported_hardware.md
robertgshaw2-redhat Jan 7, 2025
09ad869
Update supported_hardware.md
robertgshaw2-redhat Jan 7, 2025
b74c88a
fixed docs build
robertgshaw2-redhat Jan 8, 2025
da4369e
Merge branch 'tpu-w8a8' of https://github.com/neuralmagic/vllm into t…
robertgshaw2-redhat Jan 8, 2025
5ddcac2
Merge branch 'main' into tpu-w8a8
robertgshaw2-redhat Jan 8, 2025
f353c43
fix CI
robertgshaw2-redhat Jan 8, 2025
fix spurious
robertgshaw2-redhat committed Oct 21, 2024
commit 14ccb90bdefaeb555a549e26b50d9ccdca5d4287
2 changes: 1 addition & 1 deletion in vllm/model_executor/model_loader/loader.py
@@ -411,7 +411,6 @@ def load_model(self, *, model_config: ModelConfig,
                     # parameters onto device for processing and back off after.
                     with device_loading_context(module, target_device):
                         quant_method.process_weights_after_loading(module)
-
         return model.eval()


@@ -1147,6 +1146,7 @@ def load_model(self, *, model_config: ModelConfig,
                                           lora_config, cache_config)
 
             self._load_weights(model_config, model)
+
         return model.eval()
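The context manager in the first hunk, device_loading_context, supports CPU offloading: per the comment in the diff, it moves a module's parameters onto the target device so process_weights_after_loading can repack or re-quantize them there, then moves originally-offloaded parameters back off. A minimal sketch of that idea, assuming the same name but not reproducing vLLM's exact implementation:

from contextlib import contextmanager

import torch
import torch.nn as nn

@contextmanager
def device_loading_context(module: nn.Module, target_device: torch.device):
    # Remember where each parameter currently lives (e.g. CPU when
    # offloading is enabled). Sketch only, not vLLM's implementation.
    original_devices = {
        name: param.device for name, param in module.named_parameters()
    }
    # Move everything onto the target device for post-load processing.
    for param in module.parameters():
        param.data = param.data.to(target_device)
    try:
        yield module
    finally:
        # Move parameters back if they originally lived elsewhere.
        for name, param in module.named_parameters():
            if original_devices[name] != param.data.device:
                param.data = param.data.to(original_devices[name])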

