Use bfloat16 by default on MPS #95

Merged
merged 2 commits into from
Aug 21, 2024
juberti commented Aug 21, 2024

bfloat16 on MPS is much faster than float32 and about as fast as float16, for both model loading and generation, thanks to bfloat16 support in macOS 14.
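As a rough illustration of the default described above, here is a minimal sketch of choosing a data type name per device. The helper `pick_default_dtype_name` is hypothetical and is not taken from `ultravox/inference/utils.py`; the CUDA choice, the float16 fallback on macOS releases older than 14, and the float32 fallback on other devices are all assumptions made for the example.

```python
import platform
from typing import Optional


def pick_default_dtype_name(device: str, macos_version: Optional[str] = None) -> str:
    """Return a default data type name for a device (hypothetical helper).

    Assumptions for illustration only: bfloat16 on CUDA, bfloat16 on MPS
    when running macOS 14 or newer, float16 on MPS for older macOS
    releases, and float32 everywhere else.
    """
    if device == "cuda":
        return "bfloat16"
    if device == "mps":
        # platform.mac_ver()[0] is an empty string on non-macOS systems.
        version = platform.mac_ver()[0] if macos_version is None else macos_version
        major = int(version.split(".")[0]) if version else 0
        return "bfloat16" if major >= 14 else "float16"
    return "float32"


if __name__ == "__main__":
    for device in ("cuda", "mps", "cpu"):
        print(device, pick_default_dtype_name(device))
```

The returned names match the values passed to `--data_type` in the runs below. With PyTorch installed, a name can be turned into a dtype with `getattr(torch, name)`.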

% time just infer -v --text_only --prompt count-to-10
poetry run python -m ultravox.tools.infer_tool -v --text_only --prompt count-to-10
Loading checkpoint shards: 100%|██████████| 4/4 [00:14<00:00,  3.61s/it]
Q: count-to-10
A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. [ttft: 3.56 s, tok: 10, tps: 2.86, tot: 7.06 s]
---
% time just infer -v --text_only --prompt count-to-10 --data_type float16
poetry run python -m ultravox.tools.infer_tool -v --text_only --prompt count-to-10 --data_type float16
Loading checkpoint shards: 100%|██████████| 4/4 [00:06<00:00,  1.54s/it]
Q: count-to-10
A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. [ttft: 2.31 s, tok: 10, tps: 4.32, tot: 4.62 s]
just infer -v --text_only --prompt count-to-10 --data_type float16  18.74s user 11.23s system 142% cpu 21.049 total
---
% time just infer -v --text_only --prompt count-to-10 --data_type bfloat16 
poetry run python -m ultravox.tools.infer_tool -v --text_only --prompt count-to-10 --data_type bfloat16
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  3.11it/s]
Q: count-to-10
A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. [ttft: 2.45 s, tok: 10, tps: 3.62, tot: 5.22 s]
just infer -v --text_only --prompt count-to-10 --data_type bfloat16  9.83s user 5.27s system 79% cpu 19.009 total
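To compare the three runs above side by side, the bracketed statistics can be parsed and expressed relative to the first run. This is only a sketch: the regular expression assumes the exact `[ttft: ..., tok: ..., tps: ..., tot: ...]` layout shown in the logs, `parse_stats` is a hypothetical helper, and the first run is labelled `default` because it was invoked without `--data_type` and the log does not state which data type it used.

```python
import re

# Matches the statistics suffix printed after each answer in the logs above.
STATS = re.compile(
    r"\[ttft: ([\d.]+) s, tok: (\d+), tps: ([\d.]+), tot: ([\d.]+) s\]"
)

# Statistics copied verbatim from the three runs.
RUNS = {
    "default": "[ttft: 3.56 s, tok: 10, tps: 2.86, tot: 7.06 s]",
    "float16": "[ttft: 2.31 s, tok: 10, tps: 4.32, tot: 4.62 s]",
    "bfloat16": "[ttft: 2.45 s, tok: 10, tps: 3.62, tot: 5.22 s]",
}


def parse_stats(line: str) -> dict:
    """Extract ttft, tok, tps and tot from one statistics line."""
    match = STATS.search(line)
    if match is None:
        raise ValueError(f"no statistics found in: {line!r}")
    ttft, tok, tps, tot = match.groups()
    return {"ttft": float(ttft), "tok": int(tok), "tps": float(tps), "tot": float(tot)}


if __name__ == "__main__":
    baseline = parse_stats(RUNS["default"])
    for name, line in RUNS.items():
        stats = parse_stats(line)
        speedup = baseline["tot"] / stats["tot"]
        print(
            f"{name:>8}: ttft {stats['ttft']:.2f} s, "
            f"tps {stats['tps']:.2f}, tot {stats['tot']:.2f} s, "
            f"{speedup:.2f}x vs default"
        )
```

These are single runs of a ten-token prompt, so the ratios are indicative at best.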

ultravox/inference/utils.py (review thread resolved)
juberti merged commit f2fbdaf into main Aug 21, 2024
1 check passed
@juberti juberti mentioned this pull request Aug 21, 2024