Conversation

@awni (Member) commented Oct 13, 2025

Uses a seemingly better memory policy for scalars. It helps a lot on B200 and H100:

```
mlx_lm.benchmark --model mlx-community/Meta-Llama-3.1-8B-Instruct-bf16 -p 32 -g 128
```

| Device | Pre tok/s | Post tok/s |
|--------|-----------|------------|
| B200   | 195.95    | 229.40     |
| H100   | 142.38    | 162.67     |

Training Qwen3 0.6B:

| Device | Pre tok/s | Post tok/s |
|--------|-----------|------------|
| B200   | 61944     | 63942      |
| H100   | 40826     | 41698      |
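For context, the relative throughput gains implied by the numbers above can be computed with a small helper (the percentages below are derived from the tables, not stated in the PR):

```python
def speedup_pct(pre: float, post: float) -> float:
    """Percent throughput improvement from pre -> post tok/s."""
    return (post / pre - 1.0) * 100.0

# Generation (Meta-Llama-3.1-8B-Instruct-bf16, -p 32 -g 128)
print(f"B200 generation: +{speedup_pct(195.95, 229.40):.1f}%")  # ~17.1%
print(f"H100 generation: +{speedup_pct(142.38, 162.67):.1f}%")  # ~14.3%

# Training (Qwen3 0.6B)
print(f"B200 training:   +{speedup_pct(61944, 63942):.1f}%")    # ~3.2%
print(f"H100 training:   +{speedup_pct(40826, 41698):.1f}%")    # ~2.1%
```

The gains are noticeably larger for generation than for training, which is consistent with per-scalar allocation overhead mattering more in the decode loop.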

@awni requested review from @angeloskath and @zcbenz October 13, 2025 15:06
@angeloskath (Member) left a comment


Nice.

@awni merged commit 25e2356 into ml-explore:main Oct 13, 2025
7 checks passed
