Conversation

@awni (Member) commented Oct 13, 2025

Uses a seemingly better memory policy for scalars. It helps a lot on B200 and H100:

```
mlx_lm.benchmark --model mlx-community/Meta-Llama-3.1-8B-Instruct-bf16 -p 32 -g 128
```

| Device | Pre tok/s | Post tok/s |
|--------|-----------|------------|
| B200   | 195.95    | 229.40     |
| H100   | 142.38    | 162.67     |

Training Qwen3 0.6B:

| Device | Pre tok/s | Post tok/s |
|--------|-----------|------------|
| B200   | 61944     | 63942      |
| H100   | 40826     | 41698      |
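For context, the relative throughput gains implied by the numbers above can be computed with a small helper (the percentages below are derived from the tables, not stated in the PR):

```python
def speedup_pct(pre: float, post: float) -> float:
    """Percent throughput improvement from pre -> post tok/s."""
    return (post / pre - 1.0) * 100.0

# Generation (Meta-Llama-3.1-8B-Instruct-bf16, -p 32 -g 128)
print(f"B200 generation: +{speedup_pct(195.95, 229.40):.1f}%")  # ~17.1%
print(f"H100 generation: +{speedup_pct(142.38, 162.67):.1f}%")  # ~14.3%

# Training (Qwen3 0.6B)
print(f"B200 training:   +{speedup_pct(61944, 63942):.1f}%")    # ~3.2%
print(f"H100 training:   +{speedup_pct(40826, 41698):.1f}%")    # ~2.1%
```

The gains are noticeably larger for generation than for training, which is consistent with per-scalar allocation overhead mattering more in the decode loop.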

@awni requested review from @angeloskath and @zcbenz October 13, 2025 15:06
@angeloskath (Member) left a comment


Nice.

@awni merged commit 25e2356 into ml-explore:main Oct 13, 2025
7 checks passed
