Lots of quantised model variants are published to mlx-community on Hugging Face, but which is best for your Mac?
This Python script grabs a selection of models and runs ARC-Easy on each one to estimate how much the model has been degraded by quantisation. It also reports very rough timing numbers.
It uses uv (https://github.com/astral-sh/uv) for package management.
uv sync
Set the HF models you want to test and configure the number of questions by editing the constants in evaluate_models.py.
uv run evaluate_models.py
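
For a rough idea of the kind of measurement involved, here is a minimal sketch of scoring a model on multiple-choice questions and timing it. It is not the code in evaluate_models.py: `score_model`, `answer_fn`, the question tuple layout and the sample questions are all hypothetical stand-ins, and a real run would replace `always_first` with a call into the loaded model.

```python
import time
from typing import Callable, Sequence

# Hypothetical question layout: (prompt, choice labels, correct label).
Question = tuple[str, Sequence[str], str]


def score_model(
    answer_fn: Callable[[str, Sequence[str]], str],
    questions: Sequence[Question],
) -> tuple[float, float]:
    """Return (accuracy, average seconds per question) for one model."""
    if not questions:
        raise ValueError("need at least one question")
    correct = 0
    start = time.perf_counter()
    for prompt, labels, gold in questions:
        # answer_fn stands in for "ask the model to pick a choice label".
        if answer_fn(prompt, labels) == gold:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(questions), elapsed / len(questions)


# Made-up questions in the multiple-choice style, for illustration only.
sample: list[Question] = [
    ("Which gives off its own light? A) the Sun B) a rock", ["A", "B"], "A"),
    ("Which is a liquid at room temperature? A) iron B) water", ["A", "B"], "B"),
    ("Which animal can fly? A) a sparrow B) a horse", ["A", "B"], "A"),
    ("Which is a planet? A) the Moon B) Mars", ["A", "B"], "B"),
]


def always_first(prompt: str, labels: Sequence[str]) -> str:
    """Dummy 'model' that always picks the first choice."""
    return labels[0]


accuracy, secs_per_question = score_model(always_first, sample)
print(f"accuracy={accuracy:.2f}, seconds/question={secs_per_question:.6f}")
```

Comparing the accuracy of a quantised variant against a less aggressively quantised one on the same questions gives the degradation estimate. With only a small number of questions, both the accuracy and the timing figures are noisy.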