docs: Add guide for profiling MLX models with Xcode Instruments #1395

lovelyoverflow · 2025-11-21T13:24:36Z

Motivation

While analyzing MLX performance on my Mac Studio (M4 Max), I found that visualizing GPU execution patterns is critical for understanding where an optimization actually takes effect. There currently seems to be little documentation on how to use Xcode Instruments with MLX.

Changes

- Added a new guide: guides/profiling_with_instruments.md
- Explained how to distinguish "CPU Dispatch Overhead" from "Fused Kernels" using Metal System Trace.
- Included a sample code snippet for profiling.
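
For readers who want a feel for the eager vs fused comparison before opening the guide, here is a minimal sketch. It is not the snippet from the guide itself: the `time_ms` helper, the `gelu_approx` function, and the array size are illustrative assumptions. It relies only on `mx.compile`, `mx.eval`, `mx.sigmoid`, and `mx.random.normal` from `mlx.core`, and skips the benchmark if MLX is not installed.

```python
import time

try:
    import mlx.core as mx  # Apple's MLX package
except ImportError:  # keeps the timing helper usable on machines without MLX
    mx = None


def time_ms(fn, warmup=3, iters=20):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up runs absorb one-time compilation cost
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) * 1e3 / iters


def gelu_approx(x):
    # Hypothetical workload: a short chain of elementwise ops.
    # Without mx.compile, each op is typically dispatched as its own kernel.
    return x * mx.sigmoid(1.702 * x)


if mx is not None:
    x = mx.random.normal((4096, 4096))  # size chosen arbitrarily for illustration
    compiled = mx.compile(gelu_approx)
    # mx.eval forces MLX's lazy graph to run, so the work is actually timed.
    eager_ms = time_ms(lambda: mx.eval(gelu_approx(x)))
    fused_ms = time_ms(lambda: mx.eval(compiled(x)))
    print(f"eager: {eager_ms:.3f} ms  compiled: {fused_ms:.3f} ms")
```

Running a script like this under Metal System Trace is what makes the two patterns visible: the eager path should show several short kernels with gaps between dispatches, while the compiled path should show fewer, longer ones. Actual timings depend on the machine and MLX version.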

Context

I am a student aiming to become an inference optimization engineer. I found that MLX's kernel fusion drastically reduces memory bandwidth pressure compared to PyTorch on Unified Memory architectures. I hope this guide helps other developers optimize their models.

Thank you for your hard work on this amazing framework!

This guide demonstrates how to use Metal System Trace to distinguish CPU dispatch overhead from fused kernels, helping developers optimize MLX models on Apple Silicon.

Includes:

- Step-by-step profiling workflow using the xctrace CLI
- Benchmark script demonstrating kernel fusion benefits
- Visual comparison of eager vs compiled execution patterns
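
As a rough illustration of the xctrace step, the sketch below builds an `xctrace record` command line that launches a Python script under the Metal System Trace template. The script name `benchmark.py` and the output name `mlx_profile.trace` are placeholders, not files from this PR, and the helper `xctrace_command` is hypothetical. The command is only printed here, since running it requires macOS with Xcode installed.

```python
import shlex
import sys


def xctrace_command(script, output="mlx_profile.trace", template="Metal System Trace"):
    """Build an `xctrace record` invocation that launches a Python script."""
    return [
        "xcrun", "xctrace", "record",
        "--template", template,   # Instruments template to record with
        "--output", output,       # path of the .trace bundle to write
        "--launch", "--",         # everything after "--" is the target command
        sys.executable, script,
    ]


# "benchmark.py" is a placeholder script name used only for illustration.
cmd = xctrace_command("benchmark.py")
print(shlex.join(cmd))
# On macOS with Xcode installed, this could be run with
# subprocess.run(cmd, check=True); the resulting .trace opens in Instruments.
```

Using the current interpreter (`sys.executable`) as the launched process keeps the trace tied to the same Python environment that has MLX installed.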
