Develop an efficient inference runtime for a desktop device, choosing:
- Device: MacBook with an M1 Pro processor (an edge device with CPU/GPU/NPU units)
- Engine(s): PyTorch, Core ML, ONNX Runtime (Core ML recommended)
- Optimization function: Minimize latency and memory usage while preserving model accuracy
- Approach: Baseline measurements + model/runtime modifications + benchmarking
I implemented a full pipeline of PyTorch → ONNX → Core ML conversions (including custom pass-pipeline modifications) and measured accuracy, inference time, and memory usage across variants.
- Model Used: ResNet50 (ImageNet-pretrained, from `torchvision.models`)
- Method: PyTorch (MPS GPU)
- Script: `base.py` (a baseline benchmarking sketch is shown below)
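A minimal sketch of what the MPS baseline loop in `base.py` could look like; the random input tensor, warm-up count, and run count are illustrative assumptions, not the project's exact code:

```python
import time
import torch
import torchvision

device = torch.device("mps")  # Apple-silicon GPU backend
model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval().to(device)
x = torch.rand(1, 3, 224, 224, device=device)  # stand-in for one preprocessed image

with torch.no_grad():
    for _ in range(10):          # warm-up so one-time setup cost is excluded
        model(x)
    torch.mps.synchronize()

    start = time.perf_counter()
    for _ in range(100):         # 100 single-image inferences, as in the benchmark
        model(x)
        torch.mps.synchronize()  # wait for the GPU before reading the clock
    avg_ms = (time.perf_counter() - start) / 100 * 1000

print(f"avg latency: {avg_ms:.2f} ms")
```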
- Script: `prepare.py`, `coreml_optimized.py`
- Conversion from PyTorch: via a traced TorchScript model
- Variants using quantization and a customized pass pipeline:
  - CoreML FP16 (default pipeline)
  - CoreML FP32 (default pipeline)
  - CoreML FP16 (custom pipeline)
  - CoreML FP32 (custom pipeline)
- Custom Pipeline Passes Removed (see the conversion sketch after this list):

  ```python
  pipeline.remove_passes({
      "common::merge_consecutive_transposes",
      "common::cast_optimization",  # only for the FLOAT32 precision variants
      "common::add_int16_cast",
  })
  ```

  Rationale: these passes were deemed unnecessary for a standard CNN like ResNet50.
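A sketch of how the TorchScript trace and the Core ML conversion with a customized pass pipeline can be wired together in coremltools; the input name, output file name, and exact pipeline handling are assumptions rather than the actual contents of `prepare.py`:

```python
import torch
import torchvision
import coremltools as ct

model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)  # traced TorchScript model

# Start from the default pipeline and drop the passes listed above.
pipeline = ct.PassPipeline.DEFAULT
pipeline.remove_passes({
    "common::merge_consecutive_transposes",
    "common::add_int16_cast",
})

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example.shape)],
    convert_to="mlprogram",                  # ML Program format
    compute_precision=ct.precision.FLOAT16,  # FP16 variant; FLOAT32 for the others
    compute_units=ct.ComputeUnit.ALL,        # ANE/GPU/CPU
    pass_pipeline=pipeline,
)
mlmodel.save("resnet50_fp16_custom.mlpackage")
```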
- Script: `onnx_optimized.py`
- Conversion from PyTorch: via `torch.onnx.export`
- Variants (an export/inference sketch is shown below):
  - ONNX FP16 (using CoreMLExecutionProvider)
  - ONNX FP32 (using CoreMLExecutionProvider)
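A sketch of the ONNX path: export with `torch.onnx.export`, then run through ONNX Runtime with the CoreMLExecutionProvider. File names and the opset version are assumptions, and the step that produces the FP16 variant is not shown here:

```python
import numpy as np
import torch
import torchvision
import onnxruntime as ort

model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
dummy = torch.rand(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "resnet50_fp32.onnx",
    input_names=["input"], output_names=["logits"], opset_version=17,
)

# The CoreMLExecutionProvider handles supported ops; CPU is the fallback.
session = ort.InferenceSession(
    "resnet50_fp32.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
logits = session.run(None, {"input": dummy.numpy().astype(np.float32)})[0]
print(logits.shape)  # (1, 1000)
```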
- Metrics Recorded:
  - Inference Latency (ms)
  - Accuracy (%) using a real dataset (a subset of Imagenette)
  - Memory Usage (MB), measured via `psutil` (see the sketch below)
- Batch Size: 1 (single image)
- Runs: 100 inferences
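Memory was tracked with `psutil`; the following sketch shows the kind of measurement this implies, under the assumption that the process RSS is read before and after the 100-inference loop:

```python
import psutil

proc = psutil.Process()

def rss_mb() -> float:
    """Resident set size of the current process in MB, as reported by psutil."""
    return proc.memory_info().rss / (1024 * 1024)

mem_before = rss_mb()
# ... run the 100 single-image inferences for one variant here ...
mem_after = rss_mb()
print(f"memory usage: {mem_before:.1f} MB -> {mem_after:.1f} MB")
```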
| Engine / Variant | Accuracy | Avg Latency (ms) | Memory Usage (MB) |
|---|---|---|---|
| PyTorch (MPS) | 98% | 28 | 30-43 |
| CoreML FP16 (default) | 98% | 7-8 | 27-29 |
| CoreML FP16 (custom) | 98% | 7-8 | 27-29 |
| CoreML FP32 (default) | 98% | 12-14 | 27.5-33 |
| CoreML FP32 (custom) | 98% | 12-14 | 27-31 |
| ONNX FP16 (CoreML backend) | 98% | 78-80 | 35-50 |
| ONNX FP32 (CoreML backend) | 98% | 11-13 | 10-14 |
- Best Performing Variant: CoreML FP16 (default pipeline) offered the lowest latency with minimal memory use (roughly 40% lower latency than the CoreML FP32 variants, and well below the PyTorch MPS baseline).
- Custom Pipeline: Removing passes had no measurable positive effect on runtime or memory; it may have reduced compile time, but that was not measured.
- ONNX with CoreML backend: Performed well with FP32 and achieved the lowest memory usage, but suffered high latency with FP16 due to precision-conversion overhead.
- Memory Tradeoffs: PyTorch MPS showed higher variance in memory usage; ONNX FP16 carried extra overhead from its conversion steps.
For macOS M1 Pro edge devices, CoreML with:
- Precision: FLOAT16
- Engine: ML Program format
- Compute Units: ALL (to use ANE/GPU/CPU)
provides the best balance of speed and efficiency for ResNet50.
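As a sketch, this recommended configuration corresponds to loading the FP16 ML Program with all compute units enabled; the input name and file name below are illustrative:

```python
import numpy as np
import coremltools as ct

# Load the FP16 ML Program and let Core ML place ops on the ANE/GPU/CPU.
mlmodel = ct.models.MLModel(
    "resnet50_fp16.mlpackage",
    compute_units=ct.ComputeUnit.ALL,
)
image_array = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for a preprocessed image
out = mlmodel.predict({"input": image_array})
```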
- `base.py` - PyTorch baseline benchmark
- `prepare.py` - Export to ONNX and CoreML
- `coreml_optimized.py` - CoreML model evaluation
- `onnx_optimized.py` - ONNX model evaluation (using the CoreML backend)
- `README.md` - Overview
- Explore the operations and fuse missed fusion opportunities
- Simplify the graph by:
  - Constant folding
  - Dead-code elimination
  - Removing redundant operations
- Model-level optimizations:
  - Pruning
- Profiling and detecting bottlenecks (see the sketch after this list)
- The ONNX engine seems more open and flexible for these optimizations
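As one possible direction for the graph-simplification and profiling items above, ONNX Runtime exposes both offline graph optimization (constant folding, redundant-node removal, fusions) and a built-in per-op profiler. A sketch under the assumption that the FP32 ONNX model from above is reused; file names are illustrative:

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Apply all graph-level optimizations and save the optimized graph for inspection.
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = "resnet50_fp32.opt.onnx"
so.enable_profiling = True  # per-op timing, useful for spotting bottlenecks

session = ort.InferenceSession(
    "resnet50_fp32.onnx", so,
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
# ... run the benchmark inferences here ...
trace_file = session.end_profiling()  # JSON trace, viewable in chrome://tracing
print(trace_file)
```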