Espresso: production ANE inference framework + new MIL gotchas to contribute #44
Hi @hollance,
Your "Everything we actually know about the Apple Neural Engine" repo has been an invaluable reference — thank you for documenting all of that. It's been essential to our research.
I'm building Espresso (https://github.com/christopherkarani/Espresso), a pure-Swift inference framework for Apple Silicon that uses the private ANE APIs you've documented to achieve 4.76x faster inference than CoreML (519 tok/s on M3 Max).
We've validated much of what you've documented and discovered a few additional gotchas we'd love to contribute back:
- `softmax` on non-power-of-2 dimensions → `InvalidMILProgram` at compile time
- `slice_by_index` on function inputs combined with RMSNorm + convs → `InvalidMILProgram` (workaround: prepare data in the correct layout before passing it to the function)
- Lane-packed attention kernels (spatial=32) are necessary for stable ANE evaluation across M1-M4
- `reduce_mean` does NOT exist in the raw MIL text format; use `reduce_sum` plus a `mul` by 1/N instead
- ANE evaluation is unstable on some hosts even for single-input identity kernels
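To make the `reduce_mean` workaround concrete, here is a minimal NumPy sketch of the arithmetic identity being relied on (this is an illustration of the math, not actual MIL text; the shape and axis are arbitrary example values):

```python
import numpy as np

# Emulating reduce_mean as reduce_sum followed by a mul with 1/N,
# mirroring the MIL-level workaround described above.
x = np.random.default_rng(0).standard_normal((1, 32)).astype(np.float32)
n = x.shape[1]

mean_direct = x.mean(axis=1, keepdims=True)               # what reduce_mean would compute
mean_emulated = x.sum(axis=1, keepdims=True) * (1.0 / n)  # reduce_sum + mul by 1/N

print(np.allclose(mean_direct, mean_emulated))  # True (up to float32 rounding)
```

The emulation is exact up to floating-point rounding, so it is a drop-in substitute wherever `reduce_mean` would otherwise be emitted.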
Would you be open to:
- Adding a reference to Espresso in the README as a production inference usage example?
- A PR contributing these findings to your docs?
Happy to submit the PR regardless of the README link. This community benefits from shared knowledge, and your docs deserve to stay current.