Do not use BNNS copy when dtypes differ in CoreML #13018

Merged: 1 commit, Aug 1, 2025

@metascroy metascroy commented Jul 30, 2025

BNNS copy crashes the process when the dtypes differ (#11714).

With the example in this PR (#11714), we crash the process on main. Here is the stack trace from LLDB:

Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp 
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180

With this PR, the process succeeds.
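
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of the idea in the title, not the actual change. `DataType`, `Buffer`, `choose_copy_path`, and `try_accelerated_copy` are hypothetical names invented for illustration, and `std::memcpy` stands in for the accelerated (BNNS) path. The point is that the fast path is attempted only when source and destination share a dtype; any other pair is left to a slower element-wise copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the executorchcoreml types.
enum class DataType { Float32, Int64 };

inline std::size_t element_size(DataType t) {
    return t == DataType::Float32 ? sizeof(float) : sizeof(std::int64_t);
}

struct Buffer {
    DataType dtype;
    std::size_t count;                 // number of elements
    std::vector<unsigned char> bytes;  // contiguous raw storage

    Buffer(DataType t, std::size_t n)
        : dtype(t), count(n), bytes(n * element_size(t)) {}
};

enum class CopyPath { Accelerated, ElementWise };

// The accelerated path is eligible only when both dtypes match.
inline CopyPath choose_copy_path(const Buffer& src, const Buffer& dst) {
    return src.dtype == dst.dtype ? CopyPath::Accelerated
                                  : CopyPath::ElementWise;
}

// Returns true if the copy was handled by the fast path. Returns false
// when the caller must fall back to an element-wise converting copy.
// std::memcpy is an assumption standing in for the BNNS call.
inline bool try_accelerated_copy(const Buffer& src, Buffer& dst) {
    if (choose_copy_path(src, dst) != CopyPath::Accelerated ||
        src.count != dst.count) {
        return false;
    }
    assert(src.bytes.size() == dst.bytes.size());
    if (!src.bytes.empty()) {
        std::memcpy(dst.bytes.data(), src.bytes.data(), src.bytes.size());
    }
    return true;
}
```

A caller would invoke `try_accelerated_copy` first and run its generic converting loop only when it returns `false`, which keeps the dtype check in one place.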

pytorch-bot bot commented Jul 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13018

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 6 Pending

As of commit 9dda55a with merge base 5d3550f (image):

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 30, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@metascroy

cc @cymbalrush can I get a sanity check here? Is BNNS expected to crash when the dtypes differ?

cymbalrush commented Jul 31, 2025

> cc @cymbalrush can I get a sanity check here? Is BNNS expected to crash when the dtypes differ?

It's not expected to crash. Do you have an example model for which it crashes? We can merge the PR, but it would be good to know which datatype and layout trigger the crash.

@metascroy

> > cc @cymbalrush can I get a sanity check here? Is BNNS expected to crash when the dtypes differ?
>
> It's not expected to crash. Do you have an example model for which it crashes? We can merge the PR, but it would be good to know which datatype and layout trigger the crash.

Yes, you can consistently see the crash on this toy model here with floor_divide: #11714

The dtype mismatch arises because CoreML internally converts the floor_divide output to float32, while the exported program declares that output as int64.
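
To make the mismatch concrete, here is a small sketch of what a converting copy has to do in this case. It is an illustration under stated assumptions, not code from the delegate: `floor_divide_f32` and `copy_float32_to_int64` are hypothetical helpers, plain `std::vector`s stand in for the real multi-array types, and all values are assumed finite and within `int64` range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: floor division carried out in float32, mimicking
// a backend that produces a float32 result for floor_divide.
inline std::vector<float> floor_divide_f32(const std::vector<float>& a,
                                           const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = std::floor(a[i] / b[i]);  // assumes b[i] != 0
    }
    return out;
}

// Hypothetical helper: element-wise converting copy from a float32 source
// into an int64 destination. A raw byte copy would be wrong here because
// the element sizes (4 vs 8 bytes) and encodings differ.
inline void copy_float32_to_int64(const std::vector<float>& src,
                                  std::vector<std::int64_t>& dst) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        // Assumes src[i] is finite and representable as int64.
        dst[i] = static_cast<std::int64_t>(src[i]);
    }
}
```

Because the source elements are 4 bytes wide and the destination elements 8, the two buffers cannot be treated as interchangeable byte ranges; each value has to be converted individually.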

@metascroy metascroy force-pushed the do-not-use-bnns-when-dtypes-differ branch from d7d74f4 to 9dda55a Compare July 31, 2025 22:54
@metascroy metascroy merged commit 43d90e5 into main Aug 1, 2025
229 of 232 checks passed
@metascroy metascroy deleted the do-not-use-bnns-when-dtypes-differ branch August 1, 2025 00:17
Labels
ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.