Summary:
Pull Request resolved: #9894
## Context
As titled. This is an effort to solve a major pain point for ATen mode users: duplicate symbols and duplicate operator registrations.
A typical duplicate symbol issue looks like:
```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)
This is caused by a user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`.
The issue happens because both libraries define the same symbols, such as `executorch::runtime::Method`, while transitively depending on different definitions of `Tensor` and other types (see `exec_aten.h`).
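As a rough illustration of why the two libraries collide, the sketch below shows one header compiled into two variants under the same qualified name. Everything in it is hypothetical: the `toy` namespace, the `TOY_USE_ATEN_LIB` macro, and both tensor types are stand-ins and are not the actual contents of `exec_aten.h`.

```cpp
#include <cassert>
#include <string>

namespace toy {

// Two hypothetical tensor types standing in for the ET and ATen tensors.
struct PortableTensor {
  static const char* kind() { return "portable"; }
};
struct AtenTensor {
  static const char* kind() { return "aten"; }
};

// Hypothetical macro: a build system would define it for the ATen variant.
#ifdef TOY_USE_ATEN_LIB
using Tensor = AtenTensor;
#else
using Tensor = PortableTensor;
#endif

// Both variants emit the same qualified name (toy::Method), but the code
// behind it depends on which Tensor alias was selected at compile time.
struct Method {
  std::string tensor_kind() const { return Tensor::kind(); }
};

// Self-check for the default configuration (macro not defined).
inline void check_default_build() {
#ifndef TOY_USE_ATEN_LIB
  assert(Method{}.tensor_kind() == "portable");
#endif
}

}  // namespace toy
```

If this header were compiled once with the macro and once without, and both resulting libraries were linked into one binary, the linker would see two different definitions of the same `toy::Method` members, which is the shape of the error above.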
The other common issue is re-registering operators:
```
buck2 test //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)
This is worse than duplicate symbols because it is a runtime error rather than a link-time one. It happens when a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensors at the same time. For example, a C++ binary that depends on both `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` will hit this error.
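The failure mode can be sketched with a toy registry that rejects a second registration under the same operator name. This is a simplified model under assumed names (`toy_registry::OperatorRegistry`, `register_kernel`), not the code in `operator_registry.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

namespace toy_registry {

// Hypothetical registry keyed by operator name only.
class OperatorRegistry {
 public:
  // Returns true on success, false if the name is already registered.
  bool register_kernel(const std::string& op_name,
                       const std::string& source_library) {
    return kernels_.emplace(op_name, source_library).second;
  }

  std::size_t size() const { return kernels_.size(); }

 private:
  std::map<std::string, std::string> kernels_;
};

// Simulates two kernel libraries registering the same operator at startup.
inline bool second_registration_succeeds() {
  OperatorRegistry registry;
  bool first = registry.register_kernel("aten::sym_size.int",
                                        "prim_ops_registry");
  assert(first);  // the first library registers without conflict
  return registry.register_kernel("aten::sym_size.int",
                                  "prim_ops_registry_aten");
}

}  // namespace toy_registry
```

In this model the second call returns `false`. In the log above, the corresponding condition is treated as fatal, so the process aborts before any test is listed.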
## My proposal
Add a new namespace to the symbols in ATen mode.
`executorch::runtime::Method` --> `executorch::runtime::aten::Method`
This way, a C++ binary can depend on both an ET library with ATen mode enabled and an ET library with ATen mode disabled.
This is not BC breaking for OSS users, since ATen mode was never exposed.
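A minimal sketch of the idea, assuming simplified stand-in classes rather than the real `Method` implementation: once the ATen variant lives in a nested `aten` namespace, the two types have distinct qualified names and can coexist in one binary. The `tensor_backend()` member and `describe_both()` helper are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

namespace executorch {
namespace runtime {

// Stand-in for the portable (ATen mode disabled) variant.
struct Method {
  std::string tensor_backend() const { return "ET tensor"; }
};

namespace aten {
// Stand-in for the ATen mode variant, now under its own namespace.
struct Method {
  std::string tensor_backend() const { return "at::Tensor"; }
};
}  // namespace aten

}  // namespace runtime
}  // namespace executorch

// Uses both variants side by side; no name clash because the qualified
// names (and therefore the mangled symbols) differ.
inline std::string describe_both() {
  executorch::runtime::Method portable;
  executorch::runtime::aten::Method aten_mode;
  assert(portable.tensor_backend() != aten_mode.tensor_backend());
  return portable.tensor_backend() + " / " + aten_mode.tensor_backend();
}
```

Because the namespace is part of the mangled name, each variant produces its own symbols, so the linker no longer sees two definitions of the same one.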
Reviewed By: iseeyuan
Differential Revision: D72440313