Add a namespace for ATen mode #9894
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9894
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures
As of commit 68baa3f with merge base a1af1ff ( NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D72440313
Summary:
Pull Request resolved: #9894

## Context

As titled. This is an effort to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```

[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by a user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`. Both libraries define symbols such as `executorch::runtime::Method`, yet they transitively depend on different definitions of `Tensor` and other types (see `exec_aten.h`).

The other common issue is re-registering operators:

```
buck2 test //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:
E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```

[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it is a runtime error rather than a link-time error. It happens when a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensors at the same time. For example, it occurs if a C++ binary depends on both `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten`.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way a C++ binary is allowed to depend on an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not BC breaking for OSS users, since ATen mode was never exposed.

Differential Revision: D72440313
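To make the proposal concrete, here is a minimal sketch of why an extra namespace lets both build flavors coexist in one binary. It is a toy model, not the ExecuTorch implementation: the `demo` namespace, `registry()`, and `register_kernel()` are hypothetical names invented for illustration, and the real registry stores kernels rather than strings. The point is only that two identically named functions (and the static state behind them) become distinct symbols once one copy lives under `aten`.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only: every name in `demo` is hypothetical and is NOT the
// ExecuTorch API. It mimics a registry that rejects duplicate operator names.
namespace demo {
namespace runtime {

// Registry used by the portable (ET tensor) flavor.
inline std::set<std::string>& registry() {
  static std::set<std::string> ops;
  return ops;
}

// Returns false if `name` is already present, analogous to the
// "Re-registering ..." failure in the log above.
inline bool register_kernel(const std::string& name) {
  return registry().insert(name).second;
}

// ATen flavor: same function names, but nested one namespace deeper,
// so the symbols and the static registry are separate from the ones above.
namespace aten {

inline std::set<std::string>& registry() {
  static std::set<std::string> ops;
  return ops;
}

inline bool register_kernel(const std::string& name) {
  // Unqualified lookup finds demo::runtime::aten::registry() first.
  return registry().insert(name).second;
}

}  // namespace aten
}  // namespace runtime
}  // namespace demo
```

In this sketch, registering `aten::sym_size.int` twice through `demo::runtime::register_kernel` fails on the second call, while registering it once through each namespace succeeds, because the two registries no longer share state.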
5e8601a to aaac91b
Summary: Pull Request resolved: #9894
Reviewed By: iseeyuan
Differential Revision: D72440313
aaac91b to d76c20e
d76c20e to b073c9a
b073c9a to a52a70b
a52a70b to b1d037b
b1d037b to 002c2b4
002c2b4 to 3eb8d9f
3eb8d9f to 6472ead
6472ead to 48c6510
Differential Revision: D72440313 Pull Request resolved: #9894
Summary:

## Context

As titled. This is an effort to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like: `ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()`

[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by a user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`. The issue happens because both libraries define symbols like `executorch::runtime::Method` and they transitively depend on different definitions of `Tensor` and other types, see `exec_aten.h`.

The other common issue is re-registering operators: `Re-registering aten::sym_size.int`, followed by `Kernel registration failed with error 18, see error log for details.`

[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/) [User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it is a runtime error rather than a link-time error. It happens when a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensors at the same time. For example, it will happen if a C++ binary depends on both `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten`.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way a C++ binary is allowed to depend on an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not BC breaking for OSS users, since ATen mode was never exposed.
Differential Revision: D72440313
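A minimal sketch of the namespace idea, under stated assumptions: `executorch_sketch`, `demo_et`, and `demo_at` are placeholder names invented for illustration, and the two `Tensor` structs stand in for the different tensor definitions that `exec_aten.h` selects. In a real build the two `Method` definitions would come from compiling the same source in two modes; they are written out side by side here only so the example fits in one translation unit.

```cpp
#include <string>

// Hypothetical stand-ins for the two Tensor definitions; not real ExecuTorch types.
namespace demo_et {
struct Tensor {
  static const char* kind() { return "ET"; }
};
}  // namespace demo_et

namespace demo_at {
struct Tensor {
  static const char* kind() { return "ATen"; }
};
}  // namespace demo_at

// `executorch_sketch` is a placeholder so this does not clash with the real library.
namespace executorch_sketch {
namespace runtime {

// Build with ATen mode disabled: Method is tied to the ET tensor.
class Method {
 public:
  std::string tensor_kind() const { return demo_et::Tensor::kind(); }
};

namespace aten {

// Build with ATen mode enabled: same class name, extra namespace,
// so its mangled symbol differs from runtime::Method.
class Method {
 public:
  std::string tensor_kind() const { return demo_at::Tensor::kind(); }
};

}  // namespace aten
}  // namespace runtime
}  // namespace executorch_sketch
```

Because the fully qualified names differ, both classes can be linked into one binary without a duplicate symbol error, and each keeps its own tensor type. Callers choose a mode explicitly by naming either `runtime::Method` or `runtime::aten::Method`.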