Add a namespace for ATen mode #9894

Merged
merged 1 commit into from
Apr 10, 2025
Conversation

larryliu0820

Summary:

Context

As titled. This change addresses a major pain point for ATen mode users: duplicate symbols and duplicate operator registrations.

A typical duplicate symbol issue looks like:

ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a

User post

This happens when a user depends on both program_no_prim_ops and program_no_prim_ops_aten.

Both libraries define symbols such as executorch::runtime::Method, but they transitively depend on different definitions of Tensor and other types (see exec_aten.h).

The other common issue is re-registering operators:

buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.

User post
User post 2

This is worse than duplicate symbols because it surfaces at runtime rather than at link time. It happens when a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensors at the same time. For example, a C++ binary that depends on both //executorch/kernels/prim_ops:prim_ops_registry and //executorch/kernels/prim_ops:prim_ops_registry_aten will hit this error.
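To illustrate why the second registration fails only at startup, here is a minimal sketch of a name-keyed operator table. Everything in the `demo` namespace (`Registry`, `Error`, `register_both_variants`) is hypothetical and is not ExecuTorch's actual `operator_registry.cpp`; it only assumes that the real registry, like this one, keeps one process-wide entry per operator name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

namespace demo {

// Hypothetical error codes for this sketch; not ExecuTorch's Error enum.
enum class Error { Ok, AlreadyRegistered };

// A toy process-wide operator table keyed by operator name.
class Registry {
 public:
  // Records `owner` as the provider of `op_name`; refuses a second provider.
  Error register_kernel(const std::string& op_name, const std::string& owner) {
    const bool inserted = kernels_.emplace(op_name, owner).second;
    return inserted ? Error::Ok : Error::AlreadyRegistered;
  }

  std::size_t size() const { return kernels_.size(); }

 private:
  std::map<std::string, std::string> kernels_;
};

// Simulates two kernel libraries registering the same operator at startup.
inline Error register_both_variants(Registry& registry) {
  const Error first =
      registry.register_kernel("aten::sym_size.int", "prim_ops_registry");
  assert(first == Error::Ok);
  (void)first;  // keeps NDEBUG builds warning-free
  return registry.register_kernel("aten::sym_size.int",
                                  "prim_ops_registry_aten");
}

}  // namespace demo
```

Both libraries link cleanly because their registration code lives in differently named objects. The conflict only shows up when the second registration runs, which is why the failure appears in the test log rather than in the linker output.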

My proposal

Add a new namespace to the symbols in ATen mode.

executorch::runtime::Method --> executorch::runtime::aten::Method

This way, a C++ binary can depend on both an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not a BC-breaking change for OSS users, since ATen mode was never exposed.
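The effect of the extra namespace can be sketched in a single file. The types below (`demo::portable::Tensor`, `demo::at_like::Tensor`, and both `Method` classes) are hypothetical stand-ins, not the real ExecuTorch classes, and the sketch makes no claim about how the PR selects the namespace at build time. It only shows that two classes with the same name, one nested a namespace deeper, are distinct types and can be linked into one binary.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace demo {

// Hypothetical stand-ins for the two Tensor definitions.
namespace portable {
struct Tensor {
  static const char* kind() { return "portable"; }
};
}  // namespace portable

namespace at_like {
struct Tensor {
  static const char* kind() { return "aten"; }
};
}  // namespace at_like

namespace runtime {

// Non-ATen Method, built against the portable tensor.
class Method {
 public:
  std::string tensor_kind() const { return portable::Tensor::kind(); }
};

namespace aten {
// ATen-mode Method: same class name, one namespace deeper.
class Method {
 public:
  std::string tensor_kind() const { return at_like::Tensor::kind(); }
};
}  // namespace aten

}  // namespace runtime

// The two classes are unrelated types, so their symbols never collide.
static_assert(!std::is_same<runtime::Method, runtime::aten::Method>::value,
              "the two Method classes must be distinct types");

// Uses both variants in the same translation unit.
inline bool both_modes_coexist() {
  runtime::Method et_mode;
  runtime::aten::Method aten_mode;
  assert(et_mode.tensor_kind() != aten_mode.tensor_kind());
  return et_mode.tensor_kind() == "portable" &&
         aten_mode.tensor_kind() == "aten";
}

}  // namespace demo
```

Because the namespace is part of each symbol's mangled name, the linker sees two different `get_num_external_constants`-style symbols instead of two definitions of one.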

Differential Revision: D72440313

pytorch-bot bot commented Apr 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9894

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 68baa3f with merge base a1af1ff (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 4, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 4, 2025
Summary:
Pull Request resolved: #9894

## Context

As titled. This change addresses a major pain point for ATen mode users: duplicate symbols and duplicate operator registrations.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This happens when a user depends on both `program_no_prim_ops` and `program_no_prim_ops_aten`.

Both libraries define symbols such as `executorch::runtime::Method`, but they transitively depend on different definitions of `Tensor` and other types (see `exec_aten.h`).

The other common issue is re-registering operators:

```
buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it surfaces at runtime rather than at link time. It happens when a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensors at the same time. For example, a C++ binary that depends on both `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` will hit this error.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way, a C++ binary can depend on both an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not a BC-breaking change for OSS users, since ATen mode was never exposed.

Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 8, 2025
Reviewed By: iseeyuan

Differential Revision: D72440313
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 8, 2025
Summary:
Pull Request resolved: #9894

## Context

As titled. This is an effort trying to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`.

The issue happens because both libraries define symbols like: `executorch::runtime::Method` and they transitively depend on different definitions of `Tensor` and other types, see `exec_aten.h`.

The other common issue is re-registering operators:

```
buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it's a runtime error. This happens because a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensor at the same time. For example, if a C++ binary depends on `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` then this will happen.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way a C++ binary is allowed to depend on an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not BC breaking for OSS users, since ATen mode was never exposed.

Reviewed By: iseeyuan

Differential Revision: D72440313
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 9, 2025
Summary:
Pull Request resolved: #9894

## Context

As titled. This is an effort trying to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`.

The issue happens because both libraries define symbols like: `executorch::runtime::Method` and they transitively depend on different definitions of `Tensor` and other types, see `exec_aten.h`.

The other common issue is re-registering operators:

```
buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it's a runtime error. This happens because a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensor at the same time. For example, if a C++ binary depends on `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` then this will happen.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way a C++ binary is allowed to depend on an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not BC breaking for OSS users, since ATen mode was never exposed.

Reviewed By: iseeyuan

Differential Revision: D72440313
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 9, 2025
Summary:
Pull Request resolved: #9894

## Context

As titled. This is an effort trying to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`.

The issue happens because both libraries define symbols like: `executorch::runtime::Method` and they transitively depend on different definitions of `Tensor` and other types, see `exec_aten.h`.

The other common issue is re-registering operators:

```
buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it's a runtime error. This happens because a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensor at the same time. For example, if a C++ binary depends on `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` then this will happen.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way a C++ binary is allowed to depend on an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not BC breaking for OSS users, since ATen mode was never exposed.

Reviewed By: iseeyuan

Differential Revision: D72440313
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 9, 2025
Summary:
Pull Request resolved: #9894

## Context

As titled. This is an effort trying to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`.

The issue happens because both libraries define symbols like: `executorch::runtime::Method` and they transitively depend on different definitions of `Tensor` and other types, see `exec_aten.h`.

The other common issue is re-registering operators:

```
buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it's a runtime error. This happens because a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensor at the same time. For example, if a C++ binary depends on `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` then this will happen.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way a C++ binary is allowed to depend on an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not BC breaking for OSS users, since ATen mode was never exposed.

Reviewed By: iseeyuan

Differential Revision: D72440313
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 9, 2025
Summary:
Pull Request resolved: #9894

## Context

As titled. This is an effort trying to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`.

The issue happens because both libraries define symbols like: `executorch::runtime::Method` and they transitively depend on different definitions of `Tensor` and other types, see `exec_aten.h`.

The other common issue is re-registering operators:

```
buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it's a runtime error. This happens because a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensor at the same time. For example, if a C++ binary depends on `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` then this will happen.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way a C++ binary is allowed to depend on an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not BC breaking for OSS users, since ATen mode was never exposed.

Reviewed By: iseeyuan

Differential Revision: D72440313
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 9, 2025
Summary:
Pull Request resolved: #9894

## Context

As titled. This is an effort trying to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`.

The issue happens because both libraries define symbols like: `executorch::runtime::Method` and they transitively depend on different definitions of `Tensor` and other types, see `exec_aten.h`.

The other common issue is re-registering operators:

```
buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it's a runtime error. This happens because a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensor at the same time. For example, if a C++ binary depends on `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` then this will happen.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This allows a single C++ binary to depend on both an ET library with ATen mode enabled and an ET library with ATen mode disabled.
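A minimal sketch of the effect of the extra namespace, assuming nothing about how the change is actually wired up in the codebase: `demo_executorch` and the `flavor()` member are hypothetical, and both classes are shown in one file only to make the point that their qualified names no longer collide.

```cpp
#include <cassert>
#include <cstring>

namespace demo_executorch {
namespace runtime {

// Portable-mode flavor keeps the existing qualified name.
class Method {
 public:
  const char* flavor() const { return "portable"; }
};

// ATen-mode flavor lives one namespace deeper, so it mangles to a
// different symbol and can be linked next to the portable one.
namespace aten {
class Method {
 public:
  const char* flavor() const { return "aten"; }
};
}  // namespace aten

}  // namespace runtime
}  // namespace demo_executorch
```

Code that needs both flavors can then name them explicitly, for example `demo_executorch::runtime::Method` and `demo_executorch::runtime::aten::Method`, within the same binary.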

This is not BC-breaking for OSS users, since ATen mode was never exposed.

Reviewed By: iseeyuan

Differential Revision: D72440313
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 9, 2025
Summary:
Pull Request resolved: #9894


@facebook-github-bot facebook-github-bot merged commit ad6f5ee into main Apr 10, 2025
177 of 182 checks passed
@facebook-github-bot facebook-github-bot deleted the export-D72440313 branch April 10, 2025 03:16
kirklandsign pushed a commit that referenced this pull request Apr 11, 2025
Differential Revision: D72440313

Pull Request resolved: #9894
keyprocedure pushed a commit to keyprocedure/executorch that referenced this pull request Apr 21, 2025
Differential Revision: D72440313

Pull Request resolved: pytorch#9894
Labels: ciflow/trunk, CLA Signed, fb-exported, topic: not user facing