@larryliu0820
Contributor

Summary:

Context

As titled. This change addresses a major pain point for ATen mode users: duplicate symbols and duplicate operator registrations.

A typical duplicate symbol issue looks like:

ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a

User post: https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/

This is caused by a user depending on both program_no_prim_ops and program_no_prim_ops_aten.

Both libraries define the same symbols, such as executorch::runtime::Method, while transitively depending on different definitions of Tensor and other types (see exec_aten.h).

The other common issue is re-registering operators:

buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.

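User post: https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/
User post 2: https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/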

This is worse than duplicate symbols because it is a runtime error rather than a link-time error. It happens when a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensors at the same time. For example, a C++ binary that depends on both //executorch/kernels/prim_ops:prim_ops_registry and //executorch/kernels/prim_ops:prim_ops_registry_aten will hit this error.
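To make the failure mode concrete, here is a minimal sketch of a process-wide registry that rejects a second registration under the same operator name. `ToyRegistry` and `link_both_variants` are hypothetical names invented for illustration; this is not the code in operator_registry.cpp, which, per the log above, aborts with error 18 rather than returning a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical, simplified stand-in for a process-wide operator registry.
class ToyRegistry {
 public:
  // Returns false when `name` is already registered.
  bool register_kernel(const std::string& name) {
    return registered_.insert(name).second;
  }

  std::size_t size() const { return registered_.size(); }

 private:
  std::unordered_set<std::string> registered_;
};

// Models a binary that links two kernel libraries (one built with ET
// tensors, one built with ATen tensors) that both register the same
// operator name into the same registry at startup.
inline bool link_both_variants(ToyRegistry& registry) {
  const bool portable_ok = registry.register_kernel("aten::sym_size.int");
  const bool aten_ok = registry.register_kernel("aten::sym_size.int");
  assert(portable_ok);  // The first registration always succeeds.
  return portable_ok && aten_ok;  // The second one is rejected.
}
```

In this toy model, `link_both_variants` returns false because the operator name alone is the key, so the registry cannot tell the two builds apart.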

My proposal

Add a new namespace to the symbols in ATen mode.

executorch::runtime::Method --> executorch::runtime::aten::Method

This way, a C++ binary can depend on both an ET library built with ATen mode enabled and an ET library built with ATen mode disabled.

This is not BC breaking for OSS users, since ATen mode was never exposed.
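A rough sketch of the idea, under stated assumptions: the `demo` namespace, `PortableTensor`, `AtenTensor`, and `describe_both` are hypothetical stand-ins, and the real change may select the namespace differently (for example through a build-time macro). The point is only that two classes named `Method` in different namespaces are distinct types, so their member functions get distinct symbol names.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace demo {

// Hypothetical stand-ins for the two Tensor definitions.
struct PortableTensor {
  static const char* kind() { return "portable"; }
};
struct AtenTensor {
  static const char* kind() { return "aten"; }
};

namespace runtime {

// Variant built with ATen mode disabled.
class Method {
 public:
  std::string tensor_kind() const { return PortableTensor::kind(); }
};

namespace aten {

// Variant built with ATen mode enabled: same class name, extra namespace.
class Method {
 public:
  std::string tensor_kind() const { return AtenTensor::kind(); }
};

}  // namespace aten
}  // namespace runtime
}  // namespace demo

// The two classes are unrelated types, so their symbols cannot collide.
static_assert(
    !std::is_same<demo::runtime::Method, demo::runtime::aten::Method>::value,
    "the two Method variants must be distinct types");

// Uses both variants side by side, as a binary linking both builds would.
inline std::string describe_both() {
  demo::runtime::Method portable;
  demo::runtime::aten::Method aten;
  assert(portable.tensor_kind() != aten.tensor_kind());
  return portable.tensor_kind() + "+" + aten.tensor_kind();
}
```

Without the extra namespace, both classes would have the same qualified name, which is what produces the duplicate symbol error shown earlier.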

Differential Revision: D72440313

@pytorch-bot

pytorch-bot bot commented Apr 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9894

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 68baa3f with merge base a1af1ff:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Apr 4, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 4, 2025

larryliu0820 added a commit that referenced this pull request Apr 8, 2025
Reviewed By: iseeyuan

larryliu0820 added a commit that referenced this pull request Apr 8, 2025

larryliu0820 added a commit that referenced this pull request Apr 9, 2025

larryliu0820 added a commit that referenced this pull request Apr 9, 2025

larryliu0820 added a commit that referenced this pull request Apr 9, 2025

larryliu0820 added a commit that referenced this pull request Apr 9, 2025

larryliu0820 added a commit that referenced this pull request Apr 9, 2025
Summary:
Pull Request resolved: #9894

## Context

As titled. This is an effort trying to solve a big pain point for ATen mode users: duplicate symbols and duplicate operators.

A typical duplicate symbol issue looks like:

```
ld.lld: error: duplicate symbol: executorch::runtime::Method::get_num_external_constants()
>>> defined at __stripped__/method.cpp.pic.stripped.o:(executorch::runtime::Method::get_num_external_constants()) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops_aten__/libprogram_no_prim_ops_aten.stripped.pic.a
>>> defined at __stripped__/method.cpp.pic.stripped.o:(.text._ZN10executorch7runtime6Method26get_num_external_constantsEv+0x0) in archive buck-out/v2/gen/fbcode/712c6d0a4cb497c7/executorch/runtime/executor/__program_no_prim_ops__/libprogram_no_prim_ops.stripped.pic.a
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1727735561430063/)

This is caused by user depending on both `program_no_prim_ops` and `program_no_prim_ops_aten`.

The issue happens because both libraries define symbols like: `executorch::runtime::Method` and they transitively depend on different definitions of `Tensor` and other types, see `exec_aten.h`.

The other common issue is re-registering operators:

```
buck2 test  //arvr/libraries/wristband/tsn/transformers:TorchstreamTransformer  -- --print-passing-details
File changed: fbsource//xplat/executorch/build/fb/clients.bzl
File changed: fbsource//xplat/executorch
File changed: fbcode//executorch/build/fb/clients.bzl
16 additional file change events
⚠ Listing failed: fbsource//arvr/libraries/wristband/tsn/transformers:TorchstreamTransformerTestFbcode
Failed to list tests. Expected exit code 0 but received: ExitStatus(unix_wait_status(134))
STDOUT:
STDERR:E 00:00:00.000543 executorch:operator_registry.cpp:86] Re-registering aten::sym_size.int, from /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbsource/cfdc20bd56300721/arvr/libraries/wristband/tsn/transformers/__TorchstreamTransformerTestFbcode__/./__TorchstreamTransformerTestFbcode__shared_libs_symlink_tre$
E 00:00:00.000572 executorch:operator_registry.cpp:87] key: (null), is_fallback: true
F 00:00:00.000576 executorch:operator_registry.cpp:111] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
```
[User post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1691696305033989/)
[User post 2](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1510414646495490/)

This is worse than duplicate symbols because it's a runtime error. This happens because a user depends on a kernel library built with ATen tensors and a kernel library built with ET tensor at the same time. For example, if a C++ binary depends on `//executorch/kernels/prim_ops:prim_ops_registry` and `//executorch/kernels/prim_ops:prim_ops_registry_aten` then this will happen.

## My proposal

Add a new namespace to the symbols in ATen mode.

`executorch::runtime::Method` --> `executorch::runtime::aten::Method`

This way, a C++ binary can depend on both an ET library with ATen mode enabled and an ET library with ATen mode disabled.

This is not BC-breaking for OSS users, since ATen mode was never exposed.
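A rough sketch of why the extra namespace removes the collision is shown below. The `toy_et` namespace and the `TOY_DEFINE_METHOD` macro are hypothetical and exist only so both variants fit in one file; in a real build each variant would come from a separately compiled library.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper that stamps out the same class body twice, standing in
// for one source file compiled once per mode.
#define TOY_DEFINE_METHOD(TENSOR_KIND)                       \
  struct Method {                                            \
    const char* tensor_kind() const { return TENSOR_KIND; }  \
  };

namespace toy_et {
namespace runtime {

// Variant built with ATen mode disabled.
TOY_DEFINE_METHOD("portable")

namespace aten {
// Variant built with ATen mode enabled, now under its own namespace.
TOY_DEFINE_METHOD("aten")
}  // namespace aten

}  // namespace runtime
}  // namespace toy_et

// Both variants coexist in one program because their qualified names differ.
inline void demo_both_variants() {
  static_assert(!std::is_same<toy_et::runtime::Method,
                              toy_et::runtime::aten::Method>::value,
                "the two variants are distinct types");
  assert(std::string(toy_et::runtime::Method{}.tensor_kind()) == "portable");
  assert(std::string(toy_et::runtime::aten::Method{}.tensor_kind()) == "aten");
}
```

Since the namespace is part of the mangled name, the two `Method` classes produce different linker symbols and can be linked into the same binary.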

Reviewed By: iseeyuan

Differential Revision: D72440313
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D72440313

larryliu0820 added a commit that referenced this pull request Apr 9, 2025
@facebook-github-bot facebook-github-bot merged commit ad6f5ee into main Apr 10, 2025
177 of 182 checks passed
@facebook-github-bot facebook-github-bot deleted the export-D72440313 branch April 10, 2025 03:16
kirklandsign pushed a commit that referenced this pull request Apr 11, 2025
Differential Revision: D72440313

Pull Request resolved: #9894
keyprocedure pushed a commit to keyprocedure/executorch that referenced this pull request Apr 21, 2025
Differential Revision: D72440313

Pull Request resolved: pytorch#9894
Labels: ciflow/trunk, CLA Signed, fb-exported, topic: not user facing