Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17719
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures as of commit 31004a4 with merge base 4dadf24.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed a26fac8 to 7794f42 (Compare)
Pull request overview
This pull request adds support for sharing memory arenas across methods in the Module class when share_memory_arenas=True. This feature is required for models exported with share_mutable_buffers=True, where methods need to access shared mutable state (e.g., KV cache in LLMs).
Changes:
- Added `share_memory_arenas` parameter to all Module constructors (defaults to false for backward compatibility)
- Refactored MethodHolder to extract memory planning fields into a new PlannedMemory struct that can be shared across methods
- Added helper methods to compute maximum memory-planned buffer sizes across all methods when sharing is enabled
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| extension/module/module.h | Added share_memory_arenas parameter to all constructors; extracted PlannedMemory struct; added helper method declarations |
| extension/module/module.cpp | Implemented memory sharing logic; refactored load_method to use shared or per-method PlannedMemory |
| extension/module/test/module_test.cpp | Added TestSharedMemoryBuffer test to validate shared state functionality |
| extension/module/test/CMakeLists.txt | Added ModuleSharedState.pte generation and environment variable configuration |
```cpp
if (!shared_planned_memory_) {
  auto max_res = get_max_mem_planned_buffer_sizes();
  ET_CHECK_OK_OR_RETURN_ERROR(max_res.error());
  shared_planned_memory_ = make_planned_memory(max_res.get());
}
method_holder.planned_memory = shared_planned_memory_;
```
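The lazy-init-and-share pattern in this snippet can be sketched in isolation. The `PlannedMemory` and `ModuleSketch` types below are hypothetical stand-ins, not the real ExecuTorch types: the point is only that the first method load creates the shared arena set and every later load reuses the same `shared_ptr`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real PlannedMemory struct:
// one backing buffer per mem_id.
struct PlannedMemory {
  std::vector<std::vector<uint8_t>> arenas;
};

struct ModuleSketch {
  std::shared_ptr<PlannedMemory> shared_planned_memory_;

  // First call allocates arenas sized by max_sizes; subsequent calls
  // return the same shared set regardless of their argument, mirroring
  // the lazy creation in the snippet above.
  std::shared_ptr<PlannedMemory> planned_memory_for_method(
      const std::vector<size_t>& max_sizes) {
    if (!shared_planned_memory_) {
      auto pm = std::make_shared<PlannedMemory>();
      for (size_t s : max_sizes) {
        pm->arenas.emplace_back(s);
      }
      shared_planned_memory_ = std::move(pm);
    }
    return shared_planned_memory_;
  }
};
```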
This does mean that methods cannot be invoked in parallel, as they may overwrite each other's arenas. Document that explicitly.
Also, if AOT memory planning accounted for it, should we not assert that the planned memory has the same mem_id (or similar) for them to be shareable?
> This does mean that methods cannot be invoked in parallel as they may overwrite each other's arenas. Document that explicitly.
Yeah, that's right. This isn't a currently supported feature (it's unsafe to run multiple methods in separate threads within a Module, even without sharing), but sharing makes it harder to support. I'll add an extra comment in module.h.
> Also if AOT memory planning accounted for it then should we not assert that the planned memory has the same mem_id or something for them to be shareable?
Initially I was thinking we would only share physical buffers when mem_id=2 (buffer marked as shareable AOT).
However, @JacobSzwejbka's original diff D82329513 shares both, and I don't see any issues with that besides output tensors needing to be copied into permanent memory after running each method, which I think is OK (users already have to do this when running the same method twice).
I was just wondering if we should enforce that only buffers that are on the same mem_id can share memory.
Yeah, I was wondering that too. It seems fine to share all of it, and we get some memory savings as well. What do you think?
I think mem_id has a specific meaning, so asserting on that is good.
Force-pushed 7794f42 to 31004a4 (Compare)
Summary
Taken from D82329513
Allow Module to share activation memory when `share_memory_arenas=True`.

In a PTE file, each method may have:
- mem_id=1: activation memory
- mem_id=2: shared memory (usually for shared state)

These are specified in `program.execution_plan[method_index].non_const_buffer_sizes`. Tensors with mem_id=2 point to the same memory offset across different methods.

In Module.cpp, freshly allocated memory is provided to each method for activation space. When `share_memory_arenas=True`, this PR creates a shared buffer and passes it to each method instead of allocating fresh memory.

NOTE
Regular activation memory is also shared (mem_id=1). The largest activation memory size across all methods is used.
I believe this is safe with the existing concurrent execution. In TestConcurrentExecutionWithSharedProgram, it's the (immutable) program that is shared: N separate Modules hold a shared_ptr to the same program. Each Module has its own memory buffers, and sharing is local to a Module, not across threads. I don't think concurrency within a single Module (between methods) is supported. Running multiple methods concurrently in the same Module is already unsafe, and this PR makes it even more unsafe.
RISKS
m1 and m2 will point to the same output memory buffer when sharing memory (since activation memory is shared); m1's output must be copied to a separate buffer before running m2, otherwise it will be overwritten with the result of m2.
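This risk can be illustrated with plain vectors standing in for the shared arena and the two methods (everything here is hypothetical; no ExecuTorch API is used):

```cpp
#include <vector>

// Hypothetical illustration of the documented risk: two "methods"
// write their outputs into the same shared arena, so m1's result
// must be copied to permanent memory before m2 runs.
std::vector<float> run_and_preserve() {
  std::vector<float> shared_arena(4, 0.0f);
  auto run_m1 = [&] { shared_arena.assign(4, 1.0f); };
  auto run_m2 = [&] { shared_arena.assign(4, 2.0f); };

  run_m1();
  // Copy m1's output out of the arena before m2 overwrites it.
  std::vector<float> m1_output(shared_arena);
  run_m2();
  return m1_output; // still m1's result; shared_arena now holds m2's
}
```

Without the copy, reading "m1's output" after `run_m2()` would yield m2's values.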
Test plan