Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17719
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures as of commit 31004a4 with merge base 4dadf24.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed a26fac8 to 7794f42 (Compare)
Pull request overview
This pull request adds support for sharing memory arenas across methods in the Module class when share_memory_arenas=True. This feature is required for models exported with share_mutable_buffers=True, where methods need to access shared mutable state (e.g., KV cache in LLMs).
Changes:
- Added `share_memory_arenas` parameter to all Module constructors (defaults to false for backward compatibility)
- Refactored MethodHolder to extract memory planning fields into a new PlannedMemory struct that can be shared across methods
- Added helper methods to compute maximum memory-planned buffer sizes across all methods when sharing is enabled
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| extension/module/module.h | Added share_memory_arenas parameter to all constructors; extracted PlannedMemory struct; added helper method declarations |
| extension/module/module.cpp | Implemented memory sharing logic; refactored load_method to use shared or per-method PlannedMemory |
| extension/module/test/module_test.cpp | Added TestSharedMemoryBuffer test to validate shared state functionality |
| extension/module/test/CMakeLists.txt | Added ModuleSharedState.pte generation and environment variable configuration |
```cpp
if (!shared_planned_memory_) {
  auto max_res = get_max_mem_planned_buffer_sizes();
  ET_CHECK_OK_OR_RETURN_ERROR(max_res.error());
  shared_planned_memory_ = make_planned_memory(max_res.get());
}
method_holder.planned_memory = shared_planned_memory_;
```
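The lazy-init-and-share pattern in this snippet can be sketched in isolation. The `PlannedMemory` and `ModuleSketch` types below are hypothetical stand-ins, not the real ExecuTorch types: the point is only that the first method load creates the shared arena set and every later load reuses the same `shared_ptr`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real PlannedMemory struct:
// one backing buffer per mem_id.
struct PlannedMemory {
  std::vector<std::vector<uint8_t>> arenas;
};

struct ModuleSketch {
  std::shared_ptr<PlannedMemory> shared_planned_memory_;

  // First call allocates arenas sized by max_sizes; subsequent calls
  // return the same shared set regardless of their argument, mirroring
  // the lazy creation in the snippet above.
  std::shared_ptr<PlannedMemory> planned_memory_for_method(
      const std::vector<size_t>& max_sizes) {
    if (!shared_planned_memory_) {
      auto pm = std::make_shared<PlannedMemory>();
      for (size_t s : max_sizes) {
        pm->arenas.emplace_back(s);
      }
      shared_planned_memory_ = std::move(pm);
    }
    return shared_planned_memory_;
  }
};
```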
This does mean that methods cannot be invoked in parallel, as they may overwrite each other's arenas. Document that explicitly.
Also, if AOT memory planning accounted for it, should we not assert that the planned memory has the same mem_id (or similar) for them to be shareable?
> This does mean that methods cannot be invoked in parallel as they may overwrite each other's arenas. Document that explicitly.
Yeah, that's right. This isn't a currently supported feature (it's unsafe to run multiple methods in separate threads within a Module, even without sharing), but sharing makes it harder to support. I'll add an extra comment in module.h.
> Also if AOT memory planning accounted for it then should we not assert that the planned memory has the same mem_id or something for them to be shareable?
Initially I was thinking we would only share physical buffers when mem_id=2 (buffer marked as shareable AOT).
However, @JacobSzwejbka's original diff D82329513 shares both, and I don't see any issues with that besides output tensors needing to be copied into permanent memory after running each method, which I think is OK (users already have to do this when running the same method twice).
I was just wondering if we should enforce that only buffers that are on the same mem_id can share memory.
Yeah, I was wondering that too. It seems fine to share all of it, and we get some memory savings as well. What do you think?
I think mem_id has a specific meaning, so asserting on that is good.
Force-pushed 7794f42 to 31004a4 (Compare)
Summary
Taken from D82329513
Allow Module to share activation memory when `share_memory_arenas=True`.

In a PTE file, each method may have:
- mem_id=1: activation memory
- mem_id=2: shared memory (usually for shared state)

These are specified in `program.execution_plan[method_index].non_const_buffer_sizes`. Tensors with mem_id=2 point to the same memory offset across different methods.

In Module.cpp, freshly allocated memory is provided to each method for activation space. When `share_memory_arenas=True`, this PR creates a shared buffer and passes it to each method instead of allocating fresh memory.

NOTE
Regular activation memory is also shared (mem_id=1). The largest activation memory size across all methods is used.
I believe this is safe with the existing concurrent execution. In TestConcurrentExecutionWithSharedProgram, it's the (immutable) program that is shared: N separate Modules hold a shared_ptr to the same program. Each Module has its own memory buffers, and sharing is local to a Module, not across threads. I don't think concurrency within a single Module (between methods) is supported. Running multiple methods concurrently in the same Module is already unsafe, and this PR makes it even more unsafe.
RISKS
m1 and m2 will point to the same output memory buffer when sharing memory (since activation memory is shared); m1's output must be copied to a separate buffer before running m2, otherwise it will be overwritten with the result of m2.
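This risk can be illustrated with plain vectors standing in for the shared arena and the two methods (everything here is hypothetical; no ExecuTorch API is used):

```cpp
#include <vector>

// Hypothetical illustration of the documented risk: two "methods"
// write their outputs into the same shared arena, so m1's result
// must be copied to permanent memory before m2 runs.
std::vector<float> run_and_preserve() {
  std::vector<float> shared_arena(4, 0.0f);
  auto run_m1 = [&] { shared_arena.assign(4, 1.0f); };
  auto run_m2 = [&] { shared_arena.assign(4, 2.0f); };

  run_m1();
  // Copy m1's output out of the arena before m2 overwrites it.
  std::vector<float> m1_output(shared_arena);
  run_m2();
  return m1_output; // still m1's result; shared_arena now holds m2's
}
```

Without the copy, reading "m1's output" after `run_m2()` would yield m2's values.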
Test plan