Skip to content

Arm backend: Allocate the scratch buffer runtime rather than in the pte #10714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 16, 2025

Conversation

gggekov
Copy link
Collaborator

@gggekov gggekov commented May 6, 2025

This change lowers the size of the pte and allows you to allocate the scratch buffer in an array, usually in the SRAM, for more efficient memory usage on a MCU

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

@gggekov gggekov requested a review from digantdesai as a code owner May 6, 2025 12:44
Copy link

pytorch-bot bot commented May 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10714

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Cancelled Job

As of commit 2e023a2 with merge base 2ec8678 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 6, 2025
@gggekov gggekov added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk topic: not user facing labels May 6, 2025
@zingo zingo changed the title Arm backend: Allocate the scratch buffer in an array rather than in t… Arm backend: Allocate the scratch buffer runtime rather than in the pte May 6, 2025
@@ -92,7 +92,7 @@ def vela_compile(tosa_flatbuffer: bytes, args: List[str], verbose: bool = False)
if not isinstance(data["scratch_shape"][0], np.int64):
raise RuntimeError("Expected scratch to be int64")
block_length = int(data["scratch_shape"][0])
bin_blocks["scratch_data"] = b"\x00" * block_length
bin_blocks["scratch_size"] = struct.pack("<I", block_length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Contributor

@digantdesai digantdesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure the CI is green, thanks.

@gggekov
Copy link
Collaborator Author

gggekov commented May 14, 2025

Yes, I am adding support for Dedicated_Sram for U85 & changing the default mem mode we test on U85. This is the proper fix for the fail we see for inception_v4. With this fix, we will place the NN & scratch buffer in the DDR and use the SRAM as a cache. The reason for the failure is that the scratch_buffer for inception_v4 is around 2.6-2.7MB, we allocate the scratch buffer in the SRAM, but on the CS-300 we only 2MB of SRAM. Will update the pr soon.

…Sram for Ethos-U85

This change lowers the size of the pte and allows you to allocate the
scratch buffer in an array, usually in the SRAM, for more efficient
memory usage on a MCU.

Also, add support Dedicated_Sram memory mode in the runtime
and make it the default memory mode for Ethos-U85.

Change-Id: I04cf9de49a6116141d402b9ad5ca4f21e2025236
@gggekov gggekov force-pushed the Allocate_scratch_buffer_outside_pte branch from 2a35a48 to 2e023a2 Compare May 16, 2025 13:18
@zingo zingo added the release notes: arm Changes to the ARM backend delegate label May 16, 2025
@zingo
Copy link
Collaborator

zingo commented May 16, 2025

failed test are unrelated

@zingo zingo merged commit f39a1bb into pytorch:main May 16, 2025
185 of 189 checks passed
@kirklandsign
Copy link
Contributor

Hi @gggekov @zingo this is breaking our internal test.

undefined reference to `executorch::backends::arm::ethosu_fast_scratch_size'

Seems it's defined in arm_executor_runner only. Any suggestions what we should do on our side? Otherwise OK to revert this?

@kirklandsign
Copy link
Contributor

Could you please review #10958 for a temp fix

@gggekov
Copy link
Collaborator Author

gggekov commented May 19, 2025

Hi @kirklandsign,
Thanks for the message, added a comment in #10958

Comment on lines +192 to +193
extern size_t ethosu_fast_scratch_size;
extern unsigned char* ethosu_fast_scratch;
Copy link
Contributor

@digantdesai digantdesai May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is not the cleanest. Let me think of a better way to do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm release notes: arm Changes to the ARM backend delegate topic: not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants