[ET-VK] Consolidate shader compilation into one vkCreateComputePipelines call #11345

jorgep31415 · 2025-06-03T23:27:28Z

Stack from ghstack (oldest at bottom):

We target the Qualcomm (QC) Adreno driver implementation of Vulkan. The Vulkan API does not dictate how QC's driver actually uses the pipeline cache. As the plural name of `vkCreateComputePipelines` suggests, we observed that its `createInfoCount`, `pCreateInfos` and `pPipelines` arguments allow multiple compute pipelines to be constructed in one invocation. We refactor ET-VK to accumulate the metadata needed for pipeline construction and invoke `vkCreateComputePipelines` only once. QC's implementation makes the most effective use of its cache when the same number of compute pipelines is created in fewer invocations of `vkCreateComputePipelines`. This decreases model load time for a sample model from 1.7s to 300ms.
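The accumulate-then-create-once refactor described above can be sketched roughly as follows. This is a minimal, self-contained C++ model under stated assumptions, not ET-VK's actual code: `PipelineCreateInfo`, `Pipeline`, `PipelineBatcher` and `fake_create_compute_pipelines` are hypothetical stand-ins (for `VkComputePipelineCreateInfo`, `VkPipeline`, ET-VK's pipeline bookkeeping and `vkCreateComputePipelines` respectively), chosen so the example compiles without the Vulkan SDK. It shows metadata being queued once per shader and every queued pipeline then being created through a single call that takes a count, an array of create infos and an output array, as `vkCreateComputePipelines` does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for VkComputePipelineCreateInfo. A real create info
// carries the shader stage, pipeline layout, specialization constants, etc.
struct PipelineCreateInfo {
  std::string shader_name;
};

// HYPOTHETICAL stand-in for the opaque VkPipeline handle; 0 plays the role of
// VK_NULL_HANDLE.
using Pipeline = std::uint64_t;

// Number of times the fake driver entry point has been invoked.
int g_driver_calls = 0;

// HYPOTHETICAL stand-in for vkCreateComputePipelines. Like the real entry
// point it receives createInfoCount, pCreateInfos and pPipelines and writes one
// pipeline per create info. The real call also takes a VkDevice, a
// VkPipelineCache and a VkAllocationCallbacks pointer, and returns a VkResult.
void fake_create_compute_pipelines(std::uint32_t create_info_count,
                                   const PipelineCreateInfo* p_create_infos,
                                   Pipeline* p_pipelines) {
  static Pipeline next_handle = 1;  // fake, unique, nonzero handles
  ++g_driver_calls;
  (void)p_create_infos;  // a real driver would compile each shader here
  for (std::uint32_t i = 0; i < create_info_count; ++i) {
    p_pipelines[i] = next_handle++;
  }
}

// Accumulates pipeline metadata while the graph is built, then creates all
// pending pipelines with ONE driver call in flush().
class PipelineBatcher {
 public:
  // Queue a pipeline and return its slot; the handle is valid after flush().
  std::size_t request(const std::string& shader_name) {
    pending_.push_back(PipelineCreateInfo{shader_name});
    pipelines_.push_back(0);  // placeholder until flush()
    return pipelines_.size() - 1;
  }

  // Create every queued pipeline in a single batched call. The pending create
  // infos always map to the last pending_.size() slots of pipelines_.
  void flush() {
    if (pending_.empty()) return;
    const std::size_t first = pipelines_.size() - pending_.size();
    fake_create_compute_pipelines(static_cast<std::uint32_t>(pending_.size()),
                                  pending_.data(), pipelines_.data() + first);
    for (std::size_t i = first; i < pipelines_.size(); ++i) {
      assert(pipelines_[i] != 0 && "driver must fill every output slot");
    }
    pending_.clear();
  }

  Pipeline get(std::size_t slot) const { return pipelines_.at(slot); }

 private:
  std::vector<PipelineCreateInfo> pending_;  // queued, not yet created
  std::vector<Pipeline> pipelines_;          // every slot, created or not
};
```

Calling `request()` once per shader while building the graph and `flush()` once afterwards results in a single `fake_create_compute_pipelines` call, mirroring the switch to invoking `vkCreateComputePipelines` only once. The sketch does not model the driver's cache; it only illustrates that the whole batch reaches the driver in one call, which is where the PR observed QC's implementation benefiting.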

Differential Revision: D75763660

@SSJia

…nes call We target the QC Adreno driver implementation of Vulkan. The Vulkan API does not enforce how QC actually uses the cache. As the plural naming of `vkCreateComputePipelines` suggests, we observed that the `createInfoCount`, `pCreateInfos` and `pPipelines` arguments above allow construction of multiple compute pipelines in one invocation. We refactor ET-VK to accumulate metadata necessary for pipeline construction and invoke vkCreateComputePipelines only once. QC's implementation maximizes the cache if we create the same number of compute pipelines in fewer invocations of vkCreateComputePipelines. This decreases model load for a sample model from 1.7s to 1.0s, and down to 300ms once @SSJia removes the noop shader. Differential Revision: [D75763660](https://our.internmc.facebook.com/intern/diff/D75763660/) [ghstack-poisoned]

…nes call We target the QC Adreno driver implementation of Vulkan. The Vulkan API does not enforce how QC actually uses the cache. As the plural naming of `vkCreateComputePipelines` suggests, we observed that the `createInfoCount`, `pCreateInfos` and `pPipelines` arguments above allow construction of multiple compute pipelines in one invocation. We refactor ET-VK to accumulate metadata necessary for pipeline construction and invoke vkCreateComputePipelines only once. QC's implementation maximizes the cache if we create the same number of compute pipelines in fewer invocations of vkCreateComputePipelines. This decreases model load for a sample model from 1.7s to 1.0s, and down to 300ms once ssjia removes the noop shader. Differential Revision: [D75763660](https://our.internmc.facebook.com/intern/diff/D75763660/) ghstack-source-id: 287485414 Pull Request resolved: #11345

pytorch-bot · 2025-06-03T23:27:32Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11345

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 528b571 with merge base af0a246 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-06-03T23:27:38Z

This pull request was exported from Phabricator. Differential Revision: D75763660

…mputePipelines call" We target the QC Adreno driver implementation of Vulkan. The Vulkan API does not enforce how QC actually uses the cache. As the plural naming of `vkCreateComputePipelines` suggests, we observed that the `createInfoCount`, `pCreateInfos` and `pPipelines` arguments above allow construction of multiple compute pipelines in one invocation. We refactor ET-VK to accumulate metadata necessary for pipeline construction and invoke vkCreateComputePipelines only once. QC's implementation maximizes the cache if we create the same number of compute pipelines in fewer invocations of vkCreateComputePipelines. This decreases model load for a sample model from 1.7s to 1.0s, and down to 300ms once ssjia removes the noop shader. Differential Revision: [D75763660](https://our.internmc.facebook.com/intern/diff/D75763660/) [ghstack-poisoned]

facebook-github-bot · 2025-06-04T00:25:36Z

This pull request was exported from Phabricator. Differential Revision: D75763660

…mputePipelines call" We target the QC Adreno driver implementation of Vulkan. The Vulkan API does not enforce how QC actually uses the cache. As the plural naming of `vkCreateComputePipelines` suggests, we observed that the `createInfoCount`, `pCreateInfos` and `pPipelines` arguments above allow construction of multiple compute pipelines in one invocation. We refactor ET-VK to accumulate metadata necessary for pipeline construction and invoke vkCreateComputePipelines only once. QC's implementation maximizes the cache if we create the same number of compute pipelines in fewer invocations of vkCreateComputePipelines. This decreases model load for a sample model from 1.7s to 200ms. Differential Revision: [D75763660](https://our.internmc.facebook.com/intern/diff/D75763660/) [ghstack-poisoned]

facebook-github-bot · 2025-06-04T07:34:03Z

This pull request was exported from Phabricator. Differential Revision: D75763660

…mputePipelines call" We target the QC Adreno driver implementation of Vulkan. The Vulkan API does not enforce how QC actually uses the cache. As the plural naming of `vkCreateComputePipelines` suggests, we observed that the `createInfoCount`, `pCreateInfos` and `pPipelines` arguments above allow construction of multiple compute pipelines in one invocation. We refactor ET-VK to accumulate metadata necessary for pipeline construction and invoke vkCreateComputePipelines only once. QC's implementation maximizes the cache if we create the same number of compute pipelines in fewer invocations of vkCreateComputePipelines. This decreases model load for a sample model from 1.7s to 200ms. Differential Revision: [D75763660](https://our.internmc.facebook.com/intern/diff/D75763660/) [ghstack-poisoned]

facebook-github-bot · 2025-06-04T17:01:34Z

This pull request was exported from Phabricator. Differential Revision: D75763660

@jorgep31415

…nes call (#11381) This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: #11345 by @jorgep31415 ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/jorgep31415/135/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/jorgep31415/135/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/jorgep31415/135/orig @diff-train-skip-merge Co-authored-by: jorgep31415 <jorgepineda140@gmail.com>

jorgep31415 requested a review from SS-JIA as a code owner June 3, 2025 23:27

facebook-github-bot added the CLA Signed label Jun 3, 2025

facebook-github-bot added the fb-exported label Jun 3, 2025

jorgep31415 added the release notes: vulkan label Jun 3, 2025

jorgep31415 mentioned this pull request Jun 4, 2025

[ET-VK] Support setting cache_data_path at run time #11350

Merged

SS-JIA approved these changes Jun 4, 2025

jorgep31415 mentioned this pull request Jun 4, 2025

[ET-VK] Support setting cache_data_path at build time #11359

Merged

facebook-github-bot merged commit 4e8b19f into gh/jorgep31415/135/base Jun 4, 2025
98 checks passed

facebook-github-bot deleted the gh/jorgep31415/135/head branch June 4, 2025 22:44

facebook-github-bot temporarily deployed to cherry-pick-bot with GitHub Actions June 4, 2025 22:44 (inactive)

pytorchbot mentioned this pull request Jun 4, 2025

[ET-VK] Consolidate shader compilation into one vkCreateComputePipelines call #11381

Merged
