[ET-VK] Store weights transposed for int8 linear #9765

Merged
merged 2 commits into gh/SS-JIA/204/base on Apr 1, 2025

Conversation

SS-JIA (Contributor) commented Mar 31, 2025

Stack from ghstack (oldest at bottom):

Context

The weight tensor of a linear layer is usually stored in a transposed manner, such that when computing the matrix multiplication, the reduction traverses along the rows of the weight tensor as opposed to the columns. This results in a better memory access pattern for CPUs.

However, for GPUs, I have found that "un-transposing" the weight tensor results in better performance. This is likely because GPUs compute multiple output elements in parallel, so reading along the columns allows memory loads to be coalesced among the threads in a work group.
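
For intuition, here is a minimal sketch of the access pattern this enables. This is illustrative CUDA rather than the actual ET-VK compute shader (which is written in GLSL), and the kernel name and signature are invented for the example:

```cpp
// Illustrative only: one GPU thread per output element n of out = x * W,
// with the K x N weight stored "un-transposed" in row-major order.
__global__ void linear_untransposed(
    const float* x,   // input vector, length K
    const float* W,   // weight, K rows by N columns, row-major
    float* out,       // output vector, length N
    int K, int N) {
  int n = blockIdx.x * blockDim.x + threadIdx.x;
  if (n >= N) return;
  float acc = 0.0f;
  for (int k = 0; k < K; ++k) {
    // In each iteration, adjacent threads n, n+1, ... read the adjacent
    // addresses W[k*N + n], W[k*N + n + 1], ..., so the loads coalesce.
    // With the transposed (N x K) layout, each thread would instead read
    // W[n*K + k], and neighboring threads would be K elements apart.
    acc += x[k] * W[k * N + n];
  }
  out[n] = acc;
}
```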

Changes

  • Introduce the ability to transpose the height and width dims when transferring tensor data to the GPU.
  • Prepack the weight tensor "un-transposed" for the int8 quantized linear operator (see the sketch after this list).
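
As a rough sketch of what transposing during transfer means, a host-side staging helper might look like the following. This function is hypothetical and invented for illustration; the actual change lives in the ET-VK prepacking path:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy an N x K (transposed) int8 weight into a
// K x N staging buffer, so the tensor lands on the GPU "un-transposed".
std::vector<int8_t> stage_untransposed(
    const int8_t* src, size_t N, size_t K) {
  std::vector<int8_t> dst(N * K);
  for (size_t n = 0; n < N; ++n) {
    for (size_t k = 0; k < K; ++k) {
      dst[k * N + n] = src[n * K + k];  // swap the height and width dims
    }
  }
  return dst;
}
```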

Differential Revision: [D72066588](https://our.internmc.facebook.com/intern/diff/D72066588/)

pytorch-bot (bot) commented Mar 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9765

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure

As of commit 1f29600 with merge base 2aa7748:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Mar 31, 2025
facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D72066588

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D72066588

facebook-github-bot merged commit 439d66d into gh/SS-JIA/204/base on Apr 1, 2025
79 of 84 checks passed
facebook-github-bot deleted the gh/SS-JIA/204/head branch on April 1, 2025 16:14