Add codegen for embedding backward meta functions #2347
Conversation
This pull request was exported from Phabricator. Differential Revision: D53674518
✅ Deploy Preview for pytorch-fbgemm-docs ready!
@@ -270,8 +272,13 @@ def execute_backward_adagrad_( # noqa C901
         output_dtype=output_dtype,
     )

+    # TODO: make it compile for CPU and unweighted
+    if compile and not use_cpu and weighted:
+        cc = torch.compile(cc)
Consider adding fullgraph=True here, i.e. cc = torch.compile(cc, fullgraph=True).
#x " must have the same number of elements as " #y " They had ", \ | ||
(x).sym_numel(), \ | ||
" and ", \ | ||
(y).sym_numel()) |
A more robust version of this, which works better with unbacked symints, is:

TORCH_SYM_CHECK(x.sym_numel().sym_eq(y.sym_numel()))

For more explanation of what this is doing, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit#heading=h.jqnrfurlygn5
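Applied to the macro above, the suggestion could look roughly like this (a sketch; TENSORS_HAVE_SAME_NUMEL is a hypothetical name, since the hunk only shows the macro's tail):

// Routing the comparison through TORCH_SYM_CHECK defers it via
// SymBool::expect_true, so an unbacked symint does not force a guard here.
#define TENSORS_HAVE_SAME_NUMEL(x, y)                     \
  TORCH_SYM_CHECK(                                        \
      (x).sym_numel().sym_eq((y).sym_numel()),            \
      #x " must have the same number of elements as " #y  \
      " They had ", (x).sym_numel(), " and ", (y).sym_numel())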
Force-pushed from fafe6a3 to 511192c.
This pull request was exported from Phabricator. Differential Revision: D53674518
Force-pushed from 511192c to 54c57fa.
}
if (reinterpret_cast<uint64_t>(grad_output.data_ptr()) % 16 != 0) {
  aligned_grad_output = at::empty_like(grad_output).copy_(grad_output);
}
I guess the code got moved here. So you're only running this inside of the CUDA kernel now?
One potential hazard to be aware of when doing a transform like this is if the operator you moved this logic into is differentiable. The backward in that case may have been relying on the input being guaranteed to be aligned, including the saved copy for backwards.
@@ -253,7 +265,7 @@ Tensor {{ ddesc }}_embedding_codegen_grad_indice_weights{{ vdesc }}_cuda(
   const auto total_B = offsets.size(0) - 1;
   TORCH_CHECK_GE(total_B, 0);
   TORCH_CHECK_LE(max_D, {{ max_embedding_dim }});
-  auto grad_indice_weights = empty_like(indices, indices.options().dtype(at::toAccumulateType(grad_output.scalar_type(), true)));
+  auto grad_indice_weights = empty_like(indices, indices.options().dtype(at::toAccumulateType(aligned_grad_output.scalar_type(), true)));
I'm not going to carefully audit that you updated all the downstream use sites. To make it obvious you didn't do it wrong, change the input name, and then, once you align grad output, assign it to grad_output; no diff afterwards.
grad_output is const here, so I need to create a new variable.
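A sketch of how the suggested rename can coexist with the const parameter (the function name and signature are abbreviated, and the trailing-underscore parameter name is hypothetical):

// Take the input under a different name, then bind the (possibly
// re-allocated) aligned tensor to a local named grad_output; everything
// downstream keeps referring to grad_output with no textual diff.
Tensor grad_indice_weights_cuda(
    const Tensor& grad_output_ /* , ...remaining args elided... */) {
  Tensor grad_output = grad_output_;
  if (reinterpret_cast<uint64_t>(grad_output.data_ptr()) % 16 != 0) {
    grad_output = at::empty_like(grad_output).copy_(grad_output);
  }
  // ... rest of the function unchanged ...
}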
) {

  const auto T = D_offsets.sym_size(0) - 1;
  TORCH_CHECK_GT(T, 0);
We can also potentially make these more unbacked symint friendly, but happy to leave this for later too.
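For instance, the TORCH_CHECK_GT above could hypothetically become a symbolic check (a sketch, assuming c10's TORCH_SYM_CHECK macro and SymInt::sym_gt):

// sym_gt yields a SymBool rather than forcing T to a concrete value, so an
// unbacked symint does not trigger a guard at this check.
const auto T = D_offsets.sym_size(0) - 1;
TORCH_SYM_CHECK(T.sym_gt(0), "T must be > 0, but got ", T);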
  auto grad_indice_weights = empty_like(indices, indices.options().dtype(at::toAccumulateType(grad_output.scalar_type(), true)));

  return grad_indice_weights;
I'm trusting you that this accurately reflects the original logic ;)
}
if (reinterpret_cast<uint64_t>(grad_output.data_ptr()) % 16 != 0) {
  aligned_grad_output = at::empty_like(grad_output).copy_(grad_output);
}
duped! Maybe factor this out to a helper?
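A minimal sketch of such a helper (the name align_to_16_bytes is hypothetical):

// Returns the input unchanged when its data pointer is 16-byte aligned,
// and otherwise a fresh copy, which the caching allocator will typically
// hand back suitably aligned.
static at::Tensor align_to_16_bytes(const at::Tensor& t) {
  if (reinterpret_cast<uint64_t>(t.data_ptr()) % 16 != 0) {
    return at::empty_like(t).copy_(t);
  }
  return t;
}

Each call site would then collapse to something like auto aligned_grad_output = align_to_16_bytes(grad_output);.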
{%- endif %}

// short-circuit if there are zero indices.
if (indices.sym_numel() == 0) {
Make this one size oblivious, per https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit#heading=h.11jnmcqhq5yy
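Per the linked doc, one hypothetical way to do that is c10's TORCH_GUARD_SIZE_OBLIVIOUS macro (a sketch):

// For unbacked sizes this assumes numel() != 0 instead of guarding on it,
// so the compiled graph does not specialize on the empty case.
if (TORCH_GUARD_SIZE_OBLIVIOUS(indices.sym_numel().sym_eq(0))) {
  // ... existing early-return path ...
}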
Thanks, nice work!
Force-pushed from 54c57fa to e4399d0.
Force-pushed from e4399d0 to afc4e95.
Force-pushed from afc4e95 to a1f0136.
Force-pushed from a1f0136 to 5d4714a.
Force-pushed from 5d4714a to 4aa3883.
This pull request has been merged in 5f48fbd.
Summary:
Adding embedding backward meta codegen functions.
Moved the memory alignment handling that was outside of the CUDA kernel into the custom operator, since we couldn't write a symbolic version of the memory alignment checks on the pointers.
Tests are changed to allow compilation only on adagrad. The other tests are run to ensure they continue to work properly.
Fixes to allow compilation for unweighted kernels and for CPU are still missing; those cases are excluded from the tests.
Differential Revision: D53674518