int8 output for seq embeddings #2316

Closed

YazhiGao wants to merge 1 commit into pytorch:main from YazhiGao:export-D53449813

YazhiGao commented Feb 6, 2024

Summary:

int8 output dtype is a gap in recent FBGEMM use cases, so this sets up a reasonable memcpy-based reference implementation first.
For sequence embeddings, we first unblock dispatch with a simple memcpy. The op is purely bandwidth-bound (no dequantization), so memcpy should perform reasonably well. Further optimizations, such as ILP via unrolling, AVX non-temporal instructions, and rep instructions, are left for future iterations.
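
As an illustration of the memcpy-based approach described above, here is a minimal sketch of gathering quantized rows without dequantizing them. The function name `copy_int8_rows_ref`, its parameters, and the flat row layout are assumptions made for this example and are not taken from the FBGEMM sources; the actual kernel in this PR may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not FBGEMM's API): copies one stored row per index
// into the output, byte for byte. Each table row is assumed to occupy
// `row_bytes` contiguous bytes (the int8 payload plus whatever per-row
// metadata the layout carries). Nothing is dequantized.
// Returns false if any index is out of range, true otherwise.
inline bool copy_int8_rows_ref(
    const std::uint8_t* table,
    std::size_t num_rows,
    std::size_t row_bytes,
    const std::int64_t* indices,
    std::size_t num_indices,
    std::uint8_t* out) {
  assert(table != nullptr && indices != nullptr && out != nullptr);
  for (std::size_t i = 0; i < num_indices; ++i) {
    const std::int64_t idx = indices[i];
    if (idx < 0 || static_cast<std::size_t>(idx) >= num_rows) {
      return false;
    }
    // Sequence embedding: one output row per index, no pooling, so the
    // stored bytes are moved as-is.
    std::memcpy(
        out + i * row_bytes,
        table + static_cast<std::size_t>(idx) * row_bytes,
        row_bytes);
  }
  return true;
}
```

With a 3-row table of 4 bytes per row and indices `{2, 0, 2}`, the output holds row 2, row 0, and row 2 again, back to back. Because the loop body is a plain copy, the unrolling and non-temporal-store ideas mentioned in the summary would only change how the bytes are moved, not what is written.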

Differential Revision: D53449813

netlify bot commented Feb 6, 2024

✅ Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `816faa1` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/65c2ac284d9c7c000829a7c7 |
| 😎 Deploy Preview | https://deploy-preview-2316--pytorch-fbgemm-docs.netlify.app |

To edit notification comments on pull requests, go to your Netlify site configuration.

facebook-github-bot added the cla signed label

Contributor

facebook-github-bot commented Feb 6, 2024

This pull request was exported from Phabricator. Differential Revision: D53449813

facebook-github-bot added the fb-exported label

YazhiGao pushed a commit to YazhiGao/FBGEMM that referenced this pull request


int8 output for seq embeddings (pytorch#2316)

5b5027b

YazhiGao force-pushed the export-D53449813 branch from 2d86f0d to 5b5027b Compare

February 6, 2024 21:49

YazhiGao pushed a commit to YazhiGao/FBGEMM that referenced this pull request


int8 output for seq embeddings (pytorch#2316)

89445b1

YazhiGao force-pushed the export-D53449813 branch from 5b5027b to 89445b1 Compare

February 6, 2024 21:51

YazhiGao pushed a commit to YazhiGao/FBGEMM that referenced this pull request


int8 output for seq embeddings (pytorch#2316)

ea763e9

YazhiGao force-pushed the export-D53449813 branch from 89445b1 to ea763e9 Compare

February 6, 2024 21:52


int8 output for seq embeddings (pytorch#2316)

816faa1

YazhiGao force-pushed the export-D53449813 branch from ea763e9 to 816faa1 Compare

February 6, 2024 22:01

facebook-github-bot closed this in

af41af1

facebook-github-bot added the Merged label

Contributor

facebook-github-bot commented Feb 7, 2024

This pull request has been merged in af41af1.

Labels

cla signed fb-exported Merged