Add NEON implementation of Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf #3707
Conversation
This pull request was exported from Phabricator. Differential Revision: D69573878
✅ Deploy Preview for pytorch-fbgemm-docs ready!
This pull request has been merged in 3de6774.
Add NEON implementation of Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf (pytorch#789)
Summary:
Pull Request resolved: facebookresearch/FBGEMM#789
X-link: pytorch#3707
QuantUtilsNeon.cc has been introduced, with Fused8BitRowwiseQuantizedSBFloatToFloatOrHalfNeon implemented as its first function.
We have observed a ~12x performance improvement for the downcasting case; the case where a float32_t is returned maintains the same speed.
Full results: before: P1732996851, after: P1732996401
Reviewed By: q10
Differential Revision: D69573878
fbshipit-source-id: b3d5045cf718a13356c068846a7805ff60f61d87
Summary:
QuantUtilsNeon.cc has been introduced, with Fused8BitRowwiseQuantizedSBFloatToFloatOrHalfNeon implemented as its first function.
We have observed a ~12x performance improvement for the downcasting case; the case where a float32_t is returned maintains the same speed.
Full results:
before:
P1732996851
after:
P1732996401
Differential Revision: D69573878
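
For context on what the kernel does: the fused 8-bit rowwise format stores each row's quantized uint8 values together with a float32 scale and bias, and dequantization recovers each element as `value * scale + bias`. Below is a minimal NEON sketch of that inner loop for the float32 output path. It is illustrative rather than the FBGEMM implementation: the function name, the assumed row layout (scale and bias appended after the quantized bytes), and the simple scalar tail are assumptions made for this sketch.

```cpp
#include <arm_neon.h>
#include <cstddef>
#include <cstdint>

// Sketch: dequantize one row of fused 8-bit rowwise data into float32.
// Assumed row layout: 'count' uint8 values, then a float scale and a float bias.
void dequantize_row_neon(const std::uint8_t* row, std::size_t count, float* out) {
  const float* scale_bias = reinterpret_cast<const float*>(row + count);
  const float scale = scale_bias[0];
  const float bias = scale_bias[1];

  const float32x4_t vscale = vdupq_n_f32(scale);
  const float32x4_t vbias = vdupq_n_f32(bias);

  std::size_t i = 0;
  for (; i + 8 <= count; i += 8) {
    // Load 8 quantized bytes and widen u8 -> u16 -> u32 -> f32.
    const uint8x8_t q8 = vld1_u8(row + i);
    const uint16x8_t q16 = vmovl_u8(q8);
    const float32x4_t lo = vcvtq_f32_u32(vmovl_u16(vget_low_u16(q16)));
    const float32x4_t hi = vcvtq_f32_u32(vmovl_u16(vget_high_u16(q16)));
    // out = q * scale + bias, four lanes at a time.
    vst1q_f32(out + i, vfmaq_f32(vbias, lo, vscale));
    vst1q_f32(out + i + 4, vfmaq_f32(vbias, hi, vscale));
  }
  for (; i < count; ++i) {
    // Scalar tail for row lengths that are not a multiple of 8.
    out[i] = static_cast<float>(row[i]) * scale + bias;
  }
}
```

The float16 ("downcasting") output path would additionally narrow each float32x4_t result with vcvt_f16_f32 before storing, instead of writing float32 directly.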