Use better exponent rounding in Triton MX4 quantize kernel #2816

jwfromm · 2024-07-10T15:57:16Z

Summary:
As noted in [this doc](https://docs.google.com/document/d/156Du0hBRH6umG_i-OrYC574XhpQMUU5SJYG0RTS2tTg/edit#heading=h.akfcp7xpg8cr), using a ceiling round for the scale calculation does a better job of avoiding truncation of some mantissa bits. This diff switches the Triton kernel from floor rounding to ceil rounding.

Note that mx4_test currently doesn't pass, because the CUDA kernel now behaves differently from the Triton kernel. Once we rebase this diff onto a similar change to the CUDA kernel, we should see exactly matching outputs again.

Reviewed By: jianyuh

Differential Revision: D59527463
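
To make the floor-versus-ceil difference concrete, here is a minimal pure-Python sketch. It is not the FBGEMM Triton kernel: the function names, the FP4 (E2M1) value table, and the `rounding` parameter are assumptions made for illustration only. It shows one effect of the choice: with floor rounding the largest value in a group can land above the largest representable element magnitude and get clipped, while ceil rounding keeps it in range.

```python
import math

# Assumed element format: FP4 (E2M1) magnitudes. Illustrative, not taken from the PR.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2  # exponent of the largest power-of-two E2M1 value (4.0 == 2**2)


def shared_exponent(group, rounding):
    """Pick the power-of-two exponent shared by one group (hypothetical helper)."""
    max_abs = max(abs(x) for x in group)
    if max_abs == 0.0:
        return 0
    log = math.log2(max_abs)
    rounded = math.floor(log) if rounding == "floor" else math.ceil(log)
    return rounded - FP4_EMAX


def quantize_group(group, rounding):
    """Scale the group, snap each value to the nearest FP4 magnitude, rescale."""
    scale = 2.0 ** shared_exponent(group, rounding)
    out = []
    for x in group:
        scaled = abs(x) / scale
        # Values beyond 6.0 snap to 6.0, which models saturation.
        nearest = min(FP4_VALUES, key=lambda v: abs(v - scaled))
        out.append(math.copysign(nearest * scale, x))
    return out


group = [7.5, 1.0, -3.0, 0.2]
print(quantize_group(group, "floor"))  # [6.0, 1.0, -3.0, 0.0]  (7.5 clipped to 6.0)
print(quantize_group(group, "ceil"))   # [8.0, 1.0, -3.0, 0.0]  (7.5 rounds to 8.0)
```

In this toy group the floor variant is off by 1.5 on the largest element and the ceil variant by 0.5. The trade-off is that ceil uses a coarser step for the small values in the group, so whether it helps overall depends on the data.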

netlify · 2024-07-10T15:57:33Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `3cbb3e9` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66918d3d5081e500084a5375 |
| 😎 Deploy Preview | https://deploy-preview-2816--pytorch-fbgemm-docs.netlify.app |

facebook-github-bot · 2024-07-10T15:57:35Z

This pull request was exported from Phabricator. Differential Revision: D59527463

facebook-github-bot · 2024-07-10T17:44:56Z

This pull request was exported from Phabricator. Differential Revision: D59527463

Summary:
X-link: facebookresearch/FBGEMM#20

Pull Request resolved: pytorch#2816

As noted in [this doc](https://docs.google.com/document/d/156Du0hBRH6umG_i-OrYC574XhpQMUU5SJYG0RTS2tTg/edit#heading=h.akfcp7xpg8cr), using a ceiling round for the scale calculation does a better job of avoiding truncation of some mantissa bits. This diff switches the Triton kernel from floor rounding to ceil rounding.

Note that mx4_test currently doesn't pass, because the CUDA kernel now behaves differently from the Triton kernel. Once we rebase this diff onto a similar change to the CUDA kernel, we should see exactly matching outputs again.

Reviewed By: jianyuh

Differential Revision: D59527463

facebook-github-bot · 2024-07-10T17:54:04Z

This pull request was exported from Phabricator. Differential Revision: D59527463


facebook-github-bot · 2024-07-12T20:08:26Z

This pull request was exported from Phabricator. Differential Revision: D59527463


facebook-github-bot · 2024-07-14T21:04:36Z

This pull request has been merged in bc78e2e.

facebook-github-bot added the cla signed label Jul 10, 2024

facebook-github-bot added the fb-exported label Jul 10, 2024

jwfromm force-pushed the export-D59527463 branch from ad30615 to 5aab638 on July 10, 2024 17:45

jwfromm force-pushed the export-D59527463 branch from 5aab638 to 69a1928 on July 10, 2024 17:54

jwfromm force-pushed the export-D59527463 branch from 69a1928 to 3cbb3e9 on July 12, 2024 20:08

facebook-github-bot closed this in bc78e2e Jul 14, 2024

facebook-github-bot added the Merged label Jul 14, 2024
