ggml-quants: use _mm256_testz_si256 for mask checks in AVX2 #17641

GermanAizek · 2025-12-01T08:57:41Z

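`_mm256_testz_si256(a, b)` returns 1 when the bitwise AND of its operands is all zeros, so `_mm256_testz_si256(v, v)` tests whether every bit of `v` is zero. For conditional branching in the AVX2 code path, this can be cheaper than extracting a bitmask into a general-purpose register with `_mm256_movemask_epi8` (which yields a 32-bit mask, one bit per byte) and then checking whether that mask is non-zero. The underlying `vptest` instruction sets the flags directly, which avoids the vector-to-integer register transfer and may reduce instruction count and latency. The actual gain depends on the microarchitecture.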
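To make the comparison concrete, here is a minimal sketch of the two patterns side by side. It is not the code from this PR: the helper names (`any_byte_msb_set_movemask`, `any_bit_set_testz`, `checks_agree`) are hypothetical, and the `target("avx2")` attribute assumes GCC or Clang. Note that `_mm256_movemask_epi8` only looks at the top bit of each byte, while `_mm256_testz_si256` looks at every bit, so the two checks are interchangeable only for compare-style masks whose lanes are all-ones or all-zeros.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; names are not taken from ggml-quants. */

/* Movemask pattern: copy the top bit of each byte into a 32-bit integer,
 * then test that integer against zero. */
__attribute__((target("avx2")))
static int any_byte_msb_set_movemask(__m256i mask) {
    return _mm256_movemask_epi8(mask) != 0;
}

/* Testz pattern: _mm256_testz_si256(mask, mask) returns 1 when
 * (mask & mask) is all zeros, so negating it answers "is any bit set?"
 * without materializing an integer mask. */
__attribute__((target("avx2")))
static int any_bit_set_testz(__m256i mask) {
    return !_mm256_testz_si256(mask, mask);
}

/* Builds a compare mask (0xFF in every byte where a[i] > b[i]) and runs
 * both checks on it. Returns 1 if they agree; *any_greater receives the
 * result of the testz-based check. */
__attribute__((target("avx2")))
static int checks_agree(const int8_t a[32], const int8_t b[32], int *any_greater) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i gt = _mm256_cmpgt_epi8(va, vb);

    int r_movemask = any_byte_msb_set_movemask(gt);
    int r_testz    = any_bit_set_testz(gt);

    *any_greater = r_testz;
    return r_movemask == r_testz;
}
```

Because the mask here comes from `_mm256_cmpgt_epi8`, both helpers should give the same answer. For an arbitrary vector such as one holding `0x01` in a single byte, the movemask check would report zero while the testz check would report non-zero, so each replaced call site needs to be checked for that assumption.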

References:

When to use _mm256_testz_si256 vs _mm256_movemask_epi8:
https://stackoverflow.com/questions/27643534/when-to-use-mm256-testz-si256-vs-mm256-movemask-epi8
AVX2: _mm256_testz_si256 vs _mm256_cmpeq_epi32 and _mm256_movemask_epi8:
https://stackoverflow.com/questions/43206253/avx2-mm256-testz-si256-vs-mm256-cmpeq-epi32-and-mm256-movemask-epi8
Intel Intrinsics Guide for _mm256_testz_si256:
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_testz_si256
Intel Intrinsics Guide for _mm256_movemask_epi8:
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_movemask_epi8
Efficiently checking for zero vectors with AVX2:
https://lemire.me/blog/2018/06/18/efficiently-checking-for-zero-vectors-with-avx2/

Co-Authored-By: Gemini 2.5 Pro (References and description commit changes)


GermanAizek requested a review from ggerganov as a code owner December 1, 2025 08:57

github-actions bot added the `ggml` label (changes relating to the ggml tensor library for machine learning) on Dec 1, 2025

pwilkin added the `vibe-coded` label (created with heavy use of LLM assistants, requires human verification) on Dec 1, 2025
