Question: How do HWY_NATIVE_* macros behave under dynamic dispatch #2755

@pauxav06

Hello,

I am trying to port code that uses VNNI / Neon Dotprod, and I am having trouble detecting support for these operations under dynamic dispatch.

I modified skeleton.cc / skeleton.h by adding the following lines to CodepathDemo:

#ifdef HWY_NATIVE_U8_I8_SUMOFMULQUADACCUMULATE
  const char* gather2 = "Has VNNI";
#else
  const char* gather2 = "No VNNI";
#endif

#ifdef HWY_NATIVE_U8_I8_SATWIDENMULPAIRWISEADD
  const char* gather3 = "Has Fallback";
#else
  const char* gather3 = "No Fallback";
#endif

printf("Target %15s: %15s %15s %15s %d\n",
       hwy::TargetName(HWY_TARGET),
       gather, gather2, gather3,
       HWY_TARGET <= HWY_AVX3_DL);

Then I replaced dynamic dispatch with:

#define VISITOR(TARGET, NAMESPACE) NAMESPACE::CodepathDemo();
HWY_VISIT_TARGETS(VISITOR)

(I also tried a SetSupportedTargetsForTest + HWY_DYNAMIC_DISPATCH loop, with exactly the same results.)

The program was compiled with gcc at -O3, with no -march flags enabled.

The output shows unexpected toggling of the macros:

Target            AVX2:       Has int64        Has VNNI    Has Fallback 0
Target            AVX3:       Has int64         No VNNI     No Fallback 0
Target         AVX3_DL:       Has int64        Has VNNI    Has Fallback 1
Target        AVX3_SPR:       Has int64        Has VNNI    Has Fallback 1
Target       AVX3_ZEN4:       Has int64         No VNNI     No Fallback 1
Target            SSE2:       Has int64         No VNNI     No Fallback 0
Target            SSE2:       Has int64         No VNNI     No Fallback 0
Target            SSE4:       Has int64         No VNNI     No Fallback 0
Target           SSSE3:       Has int64        Has VNNI    Has Fallback 0

It seems that the HWY_NATIVE_* macros are toggled on/off at each iteration of the dynamic dispatch codegen process:

  1. SSE2 -> OFF
  2. SSSE3 -> ON
  3. SSE4 -> OFF
  4. AVX2 -> ON
  5. AVX3 -> OFF
  6. AVX3_DL -> ON
  7. AVX3_ZEN4 -> OFF
  8. AVX3_SPR -> ON

However, testing other macros like HWY_NATIVE_FMA which are defined in set_macros-inl.h works perfectly!
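
One mechanism that would produce exactly this kind of on/off alternation is a guard macro that is flipped, rather than defined once, every time the same header is re-included for the next target. The sketch below only illustrates that hypothesis in isolation: `DEMO_FEATURE_GUARD` and `ObserveToggledGuard` are made-up names, each "pass" stands in for one per-target re-inclusion, and whether the HWY_NATIVE_* macros in question actually work this way is an assumption, not something confirmed here.

```cpp
#include <vector>

// Hypothetical illustration, NOT Highway code. DEMO_FEATURE_GUARD is a
// made-up macro. Each pass flips the macro (define if absent, undefine if
// present) and then records what a plain #ifdef check would observe.
std::vector<bool> ObserveToggledGuard() {
  std::vector<bool> defined_per_pass;

  // Pass 1: macro starts undefined, so the toggle defines it.
#ifdef DEMO_FEATURE_GUARD
#undef DEMO_FEATURE_GUARD
#else
#define DEMO_FEATURE_GUARD
#endif
#ifdef DEMO_FEATURE_GUARD
  defined_per_pass.push_back(true);
#else
  defined_per_pass.push_back(false);
#endif

  // Pass 2: macro is defined, so the toggle undefines it.
#ifdef DEMO_FEATURE_GUARD
#undef DEMO_FEATURE_GUARD
#else
#define DEMO_FEATURE_GUARD
#endif
#ifdef DEMO_FEATURE_GUARD
  defined_per_pass.push_back(true);
#else
  defined_per_pass.push_back(false);
#endif

  // Pass 3: same toggle again.
#ifdef DEMO_FEATURE_GUARD
#undef DEMO_FEATURE_GUARD
#else
#define DEMO_FEATURE_GUARD
#endif
#ifdef DEMO_FEATURE_GUARD
  defined_per_pass.push_back(true);
#else
  defined_per_pass.push_back(false);
#endif

  // Pass 4: same toggle again.
#ifdef DEMO_FEATURE_GUARD
#undef DEMO_FEATURE_GUARD
#else
#define DEMO_FEATURE_GUARD
#endif
#ifdef DEMO_FEATURE_GUARD
  defined_per_pass.push_back(true);
#else
  defined_per_pass.push_back(false);
#endif

  return defined_per_pass;
}
```

Calling `ObserveToggledGuard()` yields true, false, true, false. Under this assumption, a bare `#ifdef` on such a macro reports the pass parity rather than hardware support, which would also explain why a macro that is defined or undefined outright per target (as HWY_NATIVE_FMA appears to be) behaves as expected.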

Questions:

  • Is this meant to work and did I mess something up, or are most HWY_NATIVE_* macros intended only for static dispatch?
  • If I want to detect VNNI / Dotprod support reliably in a dynamically dispatched function, should I:
    • Use something like HWY_TARGET <= HWY_AVX3_DL?
    • Build static targets separately, aggregate via CMake, and probe dispatch function once to get the correct function table?
    • Should I stop trying to route kernels manually and instead let auto-tuning decide for itself what to do? For instance, generate 3 or 4 kernels per target, even ones that use non-native operations and will be very slow, then probe the target function pointers at startup and tune to find the most efficient one.
    • Am I way off topic, and is this not the way to go at all?
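
A minimal sketch of the startup auto-tuning idea from the third option, assuming each candidate kernel is reachable through a plain function pointer. Everything here is hypothetical: `DotKernel`, `DotScalar`, `DotUnrolled4` and `PickFastest` are illustrative names, and the two scalar kernels merely stand in for the per-target variants (native dot product versus fallback) that a real build would generate.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical kernel signature: u8 x i8 dot product accumulated into i32.
using DotKernel = int32_t (*)(const uint8_t* a, const int8_t* b, size_t n);

// Stand-in candidate 1: straightforward scalar loop.
int32_t DotScalar(const uint8_t* a, const int8_t* b, size_t n) {
  int32_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  }
  return sum;
}

// Stand-in candidate 2: four independent accumulators plus a remainder loop.
int32_t DotUnrolled4(const uint8_t* a, const int8_t* b, size_t n) {
  int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += static_cast<int32_t>(a[i + 0]) * static_cast<int32_t>(b[i + 0]);
    s1 += static_cast<int32_t>(a[i + 1]) * static_cast<int32_t>(b[i + 1]);
    s2 += static_cast<int32_t>(a[i + 2]) * static_cast<int32_t>(b[i + 2]);
    s3 += static_cast<int32_t>(a[i + 3]) * static_cast<int32_t>(b[i + 3]);
  }
  for (; i < n; ++i) {
    s0 += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  }
  return s0 + s1 + s2 + s3;
}

// Times every candidate on the same input and returns the index of the
// fastest one. The caller must pass at least one candidate.
size_t PickFastest(const std::vector<DotKernel>& candidates, const uint8_t* a,
                   const int8_t* b, size_t n, int reps) {
  using Clock = std::chrono::steady_clock;
  size_t best = 0;
  Clock::duration best_time = Clock::duration::max();
  volatile uint32_t sink = 0;  // keeps the timed calls from being elided
  for (size_t k = 0; k < candidates.size(); ++k) {
    const Clock::time_point t0 = Clock::now();
    for (int r = 0; r < reps; ++r) {
      sink = sink + static_cast<uint32_t>(candidates[k](a, b, n));
    }
    const Clock::duration elapsed = Clock::now() - t0;
    if (elapsed < best_time) {
      best_time = elapsed;
      best = k;
    }
  }
  return best;
}
```

The selection would run once at startup and the winning pointer would be cached. The trade-off compared with a compile-time check such as HWY_TARGET <= HWY_AVX3_DL is extra binary size and startup time in exchange for not having to know in advance which variant is fast on a given machine.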

Thank you in advance!
