portable DP4A function #16848

Open
@jinz2014

Description

Is your feature request related to a problem? Please describe

#3682

Describe the solution you would like

sycl::dp4a(a, b, c)

Describe alternatives you have considered

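// Assumption: the snippet does not define int8x4_t; a clang vector typedef
// such as this one is needed for the fallback branch to compile.
typedef int8_t int8x4_t __attribute__((ext_vector_type(4)));

// Returns c plus the dot product of the four signed 8-bit lanes packed in a and b.
static __device__ __forceinline__ int __dp4a(const int a, const int b, int c) {
#if __has_builtin(__builtin_amdgcn_sdot4)
    // Hardware path; the final argument (clamp) is false, so the result does not saturate.
    c = __builtin_amdgcn_sdot4(a, b, c, false);
#else
    // Fallback: reinterpret each 32-bit operand as four int8 lanes and accumulate.
    const int8x4_t va = reinterpret_cast<const int8x4_t&>(a);
    const int8x4_t vb = reinterpret_cast<const int8x4_t&>(b);
    c += va[0] * vb[0] + va[1] * vb[1] + va[2] * vb[2] + va[3] * vb[3];
#endif
    return c;
}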
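
For reference, the semantics that a portable function would need to reproduce can be sketched in plain host-side C++. This is only an illustration of the arithmetic in the fallback branch above, not a proposed implementation: the names `signed_byte` and `dp4a_ref` are hypothetical, and the sketch assumes both operands are treated as four signed 8-bit lanes with a non-saturating 32-bit accumulator.

```cpp
#include <cstdint>

// Hypothetical helper: return byte `i` of `x` (0 = least significant)
// interpreted as a signed 8-bit value, using only shifts and masks.
inline int32_t signed_byte(int32_t x, int i) {
    const uint32_t u = static_cast<uint32_t>(x);
    const int32_t v = static_cast<int32_t>((u >> (8 * i)) & 0xFFu);
    return v >= 128 ? v - 256 : v;  // map 128..255 to -128..-1
}

// Hypothetical reference: c plus the dot product of the four signed
// 8-bit lanes of a and b. No saturation is applied.
inline int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    for (int i = 0; i < 4; ++i) {
        c += signed_byte(a, i) * signed_byte(b, i);
    }
    return c;
}
```

With this sketch, `dp4a_ref(0x01020304, 0x01010101, 10)` adds 4 + 3 + 2 + 1 to the accumulator and returns 20. Extracting lanes with shifts avoids the `reinterpret_cast` on vector types, so the same code could serve as a host-side check for a device implementation.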

Additional context

https://developer.nvidia.com/blog/mixed-precision-programming-cuda-8/

Metadata

Labels: enhancement (New feature or request)