[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. #109078

vzakhari · 2024-09-18T02:45:37Z

`std::complex` operators do not work in the CUDA device compilation of the F18 runtime. This change uses `cuda::std::complex` from `libcudacxx` instead. `cuda::std::complex` has no specializations for `long double`, so the change also cleans up `long double` usage in the runtime.
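
For illustration, here is a minimal sketch of the alias-namespace idea described above. It is not the runtime's code. The `demo` namespace and the `ConjDot` helper are hypothetical names introduced only for this example. `RT_USE_LIBCUDACXX` is the macro used in the patch, and when it is not defined the sketch falls back to the host `<complex>` header.

```cpp
// Select the complex implementation once, then refer to it only through
// the alias namespace so client code compiles against either library.
#if RT_USE_LIBCUDACXX
#include <cuda/std/complex>
namespace demo {
namespace rtcmplx {
using cuda::std::complex;
using cuda::std::conj;
} // namespace rtcmplx
} // namespace demo
#else
#include <complex>
namespace demo {
namespace rtcmplx {
using std::complex;
using std::conj;
} // namespace rtcmplx
} // namespace demo
#endif

namespace demo {

// Hypothetical helper: sum of conj(x[i]) * y[i] over n elements.
// It names only rtcmplx::complex and rtcmplx::conj, never std:: directly.
template <typename T>
rtcmplx::complex<T> ConjDot(
    const rtcmplx::complex<T> *x, const rtcmplx::complex<T> *y, int n) {
  rtcmplx::complex<T> sum{};
  for (int i = 0; i < n; ++i) {
    sum += rtcmplx::conj(x[i]) * y[i];
  }
  return sum;
}

} // namespace demo
```

Code written this way does not need to change when the build switches between the two complex implementations, provided the element type is one that both libraries support.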

llvmbot · 2024-09-18T02:46:10Z

@llvm/pr-subscribers-flang-runtime

Author: Slava Zakharin (vzakhari)

Patch is 93.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109078.diff

23 Files Affected:

(added) flang/include/flang/Common/float80.h (+43)
(added) flang/include/flang/Runtime/complex.h (+31)
(modified) flang/include/flang/Runtime/cpp-type.h (+5-4)
(modified) flang/include/flang/Runtime/matmul-instances.inc (+3-3)
(modified) flang/include/flang/Runtime/numeric.h (+16-16)
(modified) flang/include/flang/Runtime/reduce.h (+129-85)
(modified) flang/include/flang/Runtime/reduction.h (+60-52)
(modified) flang/include/flang/Runtime/transformational.h (+12-8)
(modified) flang/runtime/complex-powi.cpp (+22-17)
(modified) flang/runtime/complex-reduction.c (+4-4)
(modified) flang/runtime/dot-product.cpp (+7-14)
(modified) flang/runtime/extrema.cpp (+5-5)
(modified) flang/runtime/matmul-transpose.cpp (-17)
(modified) flang/runtime/matmul.cpp (+6-28)
(modified) flang/runtime/numeric.cpp (+18-18)
(modified) flang/runtime/product.cpp (+4-11)
(modified) flang/runtime/random.cpp (+1-1)
(modified) flang/runtime/reduce.cpp (+98-82)
(modified) flang/runtime/reduction-templates.h (+2-2)
(modified) flang/runtime/sum.cpp (+12-10)
(modified) flang/runtime/transformational.cpp (+4-4)
(modified) flang/unittests/Runtime/Numeric.cpp (+2-2)
(modified) flang/unittests/Runtime/Transformational.cpp (+5-5)

diff --git a/flang/include/flang/Common/float80.h b/flang/include/flang/Common/float80.h
new file mode 100644
index 00000000000000..1838f7b13c8bb2
--- /dev/null
+++ b/flang/include/flang/Common/float80.h
@@ -0,0 +1,43 @@
+/*===-- flang/Common/float80.h --------------------------------------*- C -*-===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *===----------------------------------------------------------------------===*/
+
+/* This header is usable in both C and C++ code.
+ * Isolates build compiler checks to determine if the 80-bit
+ * floating point format is supported via a particular C type.
+ * It defines CFloat80Type and CppFloat80Type aliases for this
+ * C type.
+ */
+
+#ifndef FORTRAN_COMMON_FLOAT80_H_
+#define FORTRAN_COMMON_FLOAT80_H_
+
+#include "api-attrs.h"
+#include <float.h>
+
+#if LDBL_MANT_DIG == 64
+#undef HAS_FLOAT80
+#define HAS_FLOAT80 1
+#endif
+
+#if defined(RT_DEVICE_COMPILATION) && defined(__CUDACC__)
+/*
+ * 'long double' is treated as 'double' in the CUDA device code,
+ * and there is no support for 80-bit floating point format.
+ * This is probably true for most offload devices, so RT_DEVICE_COMPILATION
+ * check should be enough. For the time being, guard it with __CUDACC__
+ * as well.
+ */
+#undef HAS_FLOAT80
+#endif
+
+#if HAS_FLOAT80
+typedef long double CFloat80Type;
+typedef long double CppFloat80Type;
+#endif
+
+#endif /* FORTRAN_COMMON_FLOAT80_H_ */
diff --git a/flang/include/flang/Runtime/complex.h b/flang/include/flang/Runtime/complex.h
new file mode 100644
index 00000000000000..b7ad1376bffbf1
--- /dev/null
+++ b/flang/include/flang/Runtime/complex.h
@@ -0,0 +1,31 @@
+//===-- include/flang/Runtime/complex.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// A single way to expose C++ complex class in files that can be used
+// in F18 runtime build. With inclusion of this file std::complex
+// and the related names become available, though, they may correspond
+// to alternative definitions (e.g. from cuda::std namespace).
+
+#ifndef FORTRAN_RUNTIME_COMPLEX_H
+#define FORTRAN_RUNTIME_COMPLEX_H
+
+#if RT_USE_LIBCUDACXX
+#include <cuda/std/complex>
+namespace Fortran::runtime::rtcmplx {
+using cuda::std::complex;
+using cuda::std::conj;
+} // namespace Fortran::runtime::rtcmplx
+#else // !RT_USE_LIBCUDACXX
+#include <complex>
+namespace Fortran::runtime::rtcmplx {
+using std::complex;
+using std::conj;
+} // namespace Fortran::runtime::rtcmplx
+#endif // !RT_USE_LIBCUDACXX
+
+#endif // FORTRAN_RUNTIME_COMPLEX_H
diff --git a/flang/include/flang/Runtime/cpp-type.h b/flang/include/flang/Runtime/cpp-type.h
index fe21dd544cf7d8..aef0fbd7ede586 100644
--- a/flang/include/flang/Runtime/cpp-type.h
+++ b/flang/include/flang/Runtime/cpp-type.h
@@ -13,8 +13,9 @@
 
 #include "flang/Common/Fortran.h"
 #include "flang/Common/float128.h"
+#include "flang/Common/float80.h"
 #include "flang/Common/uint128.h"
-#include <complex>
+#include "flang/Runtime/complex.h"
 #include <cstdint>
 #if __cplusplus >= 202302
 #include <stdfloat>
@@ -70,9 +71,9 @@ template <> struct CppTypeForHelper<TypeCategory::Real, 8> {
   using type = double;
 #endif
 };
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 template <> struct CppTypeForHelper<TypeCategory::Real, 10> {
-  using type = long double;
+  using type = CppFloat80Type;
 };
 #endif
 #if __STDCPP_FLOAT128_T__
@@ -89,7 +90,7 @@ template <> struct CppTypeForHelper<TypeCategory::Real, 16> {
 #endif
 
 template <int KIND> struct CppTypeForHelper<TypeCategory::Complex, KIND> {
-  using type = std::complex<CppTypeFor<TypeCategory::Real, KIND>>;
+  using type = rtcmplx::complex<CppTypeFor<TypeCategory::Real, KIND>>;
 };
 
 template <> struct CppTypeForHelper<TypeCategory::Character, 1> {
diff --git a/flang/include/flang/Runtime/matmul-instances.inc b/flang/include/flang/Runtime/matmul-instances.inc
index 32c6ab06d25219..88e3067ca029d4 100644
--- a/flang/include/flang/Runtime/matmul-instances.inc
+++ b/flang/include/flang/Runtime/matmul-instances.inc
@@ -111,7 +111,7 @@ FOREACH_MATMUL_TYPE_PAIR(MATMUL_DIRECT_INSTANCE)
 FOREACH_MATMUL_TYPE_PAIR_WITH_INT16(MATMUL_INSTANCE)
 FOREACH_MATMUL_TYPE_PAIR_WITH_INT16(MATMUL_DIRECT_INSTANCE)
 
-#if MATMUL_FORCE_ALL_TYPES || LDBL_MANT_DIG == 64
+#if MATMUL_FORCE_ALL_TYPES || HAS_FLOAT80
 MATMUL_INSTANCE(Integer, 16, Real, 10)
 MATMUL_INSTANCE(Integer, 16, Complex, 10)
 MATMUL_INSTANCE(Real, 10, Integer, 16)
@@ -133,7 +133,7 @@ MATMUL_DIRECT_INSTANCE(Complex, 16, Integer, 16)
 #endif
 #endif // MATMUL_FORCE_ALL_TYPES || (defined __SIZEOF_INT128__ && !AVOID_NATIVE_UINT128_T)
 
-#if MATMUL_FORCE_ALL_TYPES || LDBL_MANT_DIG == 64
+#if MATMUL_FORCE_ALL_TYPES || HAS_FLOAT80
 #define FOREACH_MATMUL_TYPE_PAIR_WITH_REAL10(macro)         \
   macro(Integer, 1, Real, 10)                               \
   macro(Integer, 1, Complex, 10)                            \
@@ -193,7 +193,7 @@ MATMUL_DIRECT_INSTANCE(Complex, 10, Complex, 16)
 MATMUL_DIRECT_INSTANCE(Complex, 16, Real, 10)
 MATMUL_DIRECT_INSTANCE(Complex, 16, Complex, 10)
 #endif
-#endif // MATMUL_FORCE_ALL_TYPES || LDBL_MANT_DIG == 64
+#endif // MATMUL_FORCE_ALL_TYPES || HAS_FLOAT80
 
 #if MATMUL_FORCE_ALL_TYPES || (LDBL_MANT_DIG == 113 || HAS_FLOAT128)
 #define FOREACH_MATMUL_TYPE_PAIR_WITH_REAL16(macro)         \
diff --git a/flang/include/flang/Runtime/numeric.h b/flang/include/flang/Runtime/numeric.h
index 84a5a7cd7a361c..c3923ee2e0d889 100644
--- a/flang/include/flang/Runtime/numeric.h
+++ b/flang/include/flang/Runtime/numeric.h
@@ -44,7 +44,7 @@ CppTypeFor<TypeCategory::Integer, 8> RTDECL(Ceiling8_8)(
 CppTypeFor<TypeCategory::Integer, 16> RTDECL(Ceiling8_16)(
     CppTypeFor<TypeCategory::Real, 8>);
 #endif
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Integer, 1> RTDECL(Ceiling10_1)(
     CppTypeFor<TypeCategory::Real, 10>);
 CppTypeFor<TypeCategory::Integer, 2> RTDECL(Ceiling10_2)(
@@ -78,7 +78,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(ErfcScaled4)(
     CppTypeFor<TypeCategory::Real, 4>);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(ErfcScaled8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(ErfcScaled10)(
     CppTypeFor<TypeCategory::Real, 10>);
 #endif
@@ -96,7 +96,7 @@ CppTypeFor<TypeCategory::Integer, 4> RTDECL(Exponent8_4)(
     CppTypeFor<TypeCategory::Real, 8>);
 CppTypeFor<TypeCategory::Integer, 8> RTDECL(Exponent8_8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Integer, 4> RTDECL(Exponent10_4)(
     CppTypeFor<TypeCategory::Real, 10>);
 CppTypeFor<TypeCategory::Integer, 8> RTDECL(Exponent10_8)(
@@ -134,7 +134,7 @@ CppTypeFor<TypeCategory::Integer, 8> RTDECL(Floor8_8)(
 CppTypeFor<TypeCategory::Integer, 16> RTDECL(Floor8_16)(
     CppTypeFor<TypeCategory::Real, 8>);
 #endif
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Integer, 1> RTDECL(Floor10_1)(
     CppTypeFor<TypeCategory::Real, 10>);
 CppTypeFor<TypeCategory::Integer, 2> RTDECL(Floor10_2)(
@@ -168,7 +168,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(Fraction4)(
     CppTypeFor<TypeCategory::Real, 4>);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(Fraction8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(Fraction10)(
     CppTypeFor<TypeCategory::Real, 10>);
 #endif
@@ -180,7 +180,7 @@ CppTypeFor<TypeCategory::Real, 16> RTDECL(Fraction16)(
 // ISNAN / IEEE_IS_NAN
 bool RTDECL(IsNaN4)(CppTypeFor<TypeCategory::Real, 4>);
 bool RTDECL(IsNaN8)(CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 bool RTDECL(IsNaN10)(CppTypeFor<TypeCategory::Real, 10>);
 #endif
 #if LDBL_MANT_DIG == 113 || HAS_FLOAT128
@@ -212,7 +212,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(ModReal4)(
 CppTypeFor<TypeCategory::Real, 8> RTDECL(ModReal8)(
     CppTypeFor<TypeCategory::Real, 8>, CppTypeFor<TypeCategory::Real, 8>,
     const char *sourceFile = nullptr, int sourceLine = 0);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(ModReal10)(
     CppTypeFor<TypeCategory::Real, 10>, CppTypeFor<TypeCategory::Real, 10>,
     const char *sourceFile = nullptr, int sourceLine = 0);
@@ -247,7 +247,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(ModuloReal4)(
 CppTypeFor<TypeCategory::Real, 8> RTDECL(ModuloReal8)(
     CppTypeFor<TypeCategory::Real, 8>, CppTypeFor<TypeCategory::Real, 8>,
     const char *sourceFile = nullptr, int sourceLine = 0);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(ModuloReal10)(
     CppTypeFor<TypeCategory::Real, 10>, CppTypeFor<TypeCategory::Real, 10>,
     const char *sourceFile = nullptr, int sourceLine = 0);
@@ -283,7 +283,7 @@ CppTypeFor<TypeCategory::Integer, 8> RTDECL(Nint8_8)(
 CppTypeFor<TypeCategory::Integer, 16> RTDECL(Nint8_16)(
     CppTypeFor<TypeCategory::Real, 8>);
 #endif
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Integer, 1> RTDECL(Nint10_1)(
     CppTypeFor<TypeCategory::Real, 10>);
 CppTypeFor<TypeCategory::Integer, 2> RTDECL(Nint10_2)(
@@ -319,7 +319,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(Nearest4)(
     CppTypeFor<TypeCategory::Real, 4>, bool positive);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(Nearest8)(
     CppTypeFor<TypeCategory::Real, 8>, bool positive);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(Nearest10)(
     CppTypeFor<TypeCategory::Real, 10>, bool positive);
 #endif
@@ -333,7 +333,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(RRSpacing4)(
     CppTypeFor<TypeCategory::Real, 4>);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(RRSpacing8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(RRSpacing10)(
     CppTypeFor<TypeCategory::Real, 10>);
 #endif
@@ -347,7 +347,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(SetExponent4)(
     CppTypeFor<TypeCategory::Real, 4>, std::int64_t);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(SetExponent8)(
     CppTypeFor<TypeCategory::Real, 8>, std::int64_t);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(SetExponent10)(
     CppTypeFor<TypeCategory::Real, 10>, std::int64_t);
 #endif
@@ -361,7 +361,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(Scale4)(
     CppTypeFor<TypeCategory::Real, 4>, std::int64_t);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(Scale8)(
     CppTypeFor<TypeCategory::Real, 8>, std::int64_t);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(Scale10)(
     CppTypeFor<TypeCategory::Real, 10>, std::int64_t);
 #endif
@@ -410,7 +410,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(Spacing4)(
     CppTypeFor<TypeCategory::Real, 4>);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(Spacing8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(Spacing10)(
     CppTypeFor<TypeCategory::Real, 10>);
 #endif
@@ -425,7 +425,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(FPow4i)(
 CppTypeFor<TypeCategory::Real, 8> RTDECL(FPow8i)(
     CppTypeFor<TypeCategory::Real, 8> b,
     CppTypeFor<TypeCategory::Integer, 4> e);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(FPow10i)(
     CppTypeFor<TypeCategory::Real, 10> b,
     CppTypeFor<TypeCategory::Integer, 4> e);
@@ -442,7 +442,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(FPow4k)(
 CppTypeFor<TypeCategory::Real, 8> RTDECL(FPow8k)(
     CppTypeFor<TypeCategory::Real, 8> b,
     CppTypeFor<TypeCategory::Integer, 8> e);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(FPow10k)(
     CppTypeFor<TypeCategory::Real, 10> b,
     CppTypeFor<TypeCategory::Integer, 8> e);
diff --git a/flang/include/flang/Runtime/reduce.h b/flang/include/flang/Runtime/reduce.h
index 60f54c393b4bbd..c016b37f9592a1 100644
--- a/flang/include/flang/Runtime/reduce.h
+++ b/flang/include/flang/Runtime/reduce.h
@@ -188,22 +188,26 @@ void RTDECL(ReduceReal8DimValue)(Descriptor &result, const Descriptor &array,
     ValueReductionOperation<double>, const char *source, int line, int dim,
     const Descriptor *mask = nullptr, const double *identity = nullptr,
     bool ordered = true);
-#if LDBL_MANT_DIG == 64
-long double RTDECL(ReduceReal10Ref)(const Descriptor &,
-    ReferenceReductionOperation<long double>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const long double *identity = nullptr, bool ordered = true);
-long double RTDECL(ReduceReal10Value)(const Descriptor &,
-    ValueReductionOperation<long double>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const long double *identity = nullptr, bool ordered = true);
+#if HAS_FLOAT80
+CppTypeFor<TypeCategory::Real, 10> RTDECL(ReduceReal10Ref)(const Descriptor &,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Real, 10>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Real, 10> *identity = nullptr,
+    bool ordered = true);
+CppTypeFor<TypeCategory::Real, 10> RTDECL(ReduceReal10Value)(const Descriptor &,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Real, 10>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Real, 10> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(ReduceReal10DimRef)(Descriptor &result, const Descriptor &array,
-    ReferenceReductionOperation<long double>, const char *source, int line,
-    int dim, const Descriptor *mask = nullptr,
-    const long double *identity = nullptr, bool ordered = true);
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Real, 10>>,
+    const char *source, int line, int dim, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Real, 10> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(ReduceReal10DimValue)(Descriptor &result, const Descriptor &array,
-    ValueReductionOperation<long double>, const char *source, int line, int dim,
-    const Descriptor *mask = nullptr, const long double *identity = nullptr,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Real, 10>>,
+    const char *source, int line, int dim, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Real, 10> *identity = nullptr,
     bool ordered = true);
 #endif
 #if LDBL_MANT_DIG == 113 || HAS_FLOAT128
@@ -225,112 +229,152 @@ void RTDECL(ReduceReal16DimValue)(Descriptor &result, const Descriptor &array,
     const CppFloat128Type *identity = nullptr, bool ordered = true);
 #endif
 
-void RTDECL(CppReduceComplex2Ref)(std::complex<float> &, const Descriptor &,
-    ReferenceReductionOperation<std::complex<float>>, const char *source,
-    int line, int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex2Value)(std::complex<float> &, const Descriptor &,
-    ValueReductionOperation<std::complex<float>>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+void RTDECL(CppReduceComplex2Ref)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
+void RTDECL(CppReduceComplex2Value)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(CppReduceComplex2DimRef)(Descriptor &result,
-    const Descriptor &array, ReferenceReductionOperation<std::complex<float>>,
+    const Descriptor &array,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
     const char *source, int line, int dim, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(CppReduceComplex2DimValue)(Descriptor &result,
-    const Descriptor &array, ValueReductionOperation<std::complex<float>>,
+    const Descriptor &array,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
     const char *source, int line, int dim, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex3Ref)(std::complex<float> &, const Descriptor &,
-    ReferenceReductionOperation<std::complex<float>>, const char *source,
-    int line, int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex3Value)(std::complex<float> &, const Descriptor &,
-    ValueReductionOperation<std::complex<float>>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
+void RTDECL(CppReduceComplex3Ref)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
+void RTDECL(CppReduceComplex3Value)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(CppReduceComplex3DimRef)(Descriptor &result,
-    const Descriptor &array, ReferenceReductionOperation<std::complex<float>>,
+    const Descriptor &array,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
     const char *source, int line, int dim, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(CppReduceComplex3DimValue)(Descriptor &result,
-    const Descriptor &array, ValueReductionOperation<std::complex<float>>,
+    const Descriptor &array,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
     const char *source, int line, int dim, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex4Ref)(std::complex<float> &, const Descriptor &,
-    ReferenceReductionOperation<std::complex<float>>, const char *source,
-    int line, int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex4Value)(std::complex<float> &, const Descriptor &,
-    ValueReductionOperation<std::complex<float>>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
+void RTDECL(CppReduceComplex4Ref)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *...
[truncated]

cannot handle them.

jeanPerier

Looks great, thank you Slava!

llvm-ci · 2024-09-18T18:17:56Z

LLVM Buildbot has detected a new failure on builder flang-runtime-cuda-gcc running on as-builder-7 while building flang at step 5 "build-FortranRuntime".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/152/builds/399

Here is the relevant piece of the build log for reference:

Step 5 (build-FortranRuntime) failure: build (failure)
...
10.424 [1/27/38] Building CUDA object CMakeFiles/FortranRuntime.dir/array-constructor.cpp.o
10.548 [1/26/39] Building CUDA object CMakeFiles/FortranRuntime.dir/descriptor-io.cpp.o
10.563 [1/25/40] Building CUDA object CMakeFiles/FortranRuntime.dir/type-info.cpp.o
10.569 [1/24/41] Building CUDA object CMakeFiles/FortranRuntime.dir/inquiry.cpp.o
10.646 [1/23/42] Building CUDA object CMakeFiles/FortranRuntime.dir/pointer.cpp.o
10.973 [1/22/43] Building CUDA object CMakeFiles/FortranRuntime.dir/tools.cpp.o
10.996 [1/21/44] Building CUDA object CMakeFiles/FortranRuntime.dir/derived.cpp.o
11.238 [1/20/45] Building CUDA object CMakeFiles/FortranRuntime.dir/external-unit.cpp.o
11.455 [1/19/46] Building CUDA object CMakeFiles/FortranRuntime.dir/transformational.cpp.o
11.657 [1/18/47] Building CUDA object CMakeFiles/FortranRuntime.dir/product.cpp.o
FAILED: CMakeFiles/FortranRuntime.dir/product.cpp.o 
ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -DRT_USE_LIBCUDACXX=1 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/../include -I/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build -I/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include -G -g -O3 -DNDEBUG --generate-code=arch=compute_80,code=[compute_80,sm_80]   -U_GLIBCXX_ASSERTIONS -U_LIBCPP_ENABLE_ASSERTIONS -std=c++17  -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti --expt-relaxed-constexpr -Xcudafe --diag_suppress=20208 -Xcudafe --display_error_number -MD -MT CMakeFiles/FortranRuntime.dir/product.cpp.o -MF CMakeFiles/FortranRuntime.dir/product.cpp.o.d -x cu -dc /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp -o CMakeFiles/FortranRuntime.dir/product.cpp.o
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/limits: In instantiation of ‘class cuda::std::__4::numeric_limits<long double>’:
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:524:60:   required from ‘constexpr cuda::std::__4::complex<_Tp> cuda::std::__4::operator*(const cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Tp>&) [with _Tp = long double]’
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:433:16:   required from ‘constexpr cuda::std::__4::complex<_Tp>& cuda::std::__4::operator*=(cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Up>&) [with _Tp = long double; _Up = long double]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:52:1:   required from ‘bool Fortran::runtime::ComplexProductAccumulator<PART>::AccumulateAt(const SubscriptValue*) [with A = cuda::std::__4::complex<long double>; PART = long double; Fortran::runtime::SubscriptValue = long int]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:62:47:   required from ‘void Fortran::runtime::DoTotalReduction(const Fortran::runtime::Descriptor&, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&, const char*, Fortran::runtime::Terminator&) [with TYPE = cuda::std::__4::complex<long double>; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<long double>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:88:44:   required from ‘Fortran::runtime::CppTypeFor<CAT, KIND> Fortran::runtime::GetTotalReduction(const Fortran::runtime::Descriptor&, const char*, int, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&&, const char*) [with Fortran::common::TypeCategory CAT = Fortran::common::TypeCategory::Complex; int KIND = 10; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<long double>; Fortran::runtime::CppTypeFor<CAT, KIND> = cuda::std::__4::complex<long double>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:147:64:   required from here
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/limits:477:77: error: no type named ‘type’ in ‘class cuda::std::__4::__libcpp_numeric_limits<long double, true>’
  477 |     typedef typename __base::type type;
      |                                                                             ^   
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex: In instantiation of ‘constexpr cuda::std::__4::complex<_Tp> cuda::std::__4::operator*(const cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Tp>&) [with _Tp = long double]’:
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:433:16:   required from ‘constexpr cuda::std::__4::complex<_Tp>& cuda::std::__4::operator*=(cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Up>&) [with _Tp = long double; _Up = long double]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:52:1:   required from ‘bool Fortran::runtime::ComplexProductAccumulator<PART>::AccumulateAt(const SubscriptValue*) [with A = cuda::std::__4::complex<long double>; PART = long double; Fortran::runtime::SubscriptValue = long int]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:62:47:   required from ‘void Fortran::runtime::DoTotalReduction(const Fortran::runtime::Descriptor&, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&, const char*, Fortran::runtime::Terminator&) [with TYPE = cuda::std::__4::complex<long double>; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<long double>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:88:44:   required from ‘Fortran::runtime::CppTypeFor<CAT, KIND> Fortran::runtime::GetTotalReduction(const Fortran::runtime::Descriptor&, const char*, int, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&&, const char*) [with Fortran::common::TypeCategory CAT = Fortran::common::TypeCategory::Complex; int KIND = 10; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<long double>; Fortran::runtime::CppTypeFor<CAT, KIND> = cuda::std::__4::complex<long double>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:147:64:   required from here
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:524:60: error: ‘quiet_NaN’ is not a member of ‘cuda::std::__4::numeric_limits<long double>’
  524 |       return complex<_Tp>(_Tp(numeric_limits<_Tp>::quiet_NaN()), _Tp(0));
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:530:60: error: ‘quiet_NaN’ is not a member of ‘cuda::std::__4::numeric_limits<long double>’
  530 |         return complex<_Tp>(_Tp(numeric_limits<_Tp>::quiet_NaN()), _Tp(0));
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:532:59: error: ‘infinity’ is not a member of ‘cuda::std::__4::numeric_limits<long double>’
  532 |       return complex<_Tp>(_Tp(numeric_limits<_Tp>::infinity()), _Tp(numeric_limits<_Tp>::infinity()));
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:540:60: error: ‘quiet_NaN’ is not a member of ‘cuda::std::__4::numeric_limits<long double>’
  540 |       return complex<_Tp>(_Tp(numeric_limits<_Tp>::quiet_NaN()), _Tp(0));
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/cmath: In instantiation of ‘constexpr cuda::std::__4::__enable_if_t<(! cuda::std::__4::is_floating_point<_Tp>::value), bool> cuda::std::__4::__constexpr_isinf(_A1) [with _A1 = __float128; cuda::std::__4::__enable_if_t<(! cuda::std::__4::is_floating_point<_Tp>::value), bool> = bool]’:
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:512:38:   required from ‘constexpr cuda::std::__4::complex<_Tp> cuda::std::__4::operator*(const cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Tp>&) [with _Tp = __float128]’
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:433:16:   required from ‘constexpr cuda::std::__4::complex<_Tp>& cuda::std::__4::operator*=(cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Up>&) [with _Tp = __float128; _Up = __float128]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:52:1:   required from ‘bool Fortran::runtime::ComplexProductAccumulator<PART>::AccumulateAt(const SubscriptValue*) [with A = cuda::std::__4::complex<__float128>; PART = __float128; Fortran::runtime::SubscriptValue = long int]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:62:47:   required from ‘void Fortran::runtime::DoTotalReduction(const Fortran::runtime::Descriptor&, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&, const char*, Fortran::runtime::Terminator&) [with TYPE = cuda::std::__4::complex<__float128>; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<__float128>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:88:44:   required from ‘Fortran::runtime::CppTypeFor<CAT, KIND> Fortran::runtime::GetTotalReduction(const Fortran::runtime::Descriptor&, const char*, int, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&&, const char*) [with Fortran::common::TypeCategory CAT = Fortran::common::TypeCategory::Complex; int KIND = 16; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<__float128>; Fortran::runtime::CppTypeFor<CAT, KIND> = cuda::std::__4::complex<__float128>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:156:64:   required from here
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/cmath:666:15: error: call of overloaded ‘isinf(__float128&)’ is ambiguous
  666 |     return ::isinf(__lcpp_x);

Reland "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)" (#109207)

`std::complex` operators do not work for the CUDA device compilation of the F18 runtime. This change uses `cuda::std::complex` from `libcudacxx` instead. `cuda::std::complex` has no specializations for `long double`, so the change is accompanied by a clean-up of `long double` usage. The additional change on top of #109078 is to use `cuda::std::complex` only for the device compilation; otherwise the host compilation fails, because `libcudacxx` may not support the `long double` specialization at all (depending on the compiler).
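For readers following the reland, the device-only selection described above could look roughly like the sketch below. This is not the code from the patch: the `sketch` namespace, the `SKETCH_USE_LIBCUDACXX` opt-in macro, and the `Product` helper are hypothetical names made up for illustration. The only real identifiers are `std::complex`, `cuda::std::complex`, and `__CUDA_ARCH__`, which CUDA compilers define during device compilation passes.

```cpp
#include <complex>
#include <type_traits>

// SKETCH_USE_LIBCUDACXX is a hypothetical opt-in macro for this example.
// __CUDA_ARCH__ is defined only while compiling device code.
#if defined(SKETCH_USE_LIBCUDACXX) && defined(__CUDA_ARCH__)
#include <cuda/std/complex>
#define SKETCH_DEVICE_COMPLEX 1
#else
#define SKETCH_DEVICE_COMPLEX 0
#endif

namespace sketch {

#if SKETCH_DEVICE_COMPLEX
// Device pass: use the libcudacxx type, whose operators work in device code.
template <typename T> using Complex = cuda::std::complex<T>;
#else
// Host pass: keep std::complex, so long double stays usable.
template <typename T> using Complex = std::complex<T>;
#endif

// Multiplication goes through whichever operator* the alias selects.
template <typename T>
Complex<T> Product(const Complex<T> &a, const Complex<T> &b) {
  return a * b;
}

} // namespace sketch
```

On a plain host build neither macro is defined, so `sketch::Complex<long double>` is simply `std::complex<long double>` and `sketch::Product` on `(1, 2)` and `(3, 4)` yields `(-5, 10)`.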

vzakhari requested review from jeanPerier and klausler September 18, 2024 02:45

llvmbot added the flang:runtime and flang labels Sep 18, 2024

Partially reverted changes in complex-powi.cpp, because clang cannot handle them.

618c9f1

jeanPerier approved these changes Sep 18, 2024

klausler approved these changes Sep 18, 2024

vzakhari merged commit be187a6 into llvm:main Sep 18, 2024
8 checks passed

vzakhari mentioned this pull request Sep 18, 2024

Revert "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build." #109173

Merged

vzakhari added a commit that referenced this pull request Sep 18, 2024

Revert "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build." (#109173)

36192fd

Reverts #109078

vzakhari mentioned this pull request Sep 18, 2024

Reland "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)" #109207

Merged

tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024

Revert "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build." (llvm#109173)

bd4f8d9

Reverts llvm#109078
