[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. #109078

Merged: 2 commits, Sep 18, 2024


vzakhari (Contributor)

`std::complex` operators do not work in the CUDA device compilation
of the F18 runtime. This change uses `cuda::std::complex` from `libcudacxx` instead.
Because `cuda::std::complex` has no specializations for `long double`,
the change also cleans up `long double` usage in the runtime.
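
For readers skimming the description, the aliasing idea can be sketched in isolation. This is a minimal sketch, not the patch itself: the `demo::cmplx` namespace, the `DEMO_USE_LIBCUDACXX` macro, and the `ConjTimes` helper are hypothetical names chosen for illustration. It assumes a host-only build when the macro is left undefined.

```cpp
#include <cassert>

// DEMO_USE_LIBCUDACXX is a hypothetical stand-in for the build switch that
// selects libcudacxx; leave it undefined for a host-only build.
#if defined(DEMO_USE_LIBCUDACXX)
#include <cuda/std/complex>
namespace demo {
namespace cmplx {
using cuda::std::complex;
using cuda::std::conj;
} // namespace cmplx
} // namespace demo
#else
#include <complex>
namespace demo {
namespace cmplx {
using std::complex;
using std::conj;
} // namespace cmplx
} // namespace demo
#endif

// Client code names only demo::cmplx::complex, so the same source can be
// compiled against either implementation without further changes.
template <typename T>
demo::cmplx::complex<T> ConjTimes(
    const demo::cmplx::complex<T> &a, const demo::cmplx::complex<T> &b) {
  return demo::cmplx::conj(a) * b;
}
```

The point of routing everything through one namespace is that call sites never mention `std::` or `cuda::std::` directly, so switching implementations is a build-configuration decision rather than a source edit.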
@llvmbot added the `flang:runtime` and `flang` (Flang issues not falling into any other category) labels on Sep 18, 2024
llvmbot (Member) commented on Sep 18, 2024

@llvm/pr-subscribers-flang-runtime

Author: Slava Zakharin (vzakhari)

Changes

`std::complex` operators do not work in the CUDA device compilation
of the F18 runtime. This change uses `cuda::std::complex` from `libcudacxx` instead.
Because `cuda::std::complex` has no specializations for `long double`,
the change also cleans up `long double` usage in the runtime.


Patch is 93.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109078.diff

23 Files Affected:

  • (added) flang/include/flang/Common/float80.h (+43)
  • (added) flang/include/flang/Runtime/complex.h (+31)
  • (modified) flang/include/flang/Runtime/cpp-type.h (+5-4)
  • (modified) flang/include/flang/Runtime/matmul-instances.inc (+3-3)
  • (modified) flang/include/flang/Runtime/numeric.h (+16-16)
  • (modified) flang/include/flang/Runtime/reduce.h (+129-85)
  • (modified) flang/include/flang/Runtime/reduction.h (+60-52)
  • (modified) flang/include/flang/Runtime/transformational.h (+12-8)
  • (modified) flang/runtime/complex-powi.cpp (+22-17)
  • (modified) flang/runtime/complex-reduction.c (+4-4)
  • (modified) flang/runtime/dot-product.cpp (+7-14)
  • (modified) flang/runtime/extrema.cpp (+5-5)
  • (modified) flang/runtime/matmul-transpose.cpp (-17)
  • (modified) flang/runtime/matmul.cpp (+6-28)
  • (modified) flang/runtime/numeric.cpp (+18-18)
  • (modified) flang/runtime/product.cpp (+4-11)
  • (modified) flang/runtime/random.cpp (+1-1)
  • (modified) flang/runtime/reduce.cpp (+98-82)
  • (modified) flang/runtime/reduction-templates.h (+2-2)
  • (modified) flang/runtime/sum.cpp (+12-10)
  • (modified) flang/runtime/transformational.cpp (+4-4)
  • (modified) flang/unittests/Runtime/Numeric.cpp (+2-2)
  • (modified) flang/unittests/Runtime/Transformational.cpp (+5-5)
diff --git a/flang/include/flang/Common/float80.h b/flang/include/flang/Common/float80.h
new file mode 100644
index 00000000000000..1838f7b13c8bb2
--- /dev/null
+++ b/flang/include/flang/Common/float80.h
@@ -0,0 +1,43 @@
+/*===-- flang/Common/float80.h --------------------------------------*- C -*-===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *===----------------------------------------------------------------------===*/
+
+/* This header is usable in both C and C++ code.
+ * Isolates build compiler checks to determine if the 80-bit
+ * floating point format is supported via a particular C type.
+ * It defines CFloat80Type and CppFloat80Type aliases for this
+ * C type.
+ */
+
+#ifndef FORTRAN_COMMON_FLOAT80_H_
+#define FORTRAN_COMMON_FLOAT80_H_
+
+#include "api-attrs.h"
+#include <float.h>
+
+#if LDBL_MANT_DIG == 64
+#undef HAS_FLOAT80
+#define HAS_FLOAT80 1
+#endif
+
+#if defined(RT_DEVICE_COMPILATION) && defined(__CUDACC__)
+/*
+ * 'long double' is treated as 'double' in the CUDA device code,
+ * and there is no support for 80-bit floating point format.
+ * This is probably true for most offload devices, so RT_DEVICE_COMPILATION
+ * check should be enough. For the time being, guard it with __CUDACC__
+ * as well.
+ */
+#undef HAS_FLOAT80
+#endif
+
+#if HAS_FLOAT80
+typedef long double CFloat80Type;
+typedef long double CppFloat80Type;
+#endif
+
+#endif /* FORTRAN_COMMON_FLOAT80_H_ */
diff --git a/flang/include/flang/Runtime/complex.h b/flang/include/flang/Runtime/complex.h
new file mode 100644
index 00000000000000..b7ad1376bffbf1
--- /dev/null
+++ b/flang/include/flang/Runtime/complex.h
@@ -0,0 +1,31 @@
+//===-- include/flang/Runtime/complex.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// A single way to expose C++ complex class in files that can be used
+// in F18 runtime build. With inclusion of this file std::complex
+// and the related names become available, though, they may correspond
+// to alternative definitions (e.g. from cuda::std namespace).
+
+#ifndef FORTRAN_RUNTIME_COMPLEX_H
+#define FORTRAN_RUNTIME_COMPLEX_H
+
+#if RT_USE_LIBCUDACXX
+#include <cuda/std/complex>
+namespace Fortran::runtime::rtcmplx {
+using cuda::std::complex;
+using cuda::std::conj;
+} // namespace Fortran::runtime::rtcmplx
+#else // !RT_USE_LIBCUDACXX
+#include <complex>
+namespace Fortran::runtime::rtcmplx {
+using std::complex;
+using std::conj;
+} // namespace Fortran::runtime::rtcmplx
+#endif // !RT_USE_LIBCUDACXX
+
+#endif // FORTRAN_RUNTIME_COMPLEX_H
diff --git a/flang/include/flang/Runtime/cpp-type.h b/flang/include/flang/Runtime/cpp-type.h
index fe21dd544cf7d8..aef0fbd7ede586 100644
--- a/flang/include/flang/Runtime/cpp-type.h
+++ b/flang/include/flang/Runtime/cpp-type.h
@@ -13,8 +13,9 @@
 
 #include "flang/Common/Fortran.h"
 #include "flang/Common/float128.h"
+#include "flang/Common/float80.h"
 #include "flang/Common/uint128.h"
-#include <complex>
+#include "flang/Runtime/complex.h"
 #include <cstdint>
 #if __cplusplus >= 202302
 #include <stdfloat>
@@ -70,9 +71,9 @@ template <> struct CppTypeForHelper<TypeCategory::Real, 8> {
   using type = double;
 #endif
 };
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 template <> struct CppTypeForHelper<TypeCategory::Real, 10> {
-  using type = long double;
+  using type = CppFloat80Type;
 };
 #endif
 #if __STDCPP_FLOAT128_T__
@@ -89,7 +90,7 @@ template <> struct CppTypeForHelper<TypeCategory::Real, 16> {
 #endif
 
 template <int KIND> struct CppTypeForHelper<TypeCategory::Complex, KIND> {
-  using type = std::complex<CppTypeFor<TypeCategory::Real, KIND>>;
+  using type = rtcmplx::complex<CppTypeFor<TypeCategory::Real, KIND>>;
 };
 
 template <> struct CppTypeForHelper<TypeCategory::Character, 1> {
diff --git a/flang/include/flang/Runtime/matmul-instances.inc b/flang/include/flang/Runtime/matmul-instances.inc
index 32c6ab06d25219..88e3067ca029d4 100644
--- a/flang/include/flang/Runtime/matmul-instances.inc
+++ b/flang/include/flang/Runtime/matmul-instances.inc
@@ -111,7 +111,7 @@ FOREACH_MATMUL_TYPE_PAIR(MATMUL_DIRECT_INSTANCE)
 FOREACH_MATMUL_TYPE_PAIR_WITH_INT16(MATMUL_INSTANCE)
 FOREACH_MATMUL_TYPE_PAIR_WITH_INT16(MATMUL_DIRECT_INSTANCE)
 
-#if MATMUL_FORCE_ALL_TYPES || LDBL_MANT_DIG == 64
+#if MATMUL_FORCE_ALL_TYPES || HAS_FLOAT80
 MATMUL_INSTANCE(Integer, 16, Real, 10)
 MATMUL_INSTANCE(Integer, 16, Complex, 10)
 MATMUL_INSTANCE(Real, 10, Integer, 16)
@@ -133,7 +133,7 @@ MATMUL_DIRECT_INSTANCE(Complex, 16, Integer, 16)
 #endif
 #endif // MATMUL_FORCE_ALL_TYPES || (defined __SIZEOF_INT128__ && !AVOID_NATIVE_UINT128_T)
 
-#if MATMUL_FORCE_ALL_TYPES || LDBL_MANT_DIG == 64
+#if MATMUL_FORCE_ALL_TYPES || HAS_FLOAT80
 #define FOREACH_MATMUL_TYPE_PAIR_WITH_REAL10(macro)         \
   macro(Integer, 1, Real, 10)                               \
   macro(Integer, 1, Complex, 10)                            \
@@ -193,7 +193,7 @@ MATMUL_DIRECT_INSTANCE(Complex, 10, Complex, 16)
 MATMUL_DIRECT_INSTANCE(Complex, 16, Real, 10)
 MATMUL_DIRECT_INSTANCE(Complex, 16, Complex, 10)
 #endif
-#endif // MATMUL_FORCE_ALL_TYPES || LDBL_MANT_DIG == 64
+#endif // MATMUL_FORCE_ALL_TYPES || HAS_FLOAT80
 
 #if MATMUL_FORCE_ALL_TYPES || (LDBL_MANT_DIG == 113 || HAS_FLOAT128)
 #define FOREACH_MATMUL_TYPE_PAIR_WITH_REAL16(macro)         \
diff --git a/flang/include/flang/Runtime/numeric.h b/flang/include/flang/Runtime/numeric.h
index 84a5a7cd7a361c..c3923ee2e0d889 100644
--- a/flang/include/flang/Runtime/numeric.h
+++ b/flang/include/flang/Runtime/numeric.h
@@ -44,7 +44,7 @@ CppTypeFor<TypeCategory::Integer, 8> RTDECL(Ceiling8_8)(
 CppTypeFor<TypeCategory::Integer, 16> RTDECL(Ceiling8_16)(
     CppTypeFor<TypeCategory::Real, 8>);
 #endif
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Integer, 1> RTDECL(Ceiling10_1)(
     CppTypeFor<TypeCategory::Real, 10>);
 CppTypeFor<TypeCategory::Integer, 2> RTDECL(Ceiling10_2)(
@@ -78,7 +78,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(ErfcScaled4)(
     CppTypeFor<TypeCategory::Real, 4>);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(ErfcScaled8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(ErfcScaled10)(
     CppTypeFor<TypeCategory::Real, 10>);
 #endif
@@ -96,7 +96,7 @@ CppTypeFor<TypeCategory::Integer, 4> RTDECL(Exponent8_4)(
     CppTypeFor<TypeCategory::Real, 8>);
 CppTypeFor<TypeCategory::Integer, 8> RTDECL(Exponent8_8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Integer, 4> RTDECL(Exponent10_4)(
     CppTypeFor<TypeCategory::Real, 10>);
 CppTypeFor<TypeCategory::Integer, 8> RTDECL(Exponent10_8)(
@@ -134,7 +134,7 @@ CppTypeFor<TypeCategory::Integer, 8> RTDECL(Floor8_8)(
 CppTypeFor<TypeCategory::Integer, 16> RTDECL(Floor8_16)(
     CppTypeFor<TypeCategory::Real, 8>);
 #endif
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Integer, 1> RTDECL(Floor10_1)(
     CppTypeFor<TypeCategory::Real, 10>);
 CppTypeFor<TypeCategory::Integer, 2> RTDECL(Floor10_2)(
@@ -168,7 +168,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(Fraction4)(
     CppTypeFor<TypeCategory::Real, 4>);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(Fraction8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(Fraction10)(
     CppTypeFor<TypeCategory::Real, 10>);
 #endif
@@ -180,7 +180,7 @@ CppTypeFor<TypeCategory::Real, 16> RTDECL(Fraction16)(
 // ISNAN / IEEE_IS_NAN
 bool RTDECL(IsNaN4)(CppTypeFor<TypeCategory::Real, 4>);
 bool RTDECL(IsNaN8)(CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 bool RTDECL(IsNaN10)(CppTypeFor<TypeCategory::Real, 10>);
 #endif
 #if LDBL_MANT_DIG == 113 || HAS_FLOAT128
@@ -212,7 +212,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(ModReal4)(
 CppTypeFor<TypeCategory::Real, 8> RTDECL(ModReal8)(
     CppTypeFor<TypeCategory::Real, 8>, CppTypeFor<TypeCategory::Real, 8>,
     const char *sourceFile = nullptr, int sourceLine = 0);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(ModReal10)(
     CppTypeFor<TypeCategory::Real, 10>, CppTypeFor<TypeCategory::Real, 10>,
     const char *sourceFile = nullptr, int sourceLine = 0);
@@ -247,7 +247,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(ModuloReal4)(
 CppTypeFor<TypeCategory::Real, 8> RTDECL(ModuloReal8)(
     CppTypeFor<TypeCategory::Real, 8>, CppTypeFor<TypeCategory::Real, 8>,
     const char *sourceFile = nullptr, int sourceLine = 0);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(ModuloReal10)(
     CppTypeFor<TypeCategory::Real, 10>, CppTypeFor<TypeCategory::Real, 10>,
     const char *sourceFile = nullptr, int sourceLine = 0);
@@ -283,7 +283,7 @@ CppTypeFor<TypeCategory::Integer, 8> RTDECL(Nint8_8)(
 CppTypeFor<TypeCategory::Integer, 16> RTDECL(Nint8_16)(
     CppTypeFor<TypeCategory::Real, 8>);
 #endif
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Integer, 1> RTDECL(Nint10_1)(
     CppTypeFor<TypeCategory::Real, 10>);
 CppTypeFor<TypeCategory::Integer, 2> RTDECL(Nint10_2)(
@@ -319,7 +319,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(Nearest4)(
     CppTypeFor<TypeCategory::Real, 4>, bool positive);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(Nearest8)(
     CppTypeFor<TypeCategory::Real, 8>, bool positive);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(Nearest10)(
     CppTypeFor<TypeCategory::Real, 10>, bool positive);
 #endif
@@ -333,7 +333,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(RRSpacing4)(
     CppTypeFor<TypeCategory::Real, 4>);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(RRSpacing8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(RRSpacing10)(
     CppTypeFor<TypeCategory::Real, 10>);
 #endif
@@ -347,7 +347,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(SetExponent4)(
     CppTypeFor<TypeCategory::Real, 4>, std::int64_t);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(SetExponent8)(
     CppTypeFor<TypeCategory::Real, 8>, std::int64_t);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(SetExponent10)(
     CppTypeFor<TypeCategory::Real, 10>, std::int64_t);
 #endif
@@ -361,7 +361,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(Scale4)(
     CppTypeFor<TypeCategory::Real, 4>, std::int64_t);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(Scale8)(
     CppTypeFor<TypeCategory::Real, 8>, std::int64_t);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(Scale10)(
     CppTypeFor<TypeCategory::Real, 10>, std::int64_t);
 #endif
@@ -410,7 +410,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(Spacing4)(
     CppTypeFor<TypeCategory::Real, 4>);
 CppTypeFor<TypeCategory::Real, 8> RTDECL(Spacing8)(
     CppTypeFor<TypeCategory::Real, 8>);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(Spacing10)(
     CppTypeFor<TypeCategory::Real, 10>);
 #endif
@@ -425,7 +425,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(FPow4i)(
 CppTypeFor<TypeCategory::Real, 8> RTDECL(FPow8i)(
     CppTypeFor<TypeCategory::Real, 8> b,
     CppTypeFor<TypeCategory::Integer, 4> e);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(FPow10i)(
     CppTypeFor<TypeCategory::Real, 10> b,
     CppTypeFor<TypeCategory::Integer, 4> e);
@@ -442,7 +442,7 @@ CppTypeFor<TypeCategory::Real, 4> RTDECL(FPow4k)(
 CppTypeFor<TypeCategory::Real, 8> RTDECL(FPow8k)(
     CppTypeFor<TypeCategory::Real, 8> b,
     CppTypeFor<TypeCategory::Integer, 8> e);
-#if LDBL_MANT_DIG == 64
+#if HAS_FLOAT80
 CppTypeFor<TypeCategory::Real, 10> RTDECL(FPow10k)(
     CppTypeFor<TypeCategory::Real, 10> b,
     CppTypeFor<TypeCategory::Integer, 8> e);
diff --git a/flang/include/flang/Runtime/reduce.h b/flang/include/flang/Runtime/reduce.h
index 60f54c393b4bbd..c016b37f9592a1 100644
--- a/flang/include/flang/Runtime/reduce.h
+++ b/flang/include/flang/Runtime/reduce.h
@@ -188,22 +188,26 @@ void RTDECL(ReduceReal8DimValue)(Descriptor &result, const Descriptor &array,
     ValueReductionOperation<double>, const char *source, int line, int dim,
     const Descriptor *mask = nullptr, const double *identity = nullptr,
     bool ordered = true);
-#if LDBL_MANT_DIG == 64
-long double RTDECL(ReduceReal10Ref)(const Descriptor &,
-    ReferenceReductionOperation<long double>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const long double *identity = nullptr, bool ordered = true);
-long double RTDECL(ReduceReal10Value)(const Descriptor &,
-    ValueReductionOperation<long double>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const long double *identity = nullptr, bool ordered = true);
+#if HAS_FLOAT80
+CppTypeFor<TypeCategory::Real, 10> RTDECL(ReduceReal10Ref)(const Descriptor &,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Real, 10>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Real, 10> *identity = nullptr,
+    bool ordered = true);
+CppTypeFor<TypeCategory::Real, 10> RTDECL(ReduceReal10Value)(const Descriptor &,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Real, 10>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Real, 10> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(ReduceReal10DimRef)(Descriptor &result, const Descriptor &array,
-    ReferenceReductionOperation<long double>, const char *source, int line,
-    int dim, const Descriptor *mask = nullptr,
-    const long double *identity = nullptr, bool ordered = true);
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Real, 10>>,
+    const char *source, int line, int dim, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Real, 10> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(ReduceReal10DimValue)(Descriptor &result, const Descriptor &array,
-    ValueReductionOperation<long double>, const char *source, int line, int dim,
-    const Descriptor *mask = nullptr, const long double *identity = nullptr,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Real, 10>>,
+    const char *source, int line, int dim, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Real, 10> *identity = nullptr,
     bool ordered = true);
 #endif
 #if LDBL_MANT_DIG == 113 || HAS_FLOAT128
@@ -225,112 +229,152 @@ void RTDECL(ReduceReal16DimValue)(Descriptor &result, const Descriptor &array,
     const CppFloat128Type *identity = nullptr, bool ordered = true);
 #endif
 
-void RTDECL(CppReduceComplex2Ref)(std::complex<float> &, const Descriptor &,
-    ReferenceReductionOperation<std::complex<float>>, const char *source,
-    int line, int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex2Value)(std::complex<float> &, const Descriptor &,
-    ValueReductionOperation<std::complex<float>>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+void RTDECL(CppReduceComplex2Ref)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
+void RTDECL(CppReduceComplex2Value)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(CppReduceComplex2DimRef)(Descriptor &result,
-    const Descriptor &array, ReferenceReductionOperation<std::complex<float>>,
+    const Descriptor &array,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
     const char *source, int line, int dim, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(CppReduceComplex2DimValue)(Descriptor &result,
-    const Descriptor &array, ValueReductionOperation<std::complex<float>>,
+    const Descriptor &array,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
     const char *source, int line, int dim, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex3Ref)(std::complex<float> &, const Descriptor &,
-    ReferenceReductionOperation<std::complex<float>>, const char *source,
-    int line, int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex3Value)(std::complex<float> &, const Descriptor &,
-    ValueReductionOperation<std::complex<float>>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
+void RTDECL(CppReduceComplex3Ref)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
+void RTDECL(CppReduceComplex3Value)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *mask = nullptr,
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(CppReduceComplex3DimRef)(Descriptor &result,
-    const Descriptor &array, ReferenceReductionOperation<std::complex<float>>,
+    const Descriptor &array,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
     const char *source, int line, int dim, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
 void RTDECL(CppReduceComplex3DimValue)(Descriptor &result,
-    const Descriptor &array, ValueReductionOperation<std::complex<float>>,
+    const Descriptor &array,
+    ValueReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
     const char *source, int line, int dim, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex4Ref)(std::complex<float> &, const Descriptor &,
-    ReferenceReductionOperation<std::complex<float>>, const char *source,
-    int line, int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
-void RTDECL(CppReduceComplex4Value)(std::complex<float> &, const Descriptor &,
-    ValueReductionOperation<std::complex<float>>, const char *source, int line,
-    int dim = 0, const Descriptor *mask = nullptr,
-    const std::complex<float> *identity = nullptr, bool ordered = true);
+    const CppTypeFor<TypeCategory::Complex, 4> *identity = nullptr,
+    bool ordered = true);
+void RTDECL(CppReduceComplex4Ref)(CppTypeFor<TypeCategory::Complex, 4> &,
+    const Descriptor &,
+    ReferenceReductionOperation<CppTypeFor<TypeCategory::Complex, 4>>,
+    const char *source, int line, int dim = 0, const Descriptor *...
[truncated]
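
The `HAS_FLOAT80` guard that the patch substitutes for the repeated `LDBL_MANT_DIG == 64` checks can be illustrated with a stand-alone sketch. Everything prefixed `DEMO_`, along with `RealTypeFor` and `HasReal10`, is a hypothetical name used only here; the real header also consults `RT_DEVICE_COMPILATION` and `__CUDACC__`, which this host-only sketch replaces with a single assumed macro.

```cpp
#include <cassert>
#include <cfloat>
#include <type_traits>

// Detect the x87 80-bit format by its 64-bit significand.
#if LDBL_MANT_DIG == 64
#define DEMO_HAS_FLOAT80 1
#endif

// DEMO_DEVICE_COMPILATION is a hypothetical stand-in for the device-build
// check: when it is defined, the 80-bit type is treated as unavailable.
#if defined(DEMO_DEVICE_COMPILATION)
#undef DEMO_HAS_FLOAT80
#endif

// A reduced kind-to-type map: the KIND=10 entry exists only when the
// feature macro is set, so uses of it must be guarded the same way.
template <int KIND> struct RealTypeFor; // primary template left undefined
template <> struct RealTypeFor<4> { using type = float; };
template <> struct RealTypeFor<8> { using type = double; };
#if DEMO_HAS_FLOAT80
template <> struct RealTypeFor<10> { using type = long double; };
#endif

// Reports whether the KIND=10 mapping was compiled in.
constexpr bool HasReal10() {
#if DEMO_HAS_FLOAT80
  return true;
#else
  return false;
#endif
}
```

Centralizing the check in one macro means a device build can switch off every KIND=10 entry point at once, instead of each declaration testing `LDBL_MANT_DIG` on its own.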

jeanPerier (Contributor) left a comment:

Looks great, thank you Slava!

@vzakhari merged commit be187a6 into llvm:main on Sep 18, 2024 (8 checks passed)
llvm-ci (Collaborator) commented on Sep 18, 2024

LLVM Buildbot has detected a new failure on builder flang-runtime-cuda-gcc running on as-builder-7 while building flang at step 5 "build-FortranRuntime".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/152/builds/399

Here is the relevant piece of the build log for reference:
Step 5 (build-FortranRuntime) failure: build (failure)
...
10.424 [1/27/38] Building CUDA object CMakeFiles/FortranRuntime.dir/array-constructor.cpp.o
10.548 [1/26/39] Building CUDA object CMakeFiles/FortranRuntime.dir/descriptor-io.cpp.o
10.563 [1/25/40] Building CUDA object CMakeFiles/FortranRuntime.dir/type-info.cpp.o
10.569 [1/24/41] Building CUDA object CMakeFiles/FortranRuntime.dir/inquiry.cpp.o
10.646 [1/23/42] Building CUDA object CMakeFiles/FortranRuntime.dir/pointer.cpp.o
10.973 [1/22/43] Building CUDA object CMakeFiles/FortranRuntime.dir/tools.cpp.o
10.996 [1/21/44] Building CUDA object CMakeFiles/FortranRuntime.dir/derived.cpp.o
11.238 [1/20/45] Building CUDA object CMakeFiles/FortranRuntime.dir/external-unit.cpp.o
11.455 [1/19/46] Building CUDA object CMakeFiles/FortranRuntime.dir/transformational.cpp.o
11.657 [1/18/47] Building CUDA object CMakeFiles/FortranRuntime.dir/product.cpp.o
FAILED: CMakeFiles/FortranRuntime.dir/product.cpp.o 
ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -DRT_USE_LIBCUDACXX=1 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/../include -I/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build -I/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include -G -g -O3 -DNDEBUG --generate-code=arch=compute_80,code=[compute_80,sm_80]   -U_GLIBCXX_ASSERTIONS -U_LIBCPP_ENABLE_ASSERTIONS -std=c++17  -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti --expt-relaxed-constexpr -Xcudafe --diag_suppress=20208 -Xcudafe --display_error_number -MD -MT CMakeFiles/FortranRuntime.dir/product.cpp.o -MF CMakeFiles/FortranRuntime.dir/product.cpp.o.d -x cu -dc /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp -o CMakeFiles/FortranRuntime.dir/product.cpp.o
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/limits: In instantiation of ‘class cuda::std::__4::numeric_limits<long double>’:
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:524:60:   required from ‘constexpr cuda::std::__4::complex<_Tp> cuda::std::__4::operator*(const cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Tp>&) [with _Tp = long double]’
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:433:16:   required from ‘constexpr cuda::std::__4::complex<_Tp>& cuda::std::__4::operator*=(cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Up>&) [with _Tp = long double; _Up = long double]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:52:1:   required from ‘bool Fortran::runtime::ComplexProductAccumulator<PART>::AccumulateAt(const SubscriptValue*) [with A = cuda::std::__4::complex<long double>; PART = long double; Fortran::runtime::SubscriptValue = long int]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:62:47:   required from ‘void Fortran::runtime::DoTotalReduction(const Fortran::runtime::Descriptor&, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&, const char*, Fortran::runtime::Terminator&) [with TYPE = cuda::std::__4::complex<long double>; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<long double>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:88:44:   required from ‘Fortran::runtime::CppTypeFor<CAT, KIND> Fortran::runtime::GetTotalReduction(const Fortran::runtime::Descriptor&, const char*, int, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&&, const char*) [with Fortran::common::TypeCategory CAT = Fortran::common::TypeCategory::Complex; int KIND = 10; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<long double>; Fortran::runtime::CppTypeFor<CAT, KIND> = cuda::std::__4::complex<long double>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:147:64:   required from here
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/limits:477:77: error: no type named ‘type’ in ‘class cuda::std::__4::__libcpp_numeric_limits<long double, true>’
  477 |     typedef typename __base::type type;
      |                                                                             ^   
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex: In instantiation of ‘constexpr cuda::std::__4::complex<_Tp> cuda::std::__4::operator*(const cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Tp>&) [with _Tp = long double]’:
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:433:16:   required from ‘constexpr cuda::std::__4::complex<_Tp>& cuda::std::__4::operator*=(cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Up>&) [with _Tp = long double; _Up = long double]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:52:1:   required from ‘bool Fortran::runtime::ComplexProductAccumulator<PART>::AccumulateAt(const SubscriptValue*) [with A = cuda::std::__4::complex<long double>; PART = long double; Fortran::runtime::SubscriptValue = long int]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:62:47:   required from ‘void Fortran::runtime::DoTotalReduction(const Fortran::runtime::Descriptor&, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&, const char*, Fortran::runtime::Terminator&) [with TYPE = cuda::std::__4::complex<long double>; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<long double>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:88:44:   required from ‘Fortran::runtime::CppTypeFor<CAT, KIND> Fortran::runtime::GetTotalReduction(const Fortran::runtime::Descriptor&, const char*, int, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&&, const char*) [with Fortran::common::TypeCategory CAT = Fortran::common::TypeCategory::Complex; int KIND = 10; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<long double>; Fortran::runtime::CppTypeFor<CAT, KIND> = cuda::std::__4::complex<long double>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:147:64:   required from here
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:524:60: error: ‘quiet_NaN’ is not a member of ‘cuda::std::__4::numeric_limits<long double>’
  524 |       return complex<_Tp>(_Tp(numeric_limits<_Tp>::quiet_NaN()), _Tp(0));
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:530:60: error: ‘quiet_NaN’ is not a member of ‘cuda::std::__4::numeric_limits<long double>’
  530 |         return complex<_Tp>(_Tp(numeric_limits<_Tp>::quiet_NaN()), _Tp(0));
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:532:59: error: ‘infinity’ is not a member of ‘cuda::std::__4::numeric_limits<long double>’
  532 |       return complex<_Tp>(_Tp(numeric_limits<_Tp>::infinity()), _Tp(numeric_limits<_Tp>::infinity()));
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:540:60: error: ‘quiet_NaN’ is not a member of ‘cuda::std::__4::numeric_limits<long double>’
  540 |       return complex<_Tp>(_Tp(numeric_limits<_Tp>::quiet_NaN()), _Tp(0));
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/cmath: In instantiation of ‘constexpr cuda::std::__4::__enable_if_t<(! cuda::std::__4::is_floating_point<_Tp>::value), bool> cuda::std::__4::__constexpr_isinf(_A1) [with _A1 = __float128; cuda::std::__4::__enable_if_t<(! cuda::std::__4::is_floating_point<_Tp>::value), bool> = bool]’:
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:512:38:   required from ‘constexpr cuda::std::__4::complex<_Tp> cuda::std::__4::operator*(const cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Tp>&) [with _Tp = __float128]’
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/complex:433:16:   required from ‘constexpr cuda::std::__4::complex<_Tp>& cuda::std::__4::operator*=(cuda::std::__4::complex<_Tp>&, const cuda::std::__4::complex<_Up>&) [with _Tp = __float128; _Up = __float128]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:52:1:   required from ‘bool Fortran::runtime::ComplexProductAccumulator<PART>::AccumulateAt(const SubscriptValue*) [with A = cuda::std::__4::complex<__float128>; PART = __float128; Fortran::runtime::SubscriptValue = long int]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:62:47:   required from ‘void Fortran::runtime::DoTotalReduction(const Fortran::runtime::Descriptor&, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&, const char*, Fortran::runtime::Terminator&) [with TYPE = cuda::std::__4::complex<__float128>; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<__float128>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/reduction-templates.h:88:44:   required from ‘Fortran::runtime::CppTypeFor<CAT, KIND> Fortran::runtime::GetTotalReduction(const Fortran::runtime::Descriptor&, const char*, int, int, const Fortran::runtime::Descriptor*, ACCUMULATOR&&, const char*) [with Fortran::common::TypeCategory CAT = Fortran::common::TypeCategory::Complex; int KIND = 16; ACCUMULATOR = Fortran::runtime::ComplexProductAccumulator<__float128>; Fortran::runtime::CppTypeFor<CAT, KIND> = cuda::std::__4::complex<__float128>]’
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/runtime/product.cpp:156:64:   required from here
/home/buildbot/worker/third-party/nv/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/cmath:666:15: error: call of overloaded ‘isinf(__float128&)’ is ambiguous
  666 |     return ::isinf(__lcpp_x);
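
The `quiet_NaN` and `infinity` errors in the log above are consistent with a limits class that simply has no usable members for `long double`, so any code path in `operator*` that names them fails to compile once it is instantiated for that type. The following is a minimal host-only sketch of that mechanism, not the libcudacxx implementation: `sketch::Limits`, `sketch::HasQuietNaN`, and `sketch::NanOrZero` are hypothetical names introduced purely for illustration.

```cpp
#include <limits>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for a numeric_limits-like class.
// Assumption: the primary template provides no members, and only
// float and double are specialized.
template <typename T> struct Limits {};

template <> struct Limits<float> {
  static constexpr float quiet_NaN() noexcept {
    return std::numeric_limits<float>::quiet_NaN();
  }
};

template <> struct Limits<double> {
  static constexpr double quiet_NaN() noexcept {
    return std::numeric_limits<double>::quiet_NaN();
  }
};

// Detection idiom: true only when Limits<T>::quiet_NaN() is well-formed.
template <typename T, typename = void>
struct HasQuietNaN : std::false_type {};

template <typename T>
struct HasQuietNaN<T, std::void_t<decltype(Limits<T>::quiet_NaN())>>
    : std::true_type {};

// Naming Limits<T>::quiet_NaN() unconditionally would be a hard error for
// T = long double, much like the errors in the log. Guarding the use with
// `if constexpr` keeps the unsupported branch from being instantiated.
template <typename T> constexpr T NanOrZero() {
  if constexpr (HasQuietNaN<T>::value) {
    return Limits<T>::quiet_NaN();
  } else {
    return T(0);
  }
}

} // namespace sketch
```

A guard like this only illustrates why the instantiation fails; the approach described in this pull request is instead to avoid instantiating `cuda::std::complex` for `long double` in the first place.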

vzakhari added a commit to vzakhari/llvm-project that referenced this pull request Sep 18, 2024
…uild. (llvm#109078)"

`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from `libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.

The additional change on top of llvm#109078 is to use `cuda::std::complex`
only for the device compilation; otherwise, the host compilation
fails because `libcudacxx` may not support the `long double` specialization
at all (depending on the compiler).
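
For readers skimming the thread, the device-only selection described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of the patch: the `sketch` namespace, the `Complex` alias, and `kUsesStdComplex` are hypothetical, and detecting the device pass with `__CUDA_ARCH__` is an assumption (the runtime may use its own macro for this).

```cpp
#include <complex>
#include <type_traits>

#if defined(__CUDA_ARCH__)
// Device compilation pass (assumed to be detectable via __CUDA_ARCH__):
// use the libcudacxx implementation, whose operators work in device code.
#include <cuda/std/complex>
namespace sketch {
template <typename T> using Complex = cuda::std::complex<T>;
} // namespace sketch
#else
// Host compilation: keep std::complex, so that long double (and any other
// type libcudacxx may not support) continues to compile.
namespace sketch {
template <typename T> using Complex = std::complex<T>;
} // namespace sketch
#endif

namespace sketch {
// True when the alias resolves to std::complex for the given part type.
template <typename T>
constexpr bool kUsesStdComplex =
    std::is_same_v<Complex<T>, std::complex<T>>;
} // namespace sketch
```

With an alias of this kind, host-only code paths that need `long double` keep using `std::complex<long double>`, while device code never has to instantiate the `cuda::std::complex<long double>` that triggered the errors above.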
vzakhari added a commit that referenced this pull request Sep 19, 2024
…uild. (#109078)" (#109207)

`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.

The additional change on top of #109078 is to use `cuda::std::complex`
only for the device compilation; otherwise, the host compilation
fails because `libcudacxx` may not support the `long double` specialization
at all (depending on the compiler).
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
…lvm#109078)

tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
…uild. (llvm#109078)" (llvm#109207)
