Skip to content

[libc][math][c23] Add exp10f16 C23 math function #101588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 7, 2024

Conversation

overmighty
Copy link
Member

Part of #95250.

@overmighty overmighty requested a review from lntue August 2, 2024 00:12
@llvmbot llvmbot added the libc label Aug 2, 2024
@overmighty
Copy link
Member Author

overmighty commented Aug 2, 2024

Based on #101217.

@llvmbot
Copy link
Member

llvmbot commented Aug 2, 2024

@llvm/pr-subscribers-libc

Author: OverMighty (overmighty)

Changes

Part of #95250.


Patch is 32.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101588.diff

17 Files Affected:

  • (modified) libc/config/linux/x86_64/entrypoints.txt (+2)
  • (modified) libc/docs/math/index.rst (+2-2)
  • (modified) libc/spec/stdc.td (+2)
  • (modified) libc/src/math/CMakeLists.txt (+2)
  • (added) libc/src/math/exp10f16.h (+21)
  • (added) libc/src/math/exp2f16.h (+21)
  • (modified) libc/src/math/generic/CMakeLists.txt (+45)
  • (added) libc/src/math/generic/exp10f16.cpp (+177)
  • (added) libc/src/math/generic/exp2f16.cpp (+134)
  • (modified) libc/test/src/math/CMakeLists.txt (+42-20)
  • (added) libc/test/src/math/exp10f16_test.cpp (+40)
  • (added) libc/test/src/math/exp2f16_test.cpp (+40)
  • (modified) libc/test/src/math/performance_testing/CMakeLists.txt (+11)
  • (added) libc/test/src/math/performance_testing/exp2f16_perf.cpp (+22)
  • (modified) libc/test/src/math/smoke/CMakeLists.txt (+42-18)
  • (added) libc/test/src/math/smoke/exp10f16_test.cpp (+65)
  • (added) libc/test/src/math/smoke/exp2f16_test.cpp (+65)
diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 0eb1690172a8e..70284e0ef3c81 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -564,6 +564,8 @@ if(LIBC_TYPES_HAS_FLOAT16)
     libc.src.math.canonicalizef16
     libc.src.math.ceilf16
     libc.src.math.copysignf16
+    libc.src.math.exp10f16
+    libc.src.math.exp2f16
     libc.src.math.expf16
     libc.src.math.f16add
     libc.src.math.f16addf
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index 0448f8f36d2c2..1b38627d73cec 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
@@ -286,11 +286,11 @@ Higher Math Functions
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | exp       | |check|          | |check|         |                        | |check|              |                        | 7.12.6.1               | F.10.3.1                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
-| exp10     | |check|          | |check|         |                        |                      |                        | 7.12.6.2               | F.10.3.2                   |
+| exp10     | |check|          | |check|         |                        | |check|              |                        | 7.12.6.2               | F.10.3.2                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | exp10m1   |                  |                 |                        |                      |                        | 7.12.6.3               | F.10.3.3                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
-| exp2      | |check|          | |check|         |                        |                      |                        | 7.12.6.4               | F.10.3.4                   |
+| exp2      | |check|          | |check|         |                        | |check|              |                        | 7.12.6.4               | F.10.3.4                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | exp2m1    | |check|          |                 |                        |                      |                        | 7.12.6.5               | F.10.3.5                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
diff --git a/libc/spec/stdc.td b/libc/spec/stdc.td
index 788b9581cb2ab..3b55d98f18330 100644
--- a/libc/spec/stdc.td
+++ b/libc/spec/stdc.td
@@ -577,6 +577,7 @@ def StdC : StandardSpec<"stdc"> {
 
           FunctionSpec<"exp2", RetValSpec<DoubleType>, [ArgSpec<DoubleType>]>,
           FunctionSpec<"exp2f", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
+          GuardedFunctionSpec<"exp2f16", RetValSpec<Float16Type>, [ArgSpec<Float16Type>], "LIBC_TYPES_HAS_FLOAT16">,
 
           FunctionSpec<"exp2m1f", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
 
@@ -585,6 +586,7 @@ def StdC : StandardSpec<"stdc"> {
 
           FunctionSpec<"exp10", RetValSpec<DoubleType>, [ArgSpec<DoubleType>]>,
           FunctionSpec<"exp10f", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
+          GuardedFunctionSpec<"exp10f16", RetValSpec<Float16Type>, [ArgSpec<Float16Type>], "LIBC_TYPES_HAS_FLOAT16">,
 
           FunctionSpec<"remainder", RetValSpec<DoubleType>, [ArgSpec<DoubleType>, ArgSpec<DoubleType>]>,
           FunctionSpec<"remainderf", RetValSpec<FloatType>, [ArgSpec<FloatType>, ArgSpec<FloatType>]>,
diff --git a/libc/src/math/CMakeLists.txt b/libc/src/math/CMakeLists.txt
index eb33532826ab8..8f7ecaeb25be6 100644
--- a/libc/src/math/CMakeLists.txt
+++ b/libc/src/math/CMakeLists.txt
@@ -101,11 +101,13 @@ add_math_entrypoint_object(expf16)
 
 add_math_entrypoint_object(exp2)
 add_math_entrypoint_object(exp2f)
+add_math_entrypoint_object(exp2f16)
 
 add_math_entrypoint_object(exp2m1f)
 
 add_math_entrypoint_object(exp10)
 add_math_entrypoint_object(exp10f)
+add_math_entrypoint_object(exp10f16)
 
 add_math_entrypoint_object(expm1)
 add_math_entrypoint_object(expm1f)
diff --git a/libc/src/math/exp10f16.h b/libc/src/math/exp10f16.h
new file mode 100644
index 0000000000000..62a62f7da59ed
--- /dev/null
+++ b/libc/src/math/exp10f16.h
@@ -0,0 +1,21 @@
+//===-- Implementation header for exp10f16 ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_EXP10F16_H
+#define LLVM_LIBC_SRC_MATH_EXP10F16_H
+
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/properties/types.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+float16 exp10f16(float16 x);
+
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC_MATH_EXP10F16_H
diff --git a/libc/src/math/exp2f16.h b/libc/src/math/exp2f16.h
new file mode 100644
index 0000000000000..71361b997ae8e
--- /dev/null
+++ b/libc/src/math/exp2f16.h
@@ -0,0 +1,21 @@
+//===-- Implementation header for exp2f16 -----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_EXP2F16_H
+#define LLVM_LIBC_SRC_MATH_EXP2F16_H
+
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/properties/types.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+float16 exp2f16(float16 x);
+
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC_MATH_EXP2F16_H
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index f118742bddd68..9836c68be444a 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -1305,6 +1305,28 @@ add_entrypoint_object(
     -O3
 )
 
+add_entrypoint_object(
+  exp2f16
+  SRCS
+    exp2f16.cpp
+  HDRS
+    ../exp2f16.h
+  DEPENDS
+    libc.hdr.errno_macros
+    libc.hdr.fenv_macros
+    libc.src.__support.CPP.array
+    libc.src.__support.FPUtil.except_value_utils
+    libc.src.__support.FPUtil.fenv_impl
+    libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.FPUtil.multiply_add
+    libc.src.__support.FPUtil.nearest_integer
+    libc.src.__support.FPUtil.polyeval
+    libc.src.__support.FPUtil.rounding_mode
+    libc.src.__support.macros.optimization
+  COMPILE_OPTIONS
+    -O3
+)
+
 add_entrypoint_object(
   exp2m1f
   SRCS
@@ -1385,6 +1407,29 @@ add_entrypoint_object(
     -O3
 )
 
+add_entrypoint_object(
+  exp10f16
+  SRCS
+    exp10f16.cpp
+  HDRS
+    ../exp10f16.h
+  DEPENDS
+    libc.hdr.errno_macros
+    libc.hdr.fenv_macros
+    libc.src.__support.CPP.array
+    libc.src.__support.FPUtil.except_value_utils
+    libc.src.__support.FPUtil.fenv_impl
+    libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.FPUtil.multiply_add
+    libc.src.__support.FPUtil.nearest_integer
+    libc.src.__support.FPUtil.polyeval
+    libc.src.__support.FPUtil.rounding_mode
+    libc.src.__support.macros.optimization
+    libc.src.__support.macros.properties.cpu_features
+  COMPILE_OPTIONS
+    -O3
+)
+
 add_entrypoint_object(
   expm1
   SRCS
diff --git a/libc/src/math/generic/exp10f16.cpp b/libc/src/math/generic/exp10f16.cpp
new file mode 100644
index 0000000000000..62349ee14d9c2
--- /dev/null
+++ b/libc/src/math/generic/exp10f16.cpp
@@ -0,0 +1,177 @@
+//===-- Half-precision 10^x function --------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/exp10f16.h"
+#include "hdr/errno_macros.h"
+#include "hdr/fenv_macros.h"
+#include "src/__support/CPP/array.h"
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/except_value_utils.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/FPUtil/rounding_mode.h"
+#include "src/__support/common.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h"
+#include "src/__support/macros/properties/cpu_features.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+static constexpr size_t N_EXP10F16_EXCEPTS = 5
+#ifndef LIBC_TARGET_CPU_HAS_FMA
+                                             + 3
+#endif
+    ;
+
+static constexpr fputil::ExceptValues<float16, N_EXP10F16_EXCEPTS>
+    EXP10F16_EXCEPTS = {{
+        // x = 0x1.8f4p-2, exp10f16(x) = 0x1.3ap+1 (RZ)
+        {0x363dU, 0x40e8U, 1U, 0U, 1U},
+        // x = 0x1.95cp-2, exp10f16(x) = 0x1.3ecp+1 (RZ)
+        {0x3657U, 0x40fbU, 1U, 0U, 0U},
+        // x = -0x1.018p-4, exp10f16(x) = 0x1.bbp-1 (RZ)
+        {0xac06U, 0x3aecU, 1U, 0U, 0U},
+        // x = -0x1.c28p+0, exp10f16(x) = 0x1.1ccp-6 (RZ)
+        {0xbf0aU, 0x2473U, 1U, 0U, 0U},
+        // x = -0x1.e1cp+1, exp10f16(x) = 0x1.694p-13 (RZ)
+        {0xc387U, 0x09a5U, 1U, 0U, 0U},
+#ifndef LIBC_TARGET_CPU_HAS_FMA
+        // x = 0x1.0cp+1, exp10f16(x) = 0x1.f04p+6 (RZ)
+        {0x4030U, 0x57c1U, 1U, 0U, 1U},
+        // x = 0x1.1b8p+1, exp10f16(x) = 0x1.47cp+7 (RZ)
+        {0x406eU, 0x591fU, 1U, 0U, 1U},
+        // x = 0x1.1b8p+2, exp10f16(x) = 0x1.a4p+14 (RZ)
+        {0x446eU, 0x7690U, 1U, 0U, 1U},
+#endif
+    }};
+
+// Generated by Sollya with the following commands:
+//   > display = hexadecimal;
+//   > round(log2(10), SG, RN);
+static constexpr float LOG2F_10 = 0x1.a934fp+1f;
+
+// Generated by Sollya with the following commands:
+//   > display = hexadecimal;
+//   > round(log10(2), SG, RN);
+static constexpr float LOG10F_2 = 0x1.344136p-2f;
+
+// Generated by Sollya with the following commands:
+//   > display = hexadecimal;
+//   > for i from 0 to 7 do printsingle(round(2^(i * 2^-3), SG, RN));
+static constexpr cpp::array<uint32_t, 8> EXP2_MID_BITS = {
+    0x3f80'0000U, 0x3f8b'95c2U, 0x3f98'37f0U, 0x3fa5'fed7U,
+    0x3fb5'04f3U, 0x3fc5'672aU, 0x3fd7'44fdU, 0x3fea'c0c7U,
+};
+
+LLVM_LIBC_FUNCTION(float16, exp10f16, (float16 x)) {
+  using FPBits = fputil::FPBits<float16>;
+  FPBits x_bits(x);
+
+  uint16_t x_u = x_bits.uintval();
+  uint16_t x_abs = x_u & 0x7fffU;
+
+  // When |x| >= 5, or x is NaN.
+  if (LIBC_UNLIKELY(x_abs >= 0x4500U)) {
+    // exp10(NaN) = NaN
+    if (x_bits.is_nan()) {
+      if (x_bits.is_signaling_nan()) {
+        fputil::raise_except_if_required(FE_INVALID);
+        return FPBits::quiet_nan().get_val();
+      }
+
+      return x;
+    }
+
+    // When x >= 5.
+    if (x_bits.is_pos()) {
+      // exp10(+inf) = +inf
+      if (x_bits.is_inf())
+        return FPBits::inf().get_val();
+
+      switch (fputil::quick_get_round()) {
+      case FE_TONEAREST:
+      case FE_UPWARD:
+        fputil::set_errno_if_required(ERANGE);
+        fputil::raise_except_if_required(FE_OVERFLOW);
+        return FPBits::inf().get_val();
+      default:
+        return FPBits::max_normal().get_val();
+      }
+    }
+
+    // When x <= -8.
+    if (x_u >= 0xc800U) {
+      // exp10(-inf) = +0
+      if (x_bits.is_inf())
+        return FPBits::zero().get_val();
+
+      fputil::set_errno_if_required(ERANGE);
+      fputil::raise_except_if_required(FE_UNDERFLOW | FE_INEXACT);
+
+      if (fputil::fenv_is_round_up())
+        return FPBits::min_subnormal().get_val();
+      return FPBits::zero().get_val();
+    }
+  }
+
+  // When x is 1, 2, 3, or 4. These are hard-to-round cases with exact results.
+  if (LIBC_UNLIKELY((x_u & ~(0x3c00U | 0x4000U | 0x4200U | 0x4400U)) == 0)) {
+    switch (x_u) {
+    case 0x3c00U: // x = 1.0f16
+      return static_cast<float16>(10.0);
+    case 0x4000U: // x = 2.0f16
+      return static_cast<float16>(100.0);
+    case 0x4200U: // x = 3.0f16
+      return static_cast<float16>(1'000.0);
+    case 0x4400U: // x = 4.0f16
+      return static_cast<float16>(10'000.0);
+    }
+  }
+
+  if (auto r = EXP10F16_EXCEPTS.lookup(x_u); LIBC_UNLIKELY(r.has_value()))
+    return r.value();
+
+  // For -8 < x < 5, to compute 10^x, we perform the following range reduction:
+  // find hi, mid, lo, such that:
+  //   x = (hi + mid) * log2(10) + lo, in which
+  //     hi is an integer,
+  //     mid * 2^3 is an integer,
+  //     -2^(-4) <= lo < 2^(-4).
+  // In particular,
+  //   hi + mid = round(x * 2^3) * 2^(-3).
+  // Then,
+  //   10^x = 10^(hi + mid + lo) = 2^((hi + mid) * log2(10)) + 10^lo
+  // We store 2^mid in the lookup table EXP2_MID_BITS, and compute 2^hi * 2^mid
+  // by adding hi to the exponent field of 2^mid.  10^lo is computed using a
+  // degree-4 minimax polynomial generated by Sollya.
+
+  float xf = x;
+  float kf = fputil::nearest_integer(xf * (LOG2F_10 * 0x1.0p+3f));
+  int x_hi_mid = static_cast<int>(kf);
+  int x_hi = x_hi_mid >> 3;
+  int x_mid = x_hi_mid & 0x7;
+  // lo = x - (hi + mid) = round(x * 2^3 * log2(10)) * log10(2) * (-2^(-3)) + x
+  float lo = fputil::multiply_add(kf, LOG10F_2 * -0x1.0p-3f, xf);
+
+  uint32_t exp2_hi_mid_bits =
+      EXP2_MID_BITS[x_mid] +
+      static_cast<uint32_t>(x_hi << fputil::FPBits<float>::FRACTION_LEN);
+  float exp2_hi_mid = fputil::FPBits<float>(exp2_hi_mid_bits).get_val();
+  // Degree-4 minimax polynomial generated by Sollya with the following
+  // commands:
+  //   > display = hexadecimal;
+  //   > P = fpminimax((10^x - 1)/x, 3, [|SG...|], [-2^-4, 2^-4]);
+  //   > 1 + x * P;
+  float exp10_lo = fputil::polyeval(lo, 0x1p+0f, 0x1.26bb14p+1f, 0x1.53526p+1f,
+                                    0x1.04b434p+1f, 0x1.2bcf9ep+0f);
+  return static_cast<float16>(exp2_hi_mid * exp10_lo);
+}
+
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/math/generic/exp2f16.cpp b/libc/src/math/generic/exp2f16.cpp
new file mode 100644
index 0000000000000..3f18ab9166a85
--- /dev/null
+++ b/libc/src/math/generic/exp2f16.cpp
@@ -0,0 +1,134 @@
+//===-- Half-precision 2^x function ---------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/exp2f16.h"
+#include "hdr/errno_macros.h"
+#include "hdr/fenv_macros.h"
+#include "src/__support/CPP/array.h"
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/except_value_utils.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/FPUtil/rounding_mode.h"
+#include "src/__support/common.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+static constexpr fputil::ExceptValues<float16, 3> EXP2F16_EXCEPTS = {{
+    // (input, RZ output, RU offset, RD offset, RN offset)
+    // x = 0x1.714p-11, exp2f16(x) = 0x1p+0 (RZ)
+    {0x11c5U, 0x3c00U, 1U, 0U, 1U},
+    // x = -0x1.558p-4, exp2f16(x) = 0x1.e34p-1 (RZ)
+    {0xad56U, 0x3b8dU, 1U, 0U, 0U},
+    // x = -0x1.d5cp-4, exp2f16(x) = 0x1.d8cp-1 (RZ)
+    {0xaf57U, 0x3b63U, 1U, 0U, 0U},
+}};
+
+// Generated by Sollya with the following commands:
+//   > display = hexadecimal;
+//   > for i from 0 to 7 do printsingle(round(2^(i * 2^-3), SG, RN));
+static constexpr cpp::array<uint32_t, 8> EXP2_MID_BITS = {
+    0x3f80'0000U, 0x3f8b'95c2U, 0x3f98'37f0U, 0x3fa5'fed7U,
+    0x3fb5'04f3U, 0x3fc5'672aU, 0x3fd7'44fdU, 0x3fea'c0c7U,
+};
+
+LLVM_LIBC_FUNCTION(float16, exp2f16, (float16 x)) {
+  using FPBits = fputil::FPBits<float16>;
+  FPBits x_bits(x);
+
+  uint16_t x_u = x_bits.uintval();
+  uint16_t x_abs = x_u & 0x7fffU;
+
+  // When |x| >= 16, or x is NaN.
+  if (LIBC_UNLIKELY(x_abs >= 0x4c00U)) {
+    // exp2(NaN) = NaN
+    if (x_bits.is_nan()) {
+      if (x_bits.is_signaling_nan()) {
+        fputil::raise_except_if_required(FE_INVALID);
+        return FPBits::quiet_nan().get_val();
+      }
+
+      return x;
+    }
+
+    // When x >= 16.
+    if (x_bits.is_pos()) {
+      // exp2(+inf) = +inf
+      if (x_bits.is_inf())
+        return FPBits::inf().get_val();
+
+      switch (fputil::quick_get_round()) {
+      case FE_TONEAREST:
+      case FE_UPWARD:
+        fputil::set_errno_if_required(ERANGE);
+        fputil::raise_except_if_required(FE_OVERFLOW);
+        return FPBits::inf().get_val();
+      default:
+        return FPBits::max_normal().get_val();
+      }
+    }
+
+    // When x <= -25.
+    if (x_u >= 0xce40U) {
+      // exp2(-inf) = +0
+      if (x_bits.is_inf())
+        return FPBits::zero().get_val();
+
+      fputil::set_errno_if_required(ERANGE);
+      fputil::raise_except_if_required(FE_UNDERFLOW | FE_INEXACT);
+
+      if (fputil::fenv_is_round_up())
+        return FPBits::min_subnormal().get_val();
+      return FPBits::zero().get_val();
+    }
+  }
+
+  if (auto r = EXP2F16_EXCEPTS.lookup(x_u); LIBC_UNLIKELY(r.has_value()))
+    return r.value();
+
+  // For -25 < x < 16, to compute 2^x, we perform the following range reduction:
+  // find hi, mid, lo, such that:
+  //   x = hi + mid + lo, in which
+  //     hi is an integer,
+  //     mid * 2^3 is an integer,
+  //     -2^(-4) <= lo < 2^(-4).
+  // In particular,
+  //   hi + mid = round(x * 2^3) * 2^(-3).
+  // Then,
+  //   2^x = 2^(hi + mid + lo) = 2^hi * 2^mid * 2^lo.
+  // We store 2^mid in the lookup table EXP2_MID_BITS, and compute 2^hi * 2^mid
+  // by adding hi to the exponent field of 2^mid.  2^lo is computed using a
+  // degree-3 minimax polynomial generated by Sollya.
+
+  float xf = x;
+  float kf = fputil::nearest_integer(xf * 0x1.0p+3f);
+  int x_hi_mid = static_cast<int>(kf);
+  int x_hi = x_hi_mid >> 3;
+  int x_mid = x_hi_mid & 0x7;
+  // lo = x - (hi + mid) = round(x * 2^3) * (-2^(-3)) + x
+  float lo = fputil::multiply_add(kf, -0x1.0p-3f, xf);
+
+  uint32_t exp2_hi_mid_bits =
+      EXP2_MID_BITS[x_mid] +
+      static_cast<uint32_t>(x_hi << fputil::FPBits<float>::FRACTION_LEN);
+  float exp2_hi_mid = fputil::FPBits<float>(exp2_hi_mid_bits).get_val();
+  // Degree-3 minimax polynomial generated by Sollya with the following
+  // commands:
+  //   > display = hexadecimal;
+  //   > P = fpminimax((2^x - 1)/x, 2, [|SG...|], [-2^-4, 2^-4]);
+  //   > 1 + x * P;
+  float exp2_lo = fputil::polyeval(lo, 0x1p+0f, 0x1.62e43p-1f, 0x1.ec0aa6p-3f,
+                                   0x1.c6b4a6p-5f);
+  return static_cast<float16>(exp2_hi_mid * exp2_lo);
+}
+
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/test/src/math/CMakeLists.txt b/libc/test/src/math/CMakeLists.txt
index 56b27b0952b58..b787d656e456a 100644
--- a/libc/test/src/math/CMakeLists.txt
+++ b/libc/test/src/math/CMakeLists.txt
@@ -925,6 +925,19 @@ add_fp_unittest(
     libc.src.math.expf16
 )
 
+add_fp_unittest(
+  exp2_test
+  NEED_MPFR
+  SUITE
+    libc-math-unittests
+  SRCS
+    exp2_test.cpp
+  DEPENDS
+    libc.src.errno.errno
+    libc.src.math.exp2
+    libc.src.__support.FPUtil.fp_bits
+)
+
 add_fp_unittest(
   exp2f_test
   NEED_MPFR
@@ -939,16 +952,14 @@ add_fp_unittest(
 )
 
 add_fp_unittest(
- exp2_test
- NEED_MPFR
- SUITE
-   libc-math-unittests
- SRCS
-   exp2_test.cpp
- DEPENDS
-   libc.src.errno.errno
-   libc.src.math.exp2
-   libc.src.__supp...
[truncated]

Comment on lines 65 to 71
// Generated by Sollya with the following commands:
// > display = hexadecimal;
// > for i from 0 to 7 do printsingle(round(2^(i * 2^-3), SG, RN));
static constexpr cpp::array<uint32_t, 8> EXP2_MID_BITS = {
0x3f80'0000U, 0x3f8b'95c2U, 0x3f98'37f0U, 0x3fa5'fed7U,
0x3fb5'04f3U, 0x3fc5'672aU, 0x3fd7'44fdU, 0x3fea'c0c7U,
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-pasted from exp2f16.cpp. I guess I should create a shared expxf16.h header instead.

@overmighty
Copy link
Member Author

On i7-13700H:

  • With Clang 18, without -march=native:

     Performance tests with inputs in denormal range:
    -- My function --
         Total time      : 1627880300 ns 
         Average runtime : 79.564 ns/op 
         Ops per second  : 12568491 op/s 
    -- Other function --
         Total time      : 21792295 ns 
         Average runtime : 1.06512 ns/op 
         Ops per second  : 938863942 op/s 
    -- Average runtime ratio --
         Mine / Other's  : 74.6998 
    
     Performance tests with inputs in normal range:
    -- My function --
         Total time      : 25289960974 ns 
         Average runtime : 41.1634 ns/op 
         Ops per second  : 24293434 op/s 
    -- Other function --
         Total time      : 648779538 ns 
         Average runtime : 1.05599 ns/op 
         Ops per second  : 946978078 op/s 
    -- Average runtime ratio --
         Mine / Other's  : 38.9808 
    
  • With Clang 18, with -march=native:

     Performance tests with inputs in denormal range:
    -- My function --
         Total time      : 348120098 ns 
         Average runtime : 17.0147 ns/op 
         Ops per second  : 58772820 op/s 
    -- Other function --
         Total time      : 21815025 ns 
         Average runtime : 1.06623 ns/op 
         Ops per second  : 937885700 op/s 
    -- Average runtime ratio --
         Mine / Other's  : 15.9578 
    
     Performance tests with inputs in normal range:
    -- My function --
         Total time      : 24294542949 ns 
         Average runtime : 39.5432 ns/op 
         Ops per second  : 25288806 op/s 
    -- Other function --
         Total time      : 648768173 ns 
         Average runtime : 1.05597 ns/op 
         Ops per second  : 946994667 op/s 
    -- Average runtime ratio --
         Mine / Other's  : 37.4472 
    

Copy link
Contributor

@lntue lntue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a nit.

@overmighty overmighty merged commit 59338ad into llvm:main Aug 7, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants