Skip to content

[libc] Change ctype to be encoding independent #110574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 3, 2024

Conversation

michaelrj-google
Copy link
Contributor

The previous implementation of the ctype functions assumed ASCII.
This patch changes to a switch/case implementation that looks odd, but
actually is easier for the compiler to understand and optimize.

The previous implementation of the ctype functions assumed ASCII.
This patch changes to a switch/case implementation that looks odd, but
actually is easier for the compiler to understand and optimize.
@michaelrj-google michaelrj-google marked this pull request as ready for review November 14, 2024 19:31
@llvmbot llvmbot added libc bazel "Peripheral" support tier build system: utils/bazel labels Nov 14, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 14, 2024

@llvm/pr-subscribers-libc

Author: Michael Jones (michaelrj-google)

Changes

The previous implementation of the ctype functions assumed ASCII.
This patch changes to a switch/case implementation that looks odd, but
actually is easier for the compiler to understand and optimize.


Patch is 30.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110574.diff

14 Files Affected:

  • (modified) libc/src/__support/ctype_utils.h (+535-23)
  • (modified) libc/src/__support/integer_literals.h (+8-20)
  • (modified) libc/src/ctype/isxdigit.cpp (+2-1)
  • (modified) libc/src/ctype/isxdigit_l.cpp (+2-1)
  • (modified) libc/src/stdio/printf_core/fixed_converter.h (+6-8)
  • (modified) libc/src/stdio/printf_core/float_dec_converter.h (+3-3)
  • (modified) libc/src/stdio/printf_core/float_hex_converter.h (+10-7)
  • (modified) libc/src/stdio/printf_core/float_inf_nan_converter.h (+5-4)
  • (modified) libc/src/stdio/printf_core/int_converter.h (+8-13)
  • (modified) libc/src/stdio/scanf_core/converter_utils.h (-10)
  • (modified) libc/src/stdio/scanf_core/float_converter.cpp (+10-8)
  • (modified) libc/src/stdio/scanf_core/int_converter.cpp (+7-5)
  • (modified) libc/src/stdio/scanf_core/ptr_converter.cpp (+3-1)
  • (modified) utils/bazel/llvm-project-overlay/libc/BUILD.bazel (+2)
diff --git a/libc/src/__support/ctype_utils.h b/libc/src/__support/ctype_utils.h
index 91f6ce8cabd8d0..8521857ce765de 100644
--- a/libc/src/__support/ctype_utils.h
+++ b/libc/src/__support/ctype_utils.h
@@ -15,44 +15,556 @@
 namespace LIBC_NAMESPACE_DECL {
 namespace internal {
 
-// ------------------------------------------------------
-// Rationale: Since these classification functions are
-// called in other functions, we will avoid the overhead
-// of a function call by inlining them.
-// ------------------------------------------------------
+// -----------------------------------------------------------------------------
+// ******************                 WARNING                 ******************
+// ****************** DO NOT TRY TO OPTIMIZE THESE FUNCTIONS! ******************
+// -----------------------------------------------------------------------------
+// This switch/case form is easier for the compiler to understand, and is
+// optimized into a form that is almost always the same as or better than
+// versions written by hand (see https://godbolt.org/z/qvrebqvvr). Also this
+// form makes these functions encoding independent. If you want to rewrite these
+// functions, make sure you have benchmarks to show your new solution is faster,
+// as well as a way to support non-ASCII character encodings.
 
-LIBC_INLINE static constexpr bool isalpha(unsigned ch) {
-  return (ch | 32) - 'a' < 26;
+LIBC_INLINE static constexpr bool islower(int ch) {
+  switch (ch) {
+  case 'a':
+  case 'b':
+  case 'c':
+  case 'd':
+  case 'e':
+  case 'f':
+  case 'g':
+  case 'h':
+  case 'i':
+  case 'j':
+  case 'k':
+  case 'l':
+  case 'm':
+  case 'n':
+  case 'o':
+  case 'p':
+  case 'q':
+  case 'r':
+  case 's':
+  case 't':
+  case 'u':
+  case 'v':
+  case 'w':
+  case 'x':
+  case 'y':
+  case 'z':
+    return true;
+  default:
+    return false;
+  }
 }
 
-LIBC_INLINE static constexpr bool isdigit(unsigned ch) {
-  return (ch - '0') < 10;
+LIBC_INLINE static constexpr bool isupper(int ch) {
+  switch (ch) {
+  case 'A':
+  case 'B':
+  case 'C':
+  case 'D':
+  case 'E':
+  case 'F':
+  case 'G':
+  case 'H':
+  case 'I':
+  case 'J':
+  case 'K':
+  case 'L':
+  case 'M':
+  case 'N':
+  case 'O':
+  case 'P':
+  case 'Q':
+  case 'R':
+  case 'S':
+  case 'T':
+  case 'U':
+  case 'V':
+  case 'W':
+  case 'X':
+  case 'Y':
+  case 'Z':
+    return true;
+  default:
+    return false;
+  }
 }
 
-LIBC_INLINE static constexpr bool isalnum(unsigned ch) {
-  return isalpha(ch) || isdigit(ch);
+LIBC_INLINE static constexpr bool isdigit(int ch) {
+  switch (ch) {
+  case '0':
+  case '1':
+  case '2':
+  case '3':
+  case '4':
+  case '5':
+  case '6':
+  case '7':
+  case '8':
+  case '9':
+    return true;
+  default:
+    return false;
+  }
 }
 
-LIBC_INLINE static constexpr bool isgraph(unsigned ch) {
-  return 0x20 < ch && ch < 0x7f;
+LIBC_INLINE static constexpr int tolower(int ch) {
+  switch (ch) {
+  case 'A':
+    return 'a';
+  case 'B':
+    return 'b';
+  case 'C':
+    return 'c';
+  case 'D':
+    return 'd';
+  case 'E':
+    return 'e';
+  case 'F':
+    return 'f';
+  case 'G':
+    return 'g';
+  case 'H':
+    return 'h';
+  case 'I':
+    return 'i';
+  case 'J':
+    return 'j';
+  case 'K':
+    return 'k';
+  case 'L':
+    return 'l';
+  case 'M':
+    return 'm';
+  case 'N':
+    return 'n';
+  case 'O':
+    return 'o';
+  case 'P':
+    return 'p';
+  case 'Q':
+    return 'q';
+  case 'R':
+    return 'r';
+  case 'S':
+    return 's';
+  case 'T':
+    return 't';
+  case 'U':
+    return 'u';
+  case 'V':
+    return 'v';
+  case 'W':
+    return 'w';
+  case 'X':
+    return 'x';
+  case 'Y':
+    return 'y';
+  case 'Z':
+    return 'z';
+  default:
+    return ch;
+  }
 }
 
-LIBC_INLINE static constexpr bool islower(unsigned ch) {
-  return (ch - 'a') < 26;
+LIBC_INLINE static constexpr int toupper(int ch) {
+  switch (ch) {
+  case 'a':
+    return 'A';
+  case 'b':
+    return 'B';
+  case 'c':
+    return 'C';
+  case 'd':
+    return 'D';
+  case 'e':
+    return 'E';
+  case 'f':
+    return 'F';
+  case 'g':
+    return 'G';
+  case 'h':
+    return 'H';
+  case 'i':
+    return 'I';
+  case 'j':
+    return 'J';
+  case 'k':
+    return 'K';
+  case 'l':
+    return 'L';
+  case 'm':
+    return 'M';
+  case 'n':
+    return 'N';
+  case 'o':
+    return 'O';
+  case 'p':
+    return 'P';
+  case 'q':
+    return 'Q';
+  case 'r':
+    return 'R';
+  case 's':
+    return 'S';
+  case 't':
+    return 'T';
+  case 'u':
+    return 'U';
+  case 'v':
+    return 'V';
+  case 'w':
+    return 'W';
+  case 'x':
+    return 'X';
+  case 'y':
+    return 'Y';
+  case 'z':
+    return 'Z';
+  default:
+    return ch;
+  }
 }
 
-LIBC_INLINE static constexpr bool isupper(unsigned ch) {
-  return (ch - 'A') < 26;
+LIBC_INLINE static constexpr bool isalpha(int ch) {
+  switch (ch) {
+  case 'a':
+  case 'b':
+  case 'c':
+  case 'd':
+  case 'e':
+  case 'f':
+  case 'g':
+  case 'h':
+  case 'i':
+  case 'j':
+  case 'k':
+  case 'l':
+  case 'm':
+  case 'n':
+  case 'o':
+  case 'p':
+  case 'q':
+  case 'r':
+  case 's':
+  case 't':
+  case 'u':
+  case 'v':
+  case 'w':
+  case 'x':
+  case 'y':
+  case 'z':
+  case 'A':
+  case 'B':
+  case 'C':
+  case 'D':
+  case 'E':
+  case 'F':
+  case 'G':
+  case 'H':
+  case 'I':
+  case 'J':
+  case 'K':
+  case 'L':
+  case 'M':
+  case 'N':
+  case 'O':
+  case 'P':
+  case 'Q':
+  case 'R':
+  case 'S':
+  case 'T':
+  case 'U':
+  case 'V':
+  case 'W':
+  case 'X':
+  case 'Y':
+  case 'Z':
+    return true;
+  default:
+    return false;
+  }
 }
 
-LIBC_INLINE static constexpr bool isspace(unsigned ch) {
-  return ch == ' ' || (ch - '\t') < 5;
+LIBC_INLINE static constexpr bool isalnum(int ch) {
+  switch (ch) {
+  case 'a':
+  case 'b':
+  case 'c':
+  case 'd':
+  case 'e':
+  case 'f':
+  case 'g':
+  case 'h':
+  case 'i':
+  case 'j':
+  case 'k':
+  case 'l':
+  case 'm':
+  case 'n':
+  case 'o':
+  case 'p':
+  case 'q':
+  case 'r':
+  case 's':
+  case 't':
+  case 'u':
+  case 'v':
+  case 'w':
+  case 'x':
+  case 'y':
+  case 'z':
+  case 'A':
+  case 'B':
+  case 'C':
+  case 'D':
+  case 'E':
+  case 'F':
+  case 'G':
+  case 'H':
+  case 'I':
+  case 'J':
+  case 'K':
+  case 'L':
+  case 'M':
+  case 'N':
+  case 'O':
+  case 'P':
+  case 'Q':
+  case 'R':
+  case 'S':
+  case 'T':
+  case 'U':
+  case 'V':
+  case 'W':
+  case 'X':
+  case 'Y':
+  case 'Z':
+  case '0':
+  case '1':
+  case '2':
+  case '3':
+  case '4':
+  case '5':
+  case '6':
+  case '7':
+  case '8':
+  case '9':
+    return true;
+  default:
+    return false;
+  }
 }
 
-LIBC_INLINE static constexpr int tolower(int ch) {
-  if (isupper(ch))
-    return ch + ('a' - 'A');
-  return ch;
+LIBC_INLINE static constexpr int b36_char_to_int(int ch) {
+  switch (ch) {
+  case '0':
+    return 0;
+  case '1':
+    return 1;
+  case '2':
+    return 2;
+  case '3':
+    return 3;
+  case '4':
+    return 4;
+  case '5':
+    return 5;
+  case '6':
+    return 6;
+  case '7':
+    return 7;
+  case '8':
+    return 8;
+  case '9':
+    return 9;
+  case 'a':
+  case 'A':
+    return 10;
+  case 'b':
+  case 'B':
+    return 11;
+  case 'c':
+  case 'C':
+    return 12;
+  case 'd':
+  case 'D':
+    return 13;
+  case 'e':
+  case 'E':
+    return 14;
+  case 'f':
+  case 'F':
+    return 15;
+  case 'g':
+  case 'G':
+    return 16;
+  case 'h':
+  case 'H':
+    return 17;
+  case 'i':
+  case 'I':
+    return 18;
+  case 'j':
+  case 'J':
+    return 19;
+  case 'k':
+  case 'K':
+    return 20;
+  case 'l':
+  case 'L':
+    return 21;
+  case 'm':
+  case 'M':
+    return 22;
+  case 'n':
+  case 'N':
+    return 23;
+  case 'o':
+  case 'O':
+    return 24;
+  case 'p':
+  case 'P':
+    return 25;
+  case 'q':
+  case 'Q':
+    return 26;
+  case 'r':
+  case 'R':
+    return 27;
+  case 's':
+  case 'S':
+    return 28;
+  case 't':
+  case 'T':
+    return 29;
+  case 'u':
+  case 'U':
+    return 30;
+  case 'v':
+  case 'V':
+    return 31;
+  case 'w':
+  case 'W':
+    return 32;
+  case 'x':
+  case 'X':
+    return 33;
+  case 'y':
+  case 'Y':
+    return 34;
+  case 'z':
+  case 'Z':
+    return 35;
+  default:
+    return 0;
+  }
+}
+
+LIBC_INLINE static constexpr int int_to_b36_char(int num) {
+  // Can't actually use LIBC_ASSERT here because it depends on integer_to_string
+  // which depends on this.
+
+  // LIBC_ASSERT(num < 36);
+  switch (num) {
+  case 0:
+    return '0';
+  case 1:
+    return '1';
+  case 2:
+    return '2';
+  case 3:
+    return '3';
+  case 4:
+    return '4';
+  case 5:
+    return '5';
+  case 6:
+    return '6';
+  case 7:
+    return '7';
+  case 8:
+    return '8';
+  case 9:
+    return '9';
+  case 10:
+    return 'a';
+  case 11:
+    return 'b';
+  case 12:
+    return 'c';
+  case 13:
+    return 'd';
+  case 14:
+    return 'e';
+  case 15:
+    return 'f';
+  case 16:
+    return 'g';
+  case 17:
+    return 'h';
+  case 18:
+    return 'i';
+  case 19:
+    return 'j';
+  case 20:
+    return 'k';
+  case 21:
+    return 'l';
+  case 22:
+    return 'm';
+  case 23:
+    return 'n';
+  case 24:
+    return 'o';
+  case 25:
+    return 'p';
+  case 26:
+    return 'q';
+  case 27:
+    return 'r';
+  case 28:
+    return 's';
+  case 29:
+    return 't';
+  case 30:
+    return 'u';
+  case 31:
+    return 'v';
+  case 32:
+    return 'w';
+  case 33:
+    return 'x';
+  case 34:
+    return 'y';
+  case 35:
+    return 'z';
+  default:
+    return '!';
+  }
+}
+
+LIBC_INLINE static constexpr bool isspace(int ch) {
+  switch (ch) {
+  case ' ':
+  case '\t':
+  case '\n':
+  case '\v':
+  case '\f':
+  case '\r':
+    return true;
+  default:
+    return false;
+  }
+}
+
+// not yet encoding independent.
+LIBC_INLINE static constexpr bool isgraph(int ch) {
+  return 0x20 < ch && ch < 0x7f;
 }
 
 } // namespace internal
diff --git a/libc/src/__support/integer_literals.h b/libc/src/__support/integer_literals.h
index 4c5c4c41666811..0298ec7d088d69 100644
--- a/libc/src/__support/integer_literals.h
+++ b/libc/src/__support/integer_literals.h
@@ -13,12 +13,13 @@
 #ifndef LLVM_LIBC_SRC___SUPPORT_INTEGER_LITERALS_H
 #define LLVM_LIBC_SRC___SUPPORT_INTEGER_LITERALS_H
 
-#include "src/__support/CPP/limits.h"        // CHAR_BIT
+#include "src/__support/CPP/limits.h" // CHAR_BIT
+#include "src/__support/ctype_utils.h"
 #include "src/__support/macros/attributes.h" // LIBC_INLINE
 #include "src/__support/macros/config.h"
-#include "src/__support/uint128.h"           // UInt128
-#include <stddef.h>                          // size_t
-#include <stdint.h>                          // uintxx_t
+#include "src/__support/uint128.h" // UInt128
+#include <stddef.h>                // size_t
+#include <stdint.h>                // uintxx_t
 
 namespace LIBC_NAMESPACE_DECL {
 
@@ -75,26 +76,13 @@ template <typename T, int base> struct DigitBuffer {
       push(*str);
   }
 
-  // Returns the digit for a particular character.
-  // Returns INVALID_DIGIT if the character is invalid.
-  LIBC_INLINE static constexpr uint8_t get_digit_value(const char c) {
-    const auto to_lower = [](char c) { return c | 32; };
-    const auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
-    const auto is_alpha = [](char c) {
-      return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
-    };
-    if (is_digit(c))
-      return static_cast<uint8_t>(c - '0');
-    if (base > 10 && is_alpha(c))
-      return static_cast<uint8_t>(to_lower(c) - 'a' + 10);
-    return INVALID_DIGIT;
-  }
-
   // Adds a single character to this buffer.
   LIBC_INLINE constexpr void push(char c) {
     if (c == '\'')
       return; // ' is valid but not taken into account.
-    const uint8_t value = get_digit_value(c);
+    const int b36_val = internal::b36_char_to_int(c);
+    const uint8_t value = static_cast<uint8_t>(
+        b36_val < base && (b36_val != 0 || c == '0') ? b36_val : INVALID_DIGIT);
     if (value == INVALID_DIGIT || size >= MAX_DIGITS) {
       // During constant evaluation `__builtin_unreachable` will halt the
       // compiler as it is not executable. This is preferable over `assert` that
diff --git a/libc/src/ctype/isxdigit.cpp b/libc/src/ctype/isxdigit.cpp
index 6b730c354db083..81f645c6f49fc9 100644
--- a/libc/src/ctype/isxdigit.cpp
+++ b/libc/src/ctype/isxdigit.cpp
@@ -16,7 +16,8 @@ namespace LIBC_NAMESPACE_DECL {
 
 LLVM_LIBC_FUNCTION(int, isxdigit, (int c)) {
   const unsigned ch = static_cast<unsigned>(c);
-  return static_cast<int>(internal::isdigit(ch) || (ch | 32) - 'a' < 6);
+  return static_cast<int>(internal::isalnum(ch) &&
+                          internal::b36_char_to_int(ch) < 16);
 }
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/ctype/isxdigit_l.cpp b/libc/src/ctype/isxdigit_l.cpp
index 8a5c7d4d28ab1c..eddfd20a2da3b4 100644
--- a/libc/src/ctype/isxdigit_l.cpp
+++ b/libc/src/ctype/isxdigit_l.cpp
@@ -16,7 +16,8 @@ namespace LIBC_NAMESPACE_DECL {
 
 LLVM_LIBC_FUNCTION(int, isxdigit_l, (int c, locale_t)) {
   const unsigned ch = static_cast<unsigned>(c);
-  return static_cast<int>(internal::isdigit(ch) || (ch | 32) - 'a' < 6);
+  return static_cast<int>(internal::isalnum(ch) &&
+                          internal::b36_char_to_int(ch) < 16);
 }
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdio/printf_core/fixed_converter.h b/libc/src/stdio/printf_core/fixed_converter.h
index c8812d77b62e34..ba0a62d9fcb87f 100644
--- a/libc/src/stdio/printf_core/fixed_converter.h
+++ b/libc/src/stdio/printf_core/fixed_converter.h
@@ -11,6 +11,7 @@
 
 #include "include/llvm-libc-macros/stdfix-macros.h"
 #include "src/__support/CPP/string_view.h"
+#include "src/__support/ctype_utils.h"
 #include "src/__support/fixed_point/fx_bits.h"
 #include "src/__support/fixed_point/fx_rep.h"
 #include "src/__support/integer_to_string.h"
@@ -68,10 +69,6 @@ LIBC_INLINE int convert_fixed(Writer *writer, const FormatSection &to_conv) {
   using LARep = fixed_point::FXRep<unsigned long accum>;
   using StorageType = LARep::StorageType;
 
-  // All of the letters will be defined relative to variable a, which will be
-  // the appropriate case based on the name of the conversion. This converts any
-  // conversion name into the letter 'a' with the appropriate case.
-  const char a = (to_conv.conv_name & 32) | 'A';
   FormatFlags flags = to_conv.flags;
 
   bool is_negative;
@@ -179,9 +176,9 @@ LIBC_INLINE int convert_fixed(Writer *writer, const FormatSection &to_conv) {
     // unspecified.
     RoundDirection round;
     char first_digit_after = fraction_digits[precision];
-    if (first_digit_after > '5') {
+    if (internal::b36_char_to_int(first_digit_after) > 5) {
       round = RoundDirection::Up;
-    } else if (first_digit_after < '5') {
+    } else if (internal::b36_char_to_int(first_digit_after) < 5) {
       round = RoundDirection::Down;
     } else {
       // first_digit_after == '5'
@@ -204,7 +201,8 @@ LIBC_INLINE int convert_fixed(Writer *writer, const FormatSection &to_conv) {
         keep_rounding = false;
         char cur_digit = fraction_digits[digit_to_round];
         // if the digit should not be rounded up
-        if (round == RoundDirection::Even && ((cur_digit - '0') % 2) == 0) {
+        if (round == RoundDirection::Even &&
+            (internal::b36_char_to_int(cur_digit) % 2) == 0) {
           // break out of the loop
           break;
         }
@@ -246,7 +244,7 @@ LIBC_INLINE int convert_fixed(Writer *writer, const FormatSection &to_conv) {
   char sign_char = 0;
 
   // Check if the conv name is uppercase
-  if (a == 'A') {
+  if (internal::isupper(to_conv.conv_name)) {
     // These flags are only for signed conversions, so this removes them if the
     // conversion is unsigned.
     flags = FormatFlags(flags &
diff --git a/libc/src/stdio/printf_core/float_dec_converter.h b/libc/src/stdio/printf_core/float_dec_converter.h
index e39ba6ecea8d48..d93457fcafd7f9 100644
--- a/libc/src/stdio/printf_core/float_dec_converter.h
+++ b/libc/src/stdio/printf_core/float_dec_converter.h
@@ -13,6 +13,7 @@
 #include "src/__support/FPUtil/FPBits.h"
 #include "src/__support/FPUtil/rounding_mode.h"
 #include "src/__support/big_int.h" // is_big_int_v
+#include "src/__support/ctype_utils.h"
 #include "src/__support/float_to_string.h"
 #include "src/__support/integer_to_string.h"
 #include "src/__support/libc_assert.h"
@@ -587,8 +588,6 @@ LIBC_INLINE int convert_float_dec_exp_typed(Writer *writer,
   int exponent = float_bits.get_explicit_exponent();
   StorageType mantissa = float_bits.get_explicit_mantissa();
 
-  const char a = (to_conv.conv_name & 32) | 'A';
-
   char sign_char = 0;
 
   if (float_bits.is_neg())
@@ -734,7 +733,8 @@ LIBC_INLINE int convert_float_dec_exp_typed(Writer *writer,
   round = get_round_direction(last_digit, truncated, float_bits.sign());
 
   RET_IF_RESULT_NEGATIVE(float_writer.write_last_block(
-      digits, maximum, round, final_exponent, a + 'E' - 'A'));
+      digits, maximum, round, final_exponent,
+      internal::islower(to_conv.conv_name) ? 'e' : 'E'));
 
   RET_IF_RESULT_NEGATIVE(float_writer.right_pad());
   return WRITE_OK;
diff --git a/libc/src/stdio/printf_core/float_hex_converter.h b/libc/src/stdio/printf_core/float_hex_converter.h
index 0b3ff3dd1cbfdc..5d9c42882a589b 100644
--- a/libc/src/stdio/printf_core/float_hex_converter.h
+++ b/libc/src/stdio/printf_core/float_hex_converter.h
@@ -12,6 +12,7 @@
 #include "src/__support/CPP/string_view.h"
 #include "src/__support/FPUtil/FPBits.h"
 #include "src/__support/FPUtil/rounding_mode.h"
+#include "src/__support/ctype_utils.h"
 #include "src/__support/macros/config.h"
 #include "src/stdio/printf_core/converter_utils.h"
 #include "src/stdio/printf_core/core_structs.h"
@@ -31,7 +32,6 @@ LIBC_INLINE int convert_float_hex_exp(Writer *writer,
   // All of the letters will be defined relative to variable a, which will be
   // the appropriate case based on the name of the conversion. This converts any
   // conversion name into the letter 'a' with the appropriate case.
-  const char a = (to_conv.conv_name & 32) | 'A';
 
   bool is_negative;
   int exponent;
@@ -138,9 +138,11 @@ LIBC_INLINE int convert_float_hex_exp(Writer *writer,
   size_t mant_cur = mant_len;
   size_t first_non_zero = 1;
   for (; mant_cur > 0; --mant_cur, mantissa >>= 4) {
-    char mant_mod_16 = static_cast<char>(mantissa) & 15;
-    char new_digit = static_cast<char>(
-        (mant_mod_16 > 9) ? (mant_mod_16 - 10 + a) : (mant_mod_16 + '0'));
+    char mant_mod_16 = static_cast<char>(mantissa % 16);
+    char new_digit = static_cast<char>(internal::int_to_b36_char(mant_mod_16));
+    if (internal::isupper(to_conv.conv_name)) {
+      new_digit = static_cast<char>(internal::toupper(new_digit));
+    }
     mant_buffer[mant_cur - 1] = new_digit;
     if (new_digit != '0' && first_non_zero < mant_cur)
       first_non_zero = mant_cur;
@@ -168,7 +170,8 @@ LIBC_INLINE int convert_float_hex_exp(Writer *writer,
 
   size_t exp_cur = EXP_LEN;
   for (; exponent > 0; --exp_cur, exponent /= 10) {
-    exp_buffer[exp_cur - 1] = static_cast<char>((exponent % 10) + '0');
+    exp_buffer[exp_cur - 1] =
+        static_cast<char>(internal::int_to_b36_char(exponent % 10));
   }
   if (exp_cur == EXP_LEN) { // if nothing else was written, write a 0.
     exp_buffer[EXP_LEN - 1] = '0';
@@ -187,7 +190,7 @@ LIBC_INLINE int convert_float_hex_exp(Writer *writer,
   constexpr size_t PREFIX_LEN = 2;
   char prefix[PREFIX_LEN];
   prefix[0] = '0';
-  prefix[1] = a + ('x' - 'a');
+  prefix[1] = internal::islower(to_conv.conv_name) ? 'x' : 'X';
   const cpp::string_view prefix_str(prefix, PREFIX_LEN);
 
   // If the precision is greater than the actual result, pad with 0s
@@ -200,7 +203,7 @@ LIBC_INLINE int convert_float_hex_exp(Writer *writer,
   constexpr cpp::string_view HEXADECIMAL_POINT(".");
 
   // This is for the letter 'p' before the exponent.
-  const char exp_separator = a + ('p' - 'a');
+  const char exp_separator = internal::islower(to_conv.conv_name) ? 'p' : 'P';
   constexpr int EXP_SEPARATOR_LEN = 1;
 
   padding = static_cast<int>(to_conv.min_width - (sign_char > 0 ? 1 : 0) -
diff --git a/libc/src/stdio/printf_core/float_inf_nan_converter.h b/libc/src/stdio/printf_core/float_inf_nan_converter.h
index a7da682b835bee..3e41612e21c9fc 100644
--- a/libc/src/stdio/printf_core/float_inf_nan_converter.h
+++ b/libc/src/stdio/printf_...
[truncated]

A bunch of string conversion/testing code needed to be updated. I'm
gonna need to clean it up more later but that's gonna be a followup.
Also I need to set up the ctype function in bazel. Blarg.
Copy link
Member

@muiez muiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The changes in this PR fixed a lot of failing tests on z/OS.

Copy link
Member

@nickdesaulniers nickdesaulniers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, but LGTM

Comment on lines 218 to 219
const char result = static_cast<char>(internal::int_to_b36_char(digit));
return static_cast<char>(Fmt::IS_UPPERCASE ? internal::toupper(result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's an extra cast in here, that might not be necessary?

Suggested change
const char result = static_cast<char>(internal::int_to_b36_char(digit));
return static_cast<char>(Fmt::IS_UPPERCASE ? internal::toupper(result)
const int result = internal::int_to_b36_char(digit);
return static_cast<char>(Fmt::IS_UPPERCASE ? internal::toupper(result)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int_to_b36_char returns an int to match with the other ctype functions. Without the cast you get a warning for loss of precision.

@michaelrj-google michaelrj-google merged commit a0c4f85 into llvm:main Dec 3, 2024
5 of 6 checks passed
@michaelrj-google michaelrj-google deleted the libcSimplifyCtype branch December 3, 2024 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bazel "Peripheral" support tier build system: utils/bazel libc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants