gh-108767: Convert Py_ISLOWER() macro to static inline function #108770

vstinner · 2023-09-01T15:08:26Z

Convert the following macros to static inline functions:

Py_ISLOWER()
Py_ISUPPER()
Py_ISALPHA()
Py_ISDIGIT()
Py_ISXDIGIT()
Py_ISALNUM()
Py_ISSPACE()
Py_TOLOWER()
Py_TOUPPER()
Py_CHARMASK()

Issue: Avoid locale dependent <ctype.h> functions like isascii(), isdigit(), tolower() #108767

vstinner · 2023-09-01T15:08:51Z

@erlend-aasland: A bunch of PEP 670 candies :-)

erlend-aasland · 2023-09-01T16:14:49Z

This generates -Wsign-compare and downcast warnings; it is not as easy as it looks like ;)

vstinner · 2023-09-01T16:20:44Z

This generates -Wsign-compare and downcast warnings; it is not as easy as it looks like ;)

It reminds me PyUnicode_Kind(obj) which gave me headaches (and then nightmakes!). It ended up by ... adding a comment to explain to me why I didn't convert it :-)

// PyUnicode_KIND(): Return one of the PyUnicode_*_KIND values defined above.
//
// gh-89653: Converting this macro to a static inline function would introduce
// new compiler warnings on "kind < PyUnicode_KIND(str)" (compare signed and
// unsigned numbers) where kind type is an int or on
// "unsigned int kind = PyUnicode_KIND(str)" (cast signed to unsigned).
#define PyUnicode_KIND(op) _Py_RVALUE(_PyASCIIObject_CAST(op)->state.kind)

vstinner · 2023-09-01T16:22:06Z

I confirm that building Python with gcc -Wsign-compare emits a new compiler warning:

./Modules/_sre/sre.c: In function 'sre_lower_ascii':
./Modules/_sre/sre.c:111:26: warning: operand of '?:' changes signedness from 'int' to 'unsigned int' due to unsignedness of other operand [-Wsign-compare]
  111 |     return ((ch) < 128 ? Py_TOLOWER(ch) : ch);
      |                          ^~~~~~~~~~~~~~

Tested: GCC 13.2.1 on Fedora 38.

vstinner · 2023-09-01T16:27:49Z

Modules/_sre/sre.c pass an unigned int, whereas the API doesn't seem to be designed for unsigned int:

/* Argument must be a char or an int in [-128, 127] or [0, 255]. */
#define Py_CHARMASK(c) ((unsigned char)((c) & 0xff))

The must word is scary!

serhiy-storchaka · 2023-09-01T18:33:22Z

I do not think this should be done. The new code is larger and much less readable (to me). And it creates new problems to fight!

erlend-aasland · 2023-09-01T19:57:34Z

Modules/_sre/sre.c

@@ -106,7 +106,7 @@ static const char copyright[] =
 #define SRE_IS_WORD(ch)\
    ((ch) <= 'z' && (Py_ISALNUM(ch) || (ch) == '_'))

-static unsigned int sre_lower_ascii(unsigned int ch)
+static unsigned int sre_lower_ascii(int ch)


This micro bugfix should get its own issue and PR.

No, the current code is correct.

I'll let you and Victor come to agreement on this.

vstinner · 2023-09-02T11:37:22Z

@erlend-aasland: Do you want to convert these macros? Or do you agree with @serhiy-storchaka that the change is bad since it introduces new compiler warnings, especially when the functions are "misused"?

These changes give a better defined API for these functions. They have the same API that their original <ctype.h functions:

       #include <ctype.h>

       int isalnum(int c);
       int isalpha(int c);
       int iscntrl(int c);
       int isdigit(int c);
       int isgraph(int c);
       int islower(int c);
       int isprint(int c);
       int ispunct(int c);
       int isspace(int c);
       int isupper(int c);
       int isxdigit(int c);

       int isascii(int c);
       int isblank(int c);

       int toupper(int c);
       int tolower(int c);

Input type: int
Output type: int

And yes, if you passed an unsigned int or some others types, you can get a new compiler warning. IMO for sre., the API was misused. But apparently, @serhiy-storchaka disagrees.

vstinner · 2023-09-02T11:45:07Z

I rebased my PR, change sre.c fix to keep unsigned int parameter type of sre_lower_ascii(), and fix 2 more warnings on Py_CHARMARK().

vstinner · 2023-09-02T11:45:22Z

Please review the updated PR.

erlend-aasland · 2023-09-03T21:10:18Z

I rebased my PR, change sre.c fix to keep unsigned int parameter type of sre_lower_ascii(), and fix 2 more warnings on Py_CHARMARK().

There's still compiler warnings ;)

Since these macros are exposed in Python.h, this change may also introduce new compiler warnings in external projects, so I'm not sure it is worth doing. There is no macro pitfalls here, so it's not an imperative change IMO.

Convert the following macros to static inline functions: * Py_ISLOWER() * Py_ISUPPER() * Py_ISALPHA() * Py_ISDIGIT() * Py_ISXDIGIT() * Py_ISALNUM() * Py_ISSPACE() * Py_TOLOWER() * Py_TOUPPER() * Py_CHARMASK() Changes: * sre_lower_ascii() now casts Py_TOLOWER() argument to "unsigned char" and cast the result to "unsigned int". * bytesobject.c and bytearrayobject.c now pass an "int" argument to Py_CHARMASK(), instead of a "Py_ssize_t" argument.

vstinner · 2023-09-03T21:48:54Z

There's still compiler warnings ;)

Oh! I missed that one. I fixed it and rebased my PR to the main branch (since I made other changes in main related to these functions, see commit 03c4080).

erlend-aasland · 2023-09-06T09:44:24Z

I don't mind converting these functions (I prefer static inline functions to macros in most cases anyway). I also agree that it is better to have a well-defined API (typed inputs and outputs instead of whatever goes in probably comes out). However, this may introduce compiler warnings in third-party code. I'm +0.5 :)

serhiy-storchaka · 2023-09-06T10:20:11Z

First, I have a problem with reading a sequence of several almost identical fragments of code. So the new code is unreadable to me. But this is my personal and minor issue. I do not expect to read this code often. It should not stop you.

Second, I do not see anything wrong with the current working code and do not understand why it should be changed, with the risk of introducing new bugs or fight new battles with compiler warnings. Macros are not an evil that should be avoided at all costs. In some cases, it is a suitable tool, and sometimes it is the only working tool.

I do not actively object to these changes. I only explain my point of view. You may change your mind if you look at it from a different perspective if you too had doubts.

vstinner · 2023-09-06T17:50:43Z

@serhiy-storchaka:

Second, I do not see anything wrong with the current working code and do not understand why it should be changed, with the risk of introducing new bugs or fight new battles with compiler warnings.

My concern is not avoid "Macro Pitfalls" but the API.

I have an different point of view: the current API is not well defined. It's unclear what is the argument type: is it signed? signed? is int accepted, or also long? Same question for the result. Is it signed or unsigned? Is it a char or an int?

You see the introduction of new compiler warnings as a regression. I see it as an intentional change, since the API was weakly defined (weakly "typed") :-)

If argument types and return type are well defined, there are less "undefined behaviors" depending on the types chosen by the C compiler depending how and where the macros/functions are used.

For me, it's like if (x < y) where x is signed and y is unsigned: which type will be chosen by the compiler? Which one is the "good one"? For me, "it depends", and the developer has to express their intent.

serhiy-storchaka · 2023-09-06T19:01:07Z

It's unclear what is the argument type: is it signed? signed? is int accepted, or also long? Same question for the result. Is it signed or unsigned? Is it a char or an int?

Py_CHARMASK() is designed to work with integer value in range from -128 to 255, independently of its type. Py_CHARMASK((some_signed_int)(signed char)c) and Py_CHARMASK((some_signed_or_unsigned_int)(unsigned char)c) should return the same. C does not support overloading, but if it was, it would overload Py_CHARMASK() for signed and unsigned types of different size.

Also, the new code may be less efficient, due to forcing conversion to int.

But I don't want to argue about it, so far I'm just -0.

vstinner · 2023-09-06T19:10:57Z

Thanks for the constructive discussion. I'm not going to push this change further. Ok, it's not worth it ;-)

vstinner added the skip news label Sep 1, 2023

bedevere-bot mentioned this pull request Sep 1, 2023

Avoid locale dependent <ctype.h> functions like isascii(), isdigit(), tolower() #108767

Closed

bedevere-bot added the awaiting core review label Sep 1, 2023

vstinner mentioned this pull request Sep 1, 2023

gh-108767: Replace ctype.h functions with pyctype.h functions #108772

Merged

erlend-aasland reviewed Sep 1, 2023

View reviewed changes

vstinner force-pushed the static_inline branch from 23eb55f to b6f958b Compare September 2, 2023 11:44

vstinner force-pushed the static_inline branch from d54e1ab to 9feff13 Compare September 3, 2023 21:48

vstinner closed this Sep 6, 2023

vstinner deleted the static_inline branch September 6, 2023 19:11

serhiy-storchaka mentioned this pull request Jan 14, 2024

Convert Py_ISDIGIT/ISXDIGIT/etc to inline functions? #114047

Open

skirpichev mentioned this pull request Jan 30, 2024

gh-114667: Support hexadecimal floating-point literals #114668

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-108767: Convert Py_ISLOWER() macro to static inline function #108770

gh-108767: Convert Py_ISLOWER() macro to static inline function #108770

vstinner commented Sep 1, 2023 •

edited by bedevere-bot

Loading

vstinner commented Sep 1, 2023

erlend-aasland commented Sep 1, 2023

vstinner commented Sep 1, 2023

vstinner commented Sep 1, 2023

vstinner commented Sep 1, 2023

serhiy-storchaka commented Sep 1, 2023

erlend-aasland Sep 1, 2023

serhiy-storchaka Sep 1, 2023

erlend-aasland Sep 3, 2023

vstinner commented Sep 2, 2023

vstinner commented Sep 2, 2023

vstinner commented Sep 2, 2023

erlend-aasland commented Sep 3, 2023

vstinner commented Sep 3, 2023

erlend-aasland commented Sep 6, 2023

serhiy-storchaka commented Sep 6, 2023

vstinner commented Sep 6, 2023

serhiy-storchaka commented Sep 6, 2023

vstinner commented Sep 6, 2023

gh-108767: Convert Py_ISLOWER() macro to static inline function #108770

gh-108767: Convert Py_ISLOWER() macro to static inline function #108770

Conversation

vstinner commented Sep 1, 2023 • edited by bedevere-bot Loading

vstinner commented Sep 1, 2023

erlend-aasland commented Sep 1, 2023

vstinner commented Sep 1, 2023

vstinner commented Sep 1, 2023

vstinner commented Sep 1, 2023

serhiy-storchaka commented Sep 1, 2023

erlend-aasland Sep 1, 2023

Choose a reason for hiding this comment

serhiy-storchaka Sep 1, 2023

Choose a reason for hiding this comment

erlend-aasland Sep 3, 2023

Choose a reason for hiding this comment

vstinner commented Sep 2, 2023

vstinner commented Sep 2, 2023

vstinner commented Sep 2, 2023

erlend-aasland commented Sep 3, 2023

vstinner commented Sep 3, 2023

erlend-aasland commented Sep 6, 2023

serhiy-storchaka commented Sep 6, 2023

vstinner commented Sep 6, 2023

serhiy-storchaka commented Sep 6, 2023

vstinner commented Sep 6, 2023

vstinner commented Sep 1, 2023 •

edited by bedevere-bot

Loading