Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-108767: Convert Py_ISLOWER() macro to static inline function #108770

Closed
wants to merge 1 commit into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Sep 1, 2023

Convert the following macros to static inline functions:

  • Py_ISLOWER()
  • Py_ISUPPER()
  • Py_ISALPHA()
  • Py_ISDIGIT()
  • Py_ISXDIGIT()
  • Py_ISALNUM()
  • Py_ISSPACE()
  • Py_TOLOWER()
  • Py_TOUPPER()
  • Py_CHARMASK()

@vstinner
Copy link
Member Author

vstinner commented Sep 1, 2023

@erlend-aasland: A bunch of PEP 670 candies :-)

@erlend-aasland
Copy link
Contributor

This generates -Wsign-compare and downcast warnings; it is not as easy as it looks like ;)

@vstinner
Copy link
Member Author

vstinner commented Sep 1, 2023

This generates -Wsign-compare and downcast warnings; it is not as easy as it looks like ;)

It reminds me PyUnicode_Kind(obj) which gave me headaches (and then nightmakes!). It ended up by ... adding a comment to explain to me why I didn't convert it :-)

// PyUnicode_KIND(): Return one of the PyUnicode_*_KIND values defined above.
//
// gh-89653: Converting this macro to a static inline function would introduce
// new compiler warnings on "kind < PyUnicode_KIND(str)" (compare signed and
// unsigned numbers) where kind type is an int or on
// "unsigned int kind = PyUnicode_KIND(str)" (cast signed to unsigned).
#define PyUnicode_KIND(op) _Py_RVALUE(_PyASCIIObject_CAST(op)->state.kind)

@vstinner
Copy link
Member Author

vstinner commented Sep 1, 2023

I confirm that building Python with gcc -Wsign-compare emits a new compiler warning:

./Modules/_sre/sre.c: In function 'sre_lower_ascii':
./Modules/_sre/sre.c:111:26: warning: operand of '?:' changes signedness from 'int' to 'unsigned int' due to unsignedness of other operand [-Wsign-compare]
  111 |     return ((ch) < 128 ? Py_TOLOWER(ch) : ch);
      |                          ^~~~~~~~~~~~~~

Tested: GCC 13.2.1 on Fedora 38.

@vstinner
Copy link
Member Author

vstinner commented Sep 1, 2023

Modules/_sre/sre.c pass an unigned int, whereas the API doesn't seem to be designed for unsigned int:

/* Argument must be a char or an int in [-128, 127] or [0, 255]. */
#define Py_CHARMASK(c) ((unsigned char)((c) & 0xff))

The must word is scary!

@serhiy-storchaka
Copy link
Member

I do not think this should be done. The new code is larger and much less readable (to me). And it creates new problems to fight!

@@ -106,7 +106,7 @@ static const char copyright[] =
#define SRE_IS_WORD(ch)\
((ch) <= 'z' && (Py_ISALNUM(ch) || (ch) == '_'))

static unsigned int sre_lower_ascii(unsigned int ch)
static unsigned int sre_lower_ascii(int ch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This micro bugfix should get its own issue and PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the current code is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let you and Victor come to agreement on this.

@vstinner
Copy link
Member Author

vstinner commented Sep 2, 2023

@erlend-aasland: Do you want to convert these macros? Or do you agree with @serhiy-storchaka that the change is bad since it introduces new compiler warnings, especially when the functions are "misused"?

These changes give a better defined API for these functions. They have the same API that their original <ctype.h functions:

       #include <ctype.h>

       int isalnum(int c);
       int isalpha(int c);
       int iscntrl(int c);
       int isdigit(int c);
       int isgraph(int c);
       int islower(int c);
       int isprint(int c);
       int ispunct(int c);
       int isspace(int c);
       int isupper(int c);
       int isxdigit(int c);

       int isascii(int c);
       int isblank(int c);

       int toupper(int c);
       int tolower(int c);
  • Input type: int
  • Output type: int

And yes, if you passed an unsigned int or some others types, you can get a new compiler warning. IMO for sre., the API was misused. But apparently, @serhiy-storchaka disagrees.

@vstinner
Copy link
Member Author

vstinner commented Sep 2, 2023

I rebased my PR, change sre.c fix to keep unsigned int parameter type of sre_lower_ascii(), and fix 2 more warnings on Py_CHARMARK().

@vstinner
Copy link
Member Author

vstinner commented Sep 2, 2023

Please review the updated PR.

@erlend-aasland
Copy link
Contributor

I rebased my PR, change sre.c fix to keep unsigned int parameter type of sre_lower_ascii(), and fix 2 more warnings on Py_CHARMARK().

There's still compiler warnings ;)

Since these macros are exposed in Python.h, this change may also introduce new compiler warnings in external projects, so I'm not sure it is worth doing. There is no macro pitfalls here, so it's not an imperative change IMO.

Convert the following macros to static inline functions:

* Py_ISLOWER()
* Py_ISUPPER()
* Py_ISALPHA()
* Py_ISDIGIT()
* Py_ISXDIGIT()
* Py_ISALNUM()
* Py_ISSPACE()
* Py_TOLOWER()
* Py_TOUPPER()
* Py_CHARMASK()

Changes:

* sre_lower_ascii() now casts Py_TOLOWER() argument to "unsigned
  char" and cast the result to "unsigned int".
* bytesobject.c and bytearrayobject.c now pass an "int" argument to
  Py_CHARMASK(), instead of a "Py_ssize_t" argument.
@vstinner
Copy link
Member Author

vstinner commented Sep 3, 2023

There's still compiler warnings ;)

Oh! I missed that one. I fixed it and rebased my PR to the main branch (since I made other changes in main related to these functions, see commit 03c4080).

@erlend-aasland
Copy link
Contributor

I don't mind converting these functions (I prefer static inline functions to macros in most cases anyway). I also agree that it is better to have a well-defined API (typed inputs and outputs instead of whatever goes in probably comes out). However, this may introduce compiler warnings in third-party code. I'm +0.5 :)

@serhiy-storchaka
Copy link
Member

First, I have a problem with reading a sequence of several almost identical fragments of code. So the new code is unreadable to me. But this is my personal and minor issue. I do not expect to read this code often. It should not stop you.

Second, I do not see anything wrong with the current working code and do not understand why it should be changed, with the risk of introducing new bugs or fight new battles with compiler warnings. Macros are not an evil that should be avoided at all costs. In some cases, it is a suitable tool, and sometimes it is the only working tool.

I do not actively object to these changes. I only explain my point of view. You may change your mind if you look at it from a different perspective if you too had doubts.

@vstinner
Copy link
Member Author

vstinner commented Sep 6, 2023

@serhiy-storchaka:

Second, I do not see anything wrong with the current working code and do not understand why it should be changed, with the risk of introducing new bugs or fight new battles with compiler warnings.

My concern is not avoid "Macro Pitfalls" but the API.

I have an different point of view: the current API is not well defined. It's unclear what is the argument type: is it signed? signed? is int accepted, or also long? Same question for the result. Is it signed or unsigned? Is it a char or an int?

You see the introduction of new compiler warnings as a regression. I see it as an intentional change, since the API was weakly defined (weakly "typed") :-)

If argument types and return type are well defined, there are less "undefined behaviors" depending on the types chosen by the C compiler depending how and where the macros/functions are used.

For me, it's like if (x < y) where x is signed and y is unsigned: which type will be chosen by the compiler? Which one is the "good one"? For me, "it depends", and the developer has to express their intent.

@serhiy-storchaka
Copy link
Member

It's unclear what is the argument type: is it signed? signed? is int accepted, or also long? Same question for the result. Is it signed or unsigned? Is it a char or an int?

Py_CHARMASK() is designed to work with integer value in range from -128 to 255, independently of its type. Py_CHARMASK((some_signed_int)(signed char)c) and Py_CHARMASK((some_signed_or_unsigned_int)(unsigned char)c) should return the same. C does not support overloading, but if it was, it would overload Py_CHARMASK() for signed and unsigned types of different size.

Also, the new code may be less efficient, due to forcing conversion to int.

But I don't want to argue about it, so far I'm just -0.

@vstinner
Copy link
Member Author

vstinner commented Sep 6, 2023

Thanks for the constructive discussion. I'm not going to push this change further. Ok, it's not worth it ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants