khwilliamson
Contributor

The maximum number of bytes in a Perl extended UTF-8 character is 13 on ASCII platforms and 14 on EBCDIC, yet the variable that returns that number is a Size_t. Adding ASSUME() hints about this small range to these inline functions may allow the compiler to do some optimizations.
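To illustrate the kind of hint being described, here is a minimal sketch, not the code from this pull request. DEMO_ASSUME is a hypothetical stand-in for Perl's ASSUME() (the real macro is defined in the Perl headers and may expand differently), DEMO_MAXBYTES uses the ASCII-platform figure of 13, and the lead-byte rule in demo_expected_len() is a simplification made up for the example rather than Perl's actual table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Perl's ASSUME(); an assumption for this sketch. */
#if defined(DEMO_DEBUGGING)
#  define DEMO_ASSUME(x) assert(x)            /* check the claim in debug builds */
#elif defined(__clang__)
#  define DEMO_ASSUME(x) __builtin_assume(x)
#elif defined(__GNUC__)
#  define DEMO_ASSUME(x) ((x) ? (void) 0 : __builtin_unreachable())
#else
#  define DEMO_ASSUME(x) ((void) 0)           /* no hint available */
#endif

#define DEMO_MAXBYTES 13  /* ASCII-platform maximum quoted in the description */

/* Returns a small value in a full-width size_t, and tells the compiler so. */
static inline size_t
demo_expected_len(unsigned char first)
{
    size_t len;

    /* Simplified lead-byte rule, for illustration only */
    if      (first <  0xC0) len = 1;
    else if (first <  0xE0) len = 2;
    else if (first <  0xF0) len = 3;
    else if (first <  0xF8) len = 4;
    else if (first <  0xFC) len = 5;
    else if (first <  0xFE) len = 6;
    else if (first == 0xFE) len = 7;
    else                    len = DEMO_MAXBYTES;

    DEMO_ASSUME(len >= 1 && len <= DEMO_MAXBYTES);
    return len;
}

/* A caller whose loop trip count is bounded by the hint above.
 * Both buffers must hold at least DEMO_MAXBYTES bytes. */
static inline size_t
demo_copy_char(unsigned char *dst, const unsigned char *src)
{
    size_t len = demo_expected_len(src[0]);

    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
    return len;
}
```

Whether a given compiler actually exploits the range is not something this sketch demonstrates; it only shows where such a hint would sit.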

This isn't done here for another inline function, utf8_to_uv_msgs(), because it currently returns the result of calling a non-inline function, so the ASSUME would be unreachable code. I don't know if that actually matters. Alternatively, that function's boolean result could be stored in a temporary, the ASSUME done, and then utf8_to_uv_msgs() would return the temporary's value.

Opinions welcome

  • This set of changes does not require a perldelta entry.

Contributor

@tonycoz tonycoz left a comment

I do wonder if assuming expectlen >= 1 (etc) is useful.

@khwilliamson
Contributor Author

Revised to check that the value is in the range 1 <= x <= MAX.

@khwilliamson
Contributor Author

@tonycoz Any ideas on the second paragraph of #23680 (comment)?

@tonycoz
Contributor

tonycoz commented Sep 5, 2025

This isn't done here on another inline function, utf8_to_uv_msgs(). That is because it currently returns the call of a non-inline function, so the ASSUME would be unreachable code. I don't know if that actually matters. Or that function's boolean result could be stored in a temporary, the ASSUME done, and then utf8_to_uv_msgs() would return the temporary's value.

I think there's some value in:

bool result = somefunc(...advancep);
ASSUME(advancep == NULL || inRANGE(...));
return result;

Also, while -fanalyzer -flto is impractical, -flto on its own is practical, so ASSUME()s in non-inline functions can have some value.
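A fuller sketch of the store-in-a-temporary pattern suggested above might look like the following. Every name here is hypothetical: DEMO_ASSUME stands in for Perl's ASSUME(), demo_decode_slow() stands in for the non-inline worker, its one-or-two-byte rule is a toy, and the explicit range test stands in for inRANGE(). It is not the code that was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Perl's ASSUME(); an assumption for this sketch. */
#if defined(DEMO_DEBUGGING)
#  define DEMO_ASSUME(x) assert(x)
#elif defined(__clang__)
#  define DEMO_ASSUME(x) __builtin_assume(x)
#elif defined(__GNUC__)
#  define DEMO_ASSUME(x) ((x) ? (void) 0 : __builtin_unreachable())
#else
#  define DEMO_ASSUME(x) ((void) 0)
#endif

#define DEMO_MAXBYTES 13  /* ASCII-platform maximum quoted in this thread */

/* Toy non-inline worker: reports whether a whole character is available
 * and, if advancep is non-NULL, how many bytes to step over.
 * Requires avail >= 1. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
static bool
demo_decode_slow(const unsigned char *s, size_t avail, size_t *advancep)
{
    size_t need = (s[0] < 0x80) ? 1 : 2;   /* toy rule, not real UTF-8 */
    bool ok = need <= avail;

    if (advancep)
        *advancep = ok ? need : 1;         /* on failure, step past one byte */
    return ok;
}

/* Inline wrapper: keep the result in a temporary so the hint is reachable. */
static inline bool
demo_decode(const unsigned char *s, size_t avail, size_t *advancep)
{
    bool result = demo_decode_slow(s, avail, advancep);

    DEMO_ASSUME(advancep == NULL
                || (*advancep >= 1 && *advancep <= DEMO_MAXBYTES));
    return result;
}
```

Writing `return demo_decode_slow(...)` directly would leave no point after the call at which to state the range, which is why the temporary is used.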

The maximum number of bytes in a Perl extended UTF-8 character is 13 on
ASCII platforms; 14 on EBCDIC.  Yet the variable that returns that
number is a Size_t in the cases changed by this commit.  By adding these
ASSUMEs to these functions, the compiler may be able to do some
optimizations.

I looked through the code base, and found no other instances where such
a small value could be stored in a fully wide variable.

With link time optimization, an ASSUME may be helpful even in non-inline
functions.

@khwilliamson
Contributor Author

I added an ASSUME in a place where it would only help link time optimization, and changed as suggested in #23680 (comment).

I also audited the code base for other potential spots to change, and found none.

@khwilliamson khwilliamson merged commit b60f610 into Perl:blead Sep 6, 2025
33 checks passed
@khwilliamson khwilliamson deleted the ASSUME branch September 6, 2025 02:38