Fix use-of-uninitialized-value with ??= on assert #11581

iluuu1994 · 2023-07-03T16:38:41Z

Normally, PHP evaluates all expressions in offsets (property or array), as well as the right-hand side of assignments, before actually fetching the offsets. This is well explained in this blog post:

https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety

For ??= we have a bit of a problem in that the rhs must only be evaluated if the lhs is null or undefined. Thus, we have to first compile the lhs with BP_VAR_IS, conditionally run the rhs, and then re-fetch the lhs with BP_VAR_W to make sure the offsets are valid if they have been invalidated.
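As a minimal sketch of the user-visible side of this (the helper `rhs()` and the array keys are made up for illustration), the rhs only runs when the lhs is null or undefined:

```php
<?php
// Hypothetical helper: counts how often the right-hand side is evaluated.
function rhs(int &$calls): int {
    $calls++;
    return 42;
}

$rhsCalls = 0;
$array = ['set' => 1, 'null' => null];

$array['set'] ??= rhs($rhsCalls);     // lhs is non-null: rhs is skipped
$array['null'] ??= rhs($rhsCalls);    // lhs is null: rhs runs, then the offset is written
$array['missing'] ??= rhs($rhsCalls); // lhs is undefined: rhs runs, then the offset is written

var_dump($rhsCalls); // int(2)
var_dump($array);    // 'set' => 1, 'null' => 42, 'missing' => 42
```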

However, we don't want to just re-evaluate the entire lhs because it may contain side-effects, as in $array[$x++] ??= 42;. In this case, we don't want to re-evaluate $x++ because it would result in writing to a different offset than was previously tested. The same goes for function calls, like $array[foo()] ??= 42;, where the second call to foo() might result in a different value. PHP behaves correctly in these cases. This is implemented by memoizing sub-expressions in the lhs of ??= and reusing them when compiling the lhs for the second time. This is done for any expression that isn't a variable, where "variable" means anything that can (potentially) be written to.
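A small script to observe the two cases above. The `Counter` class and the body of `foo()` are assumptions added only to make the number of calls visible:

```php
<?php
// Hypothetical call counter, only here to make evaluation counts observable.
final class Counter {
    public static int $foo = 0;
}

function foo(): string {
    Counter::$foo++;
    return 'key' . Counter::$foo; // returns a different value on every call
}

$x = 0;
$array = [];

$array[$x++] ??= 42;  // offset 0 is tested and written; $x++ is evaluated once
$array[foo()] ??= 42; // 'key1' is tested and written; foo() is evaluated once

var_dump($x);            // int(1)
var_dump(Counter::$foo); // int(1)
var_dump($array);        // 0 => 42, 'key1' => 42
```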

Unfortunately, this also means that function calls are considered writable due to their return-by-reference semantics, and will thus not be memoized. The expression foo()['bar'] ??= 42; will invoke foo() twice. Even worse, foo(bar()) ??= 42; will call both foo() and bar() twice, but foo(bar() + 1) ??= 42; will only call foo() twice. This is likely not by design, and was just overlooked in the implementation. The RFC does not specify how function calls in the lhs of the coalesce assignment behave. This should probably be improved in the future.
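For anyone who wants to check the foo()['bar'] case on their own build, here is a sketch that counts the calls. The `CallLog` class and the ArrayObject return value are assumptions chosen so that the write has somewhere to land; the printed count is left unannotated on purpose, since it depends on the PHP version:

```php
<?php
// Hypothetical bookkeeping: a call counter and a shared store returned by foo().
final class CallLog {
    public static int $foo = 0;
    public static ?ArrayObject $store = null;
}

function foo(): ArrayObject {
    CallLog::$foo++;
    if (CallLog::$store === null) {
        CallLog::$store = new ArrayObject();
    }
    return CallLog::$store; // same object on every call, so the write is observable
}

foo()['bar'] ??= 42;

var_dump(CallLog::$foo);          // number of times foo() was invoked for the single statement above
var_dump(CallLog::$store['bar']); // int(42)
```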

Now, the problem this commit actually fixes is that ??= may memoize expressions inside assert() function calls that may not actually execute. This is not only an issue when using the VAR in the second expression (which would usually also be skipped) but also when freeing the VAR. For this reason, it is not safe to memoize assert() sub-expressions.

There are two possible solutions:

1. Don't memoize any sub-expressions of assert(), meaning they will execute twice.
2. Throw a compile error.

Option 2 is not quite simple, because we can't disallow all memoization inside assert(), as that would break assertions like assert($array[foo()] ??= 'bar');. Code like this is highly unlikely (and dubious) but possible. In this case, we would need to make sure that a memoized value could not be used across the assert boundary it was created in. The complexity for this is not worthwhile. So we opt for option 1 and disable memoization immediately inside assert().

Fixes GH-11580

Girgias

This makes sense to me.

Maybe if we were introducing assert() now, and were aware of this issue, we would throw a compile error immediately. Maybe this is something we can think about doing in the next major?

However, I think this is reasonable for the time being.

iluuu1994 · 2023-07-05T16:58:11Z

Compile error if the result of assert is used you mean?

Girgias · 2023-07-05T17:06:51Z

> Compile error if the result of assert is used you mean?

That could be interesting, but I was thinking more of the option 2 you proposed: throwing a compile error on any memoization inside assert().

iluuu1994 · 2023-07-06T07:36:38Z

Nowadays I would probably just make assert a statement. There's not much use in interacting with its return value. I could imagine it being used in an arrow function or a match arm, but I'm not sure those are worth supporting.

Girgias · 2023-07-06T15:47:24Z

Yes, that's kinda what I want to do after having cleaned up the assert INI settings, as that's kinda preventing making it a statement.

Girgias approved these changes Jul 5, 2023

iluuu1994 closed this in 84a2e48 Jul 6, 2023