Skip to content

Conversation

@eric-therond
Copy link
Contributor

No description provided.

@eric-therond
Copy link
Contributor Author

Hi @nikic, @ircmaxell
Do you have time to review this PR?
Regards

@ircmaxell
Copy link
Owner

So, one issue: technically now every sub-block within the try-catch needs to keep a reference back to the wrapping try in-case there’s an exception.

In essence:

try {
    if ($a) {
        doFoo();
    }
} …

The block for doFoo() would need to maintain a new reference back to the “wrapping try” so we know statically where to go up to. Likewise for all functions, methods, and include calls (so each file would also need some sort of reference). So maybe adding an interface TryContextOp (or something similar), which is implemented by your Try_ class, Script, and by Func. Then every created Block gets the context of the closest parent in the call train (a Try, Func, or Script)… (and the TryContextOp would need to keep track of the parent context (if it exists) so bubbling can be understood.

Where this gets tricky is when blocks can be entered by multiple paths (via goto, etc), the parser needs to be careful to attach the TryContextOp of the code parent, not the CFG entry point (which may or may not be the same).

In an analysis context, seeing as every expression (well, most) could theoretically throw an exception, when analyzing code flow, the expression would look to the block it resides in for the context. Which raises the question: should the Op or Expr keep a reference to the Block it is a member of, or to the nearest TryContextOp. Or is that not needed since in an execution or analysis context you’ll always have a reference to the Block (not positive if this is true, any thoughts)???

@ircmaxell
Copy link
Owner

ircmaxell commented Aug 3, 2025

The issue is:

$a = false;
try {
    foo:
    echo 1;
    if ($a) {
        throw Exception(1234);
    }
} catch (Exception $e) {
    return true; 
}
try {
    $a = true;
    goto foo;
} catch (Exception $e) {
    return false;
}

That should return true, but we need to maintain the try context so the throw knows where control will be passed to. Otherwise if it walked up the stack of Blocks (the way Phi does) it could get erroneous catch linkages (and hence statically determine it’s possible to return false)…

@eric-therond
Copy link
Contributor Author

Thanks for the detailed answer.
This is an interesting case.

Is this really specific to the try-catch statement?
For example, this code always return true:

$a = true;
if($a)
    goto a;

return false;
 
a:
return true;

But statically and only with the CFG, we don't know this, unless we track the value of $a and perform a much more complex control flow analysis. For instance and regarding the throw statement, we also need to track the type of exceptions thrown and cached if we want to get back to the correct block when the throw statement is encountered.

@ircmaxell
Copy link
Owner

The difference is that with the information currently contained in the CFG, you could transform that via constant propagation passes, and reduce it down fully to just return true; and remove the unreachable blocks. In general this may not be possible for multiple reasons, but the important bit is all the information needed for the analysis is codified into the blocks themselves, so all operations are just shortcutting the edges of the graph.

With try/catch, there is a subtle difference in that it's actually overlaying a separate graph on top. Without that separate graph, there's no real way of knowing where an exception would bubble. Without try/catch support that doesn't matter in practice since you could just walk back up to the wrapping Func/Script and you're done. But once you have try/catch that walking now has potential intermediaries it needs to visit.

And we definitely need to track the type of exceptions. I don't know if we can track thrown types generally, but we should be able to track caught types generally (your current Catch_ implementation does that well)...

@eric-therond
Copy link
Contributor Author

I will take this code as an example to better understand the objective:

try {
  doFoo();
} 
catch(ExceptionA $a) {
} 

try {
  doFoo();
} 
catch(ExceptionB $b) {
} 

function doFoo() {
  throw new ExceptionB("b");
}

Generated cfg for the first try-catch statement (with the current PR implementation):

Block#1
    Stmt_Try
        body: Block#2
        catch: Block#3
        finally: Block#4
    Terminal_Return

Block#2
    Parent: Block#1
    Expr_Funccall
        expr: LITERAL('doFoo')

Block#3
    Parent: Block#1
    Stmt_Catch
        type[0]: LITERAL('ExceptionA')
        var: Var#1<$a>
        body: Block#5

Block#4
    Parent: Block#1

Block#5
    Parent: Block#3

Generated cfg for the second try-catch statement (with the current PR implementation):

    Stmt_Try
        body: Block#6
        catch: Block#7
        finally: Block#8
    Terminal_Return

Block#6
    Parent: Block#1
    Expr_Funccall
        expr: LITERAL('doFoo')

Block#7
    Parent: Block#1
    Stmt_Catch
        type[0]: LITERAL('ExceptionB')
        var: Var#2<$b>
        body: Block#9

Block#8
    Parent: Block#1

Block#9
    Parent: Block#7

then for the doFoo function,
the goal is to have, let's say a jump statement(s) following the throw terminal to get back to the correct block(s) ?
in this case, the block associated to the body of the second catch statement (Block#9).

Function 'doFoo': mixed
Block#1
    Expr_New
        class: LITERAL('ExceptionB')
        result: Var#1
    Terminal_Throw
        expr: Var#1
    Stmt_Jump
        target: Block#9

Am I correct?

@ircmaxell
Copy link
Owner

So, this turned out to be much deeper than I expected. I created #102 to attempt an implementation. There are multiple issues with it, but it should give you an idea of the direction and what's needed.

<?php
try {
    if ($foo) {
        doBar();
    } else {
        doBaz();
    }
} catch (Exception $e) {
    return $e;
} finally {
    return 'finally';
}

Gives us

Screenshot 2025-08-05 at 10 45 44 PM

And

Block#1
    Stmt_Try
        catch<LITERAL('Exception')>(Var#1<$e>): Block#2
        finally: Block#3
        body: Block#4

Block#2
    Parent: Block#1
        finallyTarget: Block#3
    Var#2<$e> = Phi(Var#1<$e>)
    Terminal_Return
        expr: Var#2<$e>

Block#3
    Parent: Block#5
    Parent: Block#6
    Terminal_Return
        expr: LITERAL('finally')

Block#4
    Parent: Block#1
        catchTarget<LITERAL('Exception')>(Var#1<$e>): Block#2
        finallyTarget: Block#3
    Stmt_JumpIf
        cond: Var#3<$foo>
        if: Block#5
        else: Block#6

Block#5
    Parent: Block#4
        catchTarget<LITERAL('Exception')>(Var#1<$e>): Block#2
        finallyTarget: Block#3
    Expr_FuncCall
        name: LITERAL('doBar')
        result: Var#4
    Stmt_Jump
        target: Block#3

Block#6
    Parent: Block#4
        catchTarget<LITERAL('Exception')>(Var#1<$e>): Block#2
        finallyTarget: Block#3
    Expr_FuncCall
        name: LITERAL('doBaz')
        result: Var#5
    Stmt_Jump
        target: Block#3

Note: I know Traverser is broken as it won't traverse any Blocks inside the catchTarget. If you can clean up the new PR and update, I'd be happy to continue iterating. I won't have much time from here though to write and finish the implementation, but hopefully this gets you started...

@eric-therond
Copy link
Contributor Author

eric-therond commented Aug 7, 2025

Thank you, I think I understand, except for this sentence: "I know Traverser is broken as it won't traverse any Blocks inside the catchTarget". I can clean a bit the PR but I don't have the necessary permissions on the repository.

This information about catch/finally targets certainly helps, but is it really the CFG's responsibility to offer this kind of "semantic information"?

My use case may be very specific, but in the case of interprocedural analysis on top of the CFG:

<?php

try {
    doFoo();
} catch (Exception $e) {
    return $e;
} finally {
    return 'finally';
}

function doFoo() {
    throw new Exception('foo');
}

Obviously, the catchTarget and finallyTarget properties are missing from the doFoo function blocks. In this case, the user is unaware of the reason. He anyway has to implement the context propagation mechanism itself to know, for example, if an exception is thrown but not caught, or where it is caught.

@ircmaxell
Copy link
Owner

This information about catch/finally targets certainly helps, but is it really the CFG's responsibility to offer this kind of "semantic information"?

I think so (but am open to other ideas), since control flow is determined by semantics of try/catch/finally, so without encoding the semantics analysis can be incorrect (not just in edge cases like exceptions, but even in semantic cases). Take the following code:

function foo() {
    try {
        doFoo();
        return 10;
    } catch (SomeSpecificException $e) {
        return 20;
    } finally {
        return 30;
    }
    return 40;

In this code, the return is 100% always guaranteed to be 30. We only know that because the return 10 and return 20 are "within" the final scope, so we know it will get intercepted.

The catch target is likely less useful for most analysis types, but there are definitely some where it's going to be handy. Finally is definitely useful though.

Obviously, the catchTarget and finallyTarget properties are missing from the doFoo function blocks. In this case, the user is unaware of the reason. He anyway has to implement the context propagation mechanism itself to know, for example, if an exception is thrown but not caught, or where it is caught.

That only really matters if inlining doFoo() into the parent code (at which point the inlining process would need to implement context propagation). If not, the propagation is still a runtime concern (or a dynamic one). The reason is that doFoo() can be called from multiple places, so it's more up to the analyzer to wire it together than the CFG to do that.

In your interprocedural analysis case, you'd need to keep track of what can throw (literally anything with ErrorExceptions) and then wire together. But the key is you only need to keep track of the potential call graphs, and only walk those call graph edges backwards to determine where a catch or finally would occur.

I can clean a bit the PR but I don't have the necessary permissions on the repository.

I'd suggest just throwing away my PR. Maybe fork from it and continue to develop and update this one?

@ircmaxell
Copy link
Owner

ircmaxell commented Aug 7, 2025

Thank you, I think I understand, except for this sentence: "I know Traverser is broken as it won't traverse any Blocks inside the catchTarget".

The problem is that Try_ doesn't include the catch target blocks in getSubBlocks(), so the traverser can't know about the catch blocks to process them in post-processing steps.

Looking at it deeper, realistically this should likely be refactored to have getSubBlocks() return the blocks themselves rather than the names of properties. That's a much bigger change though.

Also, Thinking about it a bit more, I think each of the catch blocks and finally blocks should have every block which has them as a catchtarget as a parent so that controlflow is modeled correctly (phi nodes, etc).

And yeah, this gets super complex super quickly, which is why it wasn't implemented before. But this is definitely getting close. Thanks!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants