Skip to content

Swift: use shared capture flow library #14078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

d10c
Copy link
Contributor

@d10c d10c commented Aug 28, 2023

The purpose of this PR is to enable closure capture flow using the shared dataflow library, as in #13478

@geoffw0
Copy link
Contributor

geoffw0 commented Aug 29, 2023

The above looks like a good plan. I like that you're starting with tests and have broken the work into parts, each of which looks quite manageable (though there will be implementation details to determine, especially in implementing VariableCapture::InputSig). Using Java as an example, checking the QLDoc in the shared library, and talking with the team when you're uncertain should help.

@aschackmull
Copy link
Contributor

A few more steps to consider:

  • SSA steps for captured variables are handled by the VariableCapture library, therefore any such steps that currently exists should be removed.
  • Ensure that library calls of the form someLibraryMethod(callbackFunction) gets suitable models (essentially Argument[n] to Argument[n].Parameter[this]). Java e.g. achieves this with a heuristic query. This ensures that callback that essentially "escape" into some unknown library call will be assumed to be called by that library function.

@d10c
Copy link
Contributor Author

d10c commented Aug 29, 2023

Also, TODO:

  • Add a test for self captures, just to see whether it works out of the box or not as a normal ParamDecl.

@@ -651,6 +664,136 @@ private module OutNodes {

import OutNodes

private predicate closureFlowStep(Expr e1, Expr e2) {
// simpleLocalFlowStep(exprNode(e1), exprNode(e2)) // TODO: find out why the java version uses simpleAstFlowStep... probably due to non-monotonic recursion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not quite the reason. We want a simple value flow relation that enables us to find the "interesting" aliases of a closure. We primarily want to use SSA steps to identify those aliases that arise from assignment to a variable. And we want to include steps like casts (those are the AST steps). For identifying the "interesting" flow points, we want to find those points that "escape", i.e. argument positions, return positions, and the like. If we use def-use rather than use-use, then we have a very simple way to achieve this, namely by returning those aliases that don't step any further. That allows the very simple implementation of hasAliasedAccess(Expr f) instead of having to somehow enumerate possibly escape points.

To understand why we want this from the hasAliasedAccess relation, we need to look at what the shared lib uses this for: There are two purposes. 1. update the value of the captured variables just-in-time before a closure escapes to ensure that it has the latest value. 2. observe side-effects when a closure is called - this requires that all the aliases that are used either directly as a call or occurs in an argument position of a call that may perform a callback are included in the hasAliasedAccess relation.


predicate hasAliasedAccess(Expr f) {
closureFlowStep+(this, f) and not closureFlowStep(f, _)
/* TODO: understand why this is intra-procedural */ }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In java, the SSA def-use steps that are included in closureFlowStep include stepping into closures, so we'll also see captured aliases. And that's all we need, because we only care about aliases of a closure that occur in a (possibly nested) scope that has access to the same variables that the closure captures. Any inter-procedurally reachable data flow for the closure, which may be considered "aliases" are irrelevant if they no longer have the captured variables in scope.

n.(PostUpdateNode).getPreUpdateNode().asExpr() or
result.(CaptureFlow::ParameterNode).getParameter().(CapturedDecl).getDecl() = n.getParameter() or
result.(CaptureFlow::ThisParameterNode).getCallable().getSelfParam() = n.getParameter() or
result.(CaptureFlow::MallocNode).getClosureExpr() = n.getCfgNode().getNode().asAstNode() // TODO: figure out why the java version had PostUpdateNode logic here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a constructor call new Foo(x), the corresponding expression node is the value after the constructor has returned. And in order for a constructor to work as a setter (say if new Foo(x) assigns x to some field of Foo), then the constructor expression needs to be a post-update node. That's why we generally (not just in this context) synthesise a node that we call "malloc" to be the pre-update for a constructor call. That way the constructor becomes just another instance method taking the "malloc" as the input to the this parameter and returning the value of this after potentially updating some field on it.

The variable capture library needs to refer to this "malloc" node, which is the pre-update for a constructor call, in case the closure expression has a constructor that's being called when the closure is created. This usually isn't the case for lambdas, as lambdas don't have constructors, but if Swift allows the definition of other capturing objects, say local class definitions or anonymous class objects, then this becomes relevant. See e.g. the Java test cases, which also includes some examples of local and anonymous classes where the constructor captures a variable.

@d10c
Copy link
Contributor Author

d10c commented Sep 11, 2023

Thanks for your comments @aschackmull !

It seems that MallocNode does not correspond to anything in Swift, because closures in Swift do not have constructors, and that seems to be the defining part of MallocNode. Will it still work if we just ignore the existence of MallocNode?

@aschackmull
Copy link
Contributor

It seems that MallocNode does not correspond to anything in Swift, because closures in Swift do not have constructors, and that seems to be the defining part of MallocNode. Will it still work if we just ignore the existence of MallocNode?

Short answer: Yes, at least in most cases.
The longer answer depends on a number of questions that I don't know the answer to, as I'm not familiar with Swift.
Presumably Swift can define classes and have constructors for those classes. And those constructors presumably can have side-effects, such that e.g. an argument to a constructor is stored in the resulting object. That sort of side-effect flow will usually require a pre-update-post-update relationship for the implicit-instance-argument/explicit-instance-result in the constructor call. And that pre-update node is called MallocNode in several other languages QL data flow setups. So I'm guessing that Swift very much also needs the concept of a MallocNode, but that's completely independent from this PR.
The second question is about the grammatical structure of Swift and the scope of variables: Is it possible to define a class somewhere where a local variable is in scope? And does that then mean that the class can reference that variable? If so, then you likely have a capture situation for an object that isn't a lambda, and the constructor of such an object may have side-effects on the captured variable. It's certainly a corner-case, but one that does occur with varying frequency between different languages - I'd say occasionally in Java and probably frequently in JavaScript.

@geoffw0
Copy link
Contributor

geoffw0 commented Sep 14, 2023

Will it still work if we just ignore the existence of MallocNode?

Short answer: Yes, at least in most cases.

I read this answer as "You will want to have MallocNode - but the majority of cases will work without it (so worry about them first)".

Though there is a regression in the tests, so more work is needed.
However, this doesn't change any of the test results.
@geoffw0
Copy link
Contributor

geoffw0 commented Oct 27, 2023

Closing.

The initial closure captures work has now been merged, in #14577 , which supersedes this PR (and I believe includes most of the actual commits / changes from here).

@MathiasVP there are some unchecked TODOs in the PR description and comments above, would you mind briefly checking we haven't missed anything important.

@geoffw0 geoffw0 closed this Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants