-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Ruby: add def-nodes and separate method/return steps to API graphs #7819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/** | ||
* Gets a data-flow node corresponding the value flowing into this API component. | ||
*/ | ||
DataFlow::Node getARhs() { Impl::def(this, result) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you also add getAValueReachingRhs()
?
dadfbcb
to
dbd4007
Compare
MkUse(DataFlow::Node nd) { isUse(nd) } | ||
MkUse(DataFlow::Node nd) { isUse(nd) } or | ||
/** A value that escapes into an API at the node `nd` */ | ||
MkDef(DataFlow::Node nd) { isDef(nd) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help me to understand the meaning of MkDef
? I expected a Def
to be a node representing a class or method definition, like def foo(x)
or class Foo
. But it looks like Def
nodes are arguments to method calls. In what sense are these definitions? Are we planning to expand MkDef
to cover more things in the future? What does it mean for a value to escape into an API
as the comment says?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that "def node" is a confusing name without more context, so I've opened a discussion internally about whether to rename it.
But briefly: it's a place where a value flows from this codebase into another codebase.
What does it mean for a value to
escape into an API
as the comment says?
I've rephrased this as: A value that escapes into an external library at the node nd
. Does that make more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This (plus the internal discussion) really helps me to get my head around it.
dbd4007
to
87c62db
Compare
Force-pushed to resolve conflicts with #7663. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really great! Thanks for helping out with this Asger.
|
||
/** Gets an API node representing the `n`th positional parameter. */ | ||
Node getParameter(int n) { | ||
result = this.getASuccessor(Label::parameter(n)) and this.hasParameterIndex(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get rid of hasParameterIndex
if we instead update Label::parameter
to
bindingset[n]
bindingset[result]
string parameter(int n) {
exists(string s |
result = parameterByStr(s) and
s = n.toString() and // bindingset[n]
n = s.toInt() // bindingset[result]
)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
The point was to avoid InverseAppend
leaking out to the callers of getParameter
and getKeywordParameter
, which caused some problems in the JS libraries, so I've also added pragma[nomagic]
on the two predicates for now.
@erik-krogh's change to IPA type labels was proven to be the most robust fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erik-krogh's change to IPA type labels was proven to be the most robust fix
I'll get on that once this PR is merged.
(Or I could push it straight onto this PR if you want, it shouldn't take long).
/** | ||
* Gets a data-flow node corresponding the value flowing into this API component. | ||
*/ | ||
DataFlow::Node getARhs() { Impl::def(this, result) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this Rhs
as in right-hand side? Right-hand side of what? This predicate seems to be getInducingNode
specialised to just def nodes, but we don't use it in this PR. Is it necessary? Are there later MaD changes that rely on it maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary for MaD, and you are likely going to use it for future library modelling.
You can see some examples of how getARhs()
is used in my Python PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. The name makes sense in the case of foo.bar = x
as in the example in your PR. For method call parameters it is a bit of a weird name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do a DCA run before merging.
The evaluation shows OK performance, although it isn't without some cost. I've investigated the performance in opal and didn't see any bad join orders. There are roughly twice as many API nodes to track (2M up from 1M), so there's just more work to do. The 8 new results were caused by 9a496e6 |
Following up on a discussion with @hvitved and @aibaars, this makes some API graph changes in preparation for MaD.
Happy to wait until after #7663 and fix the merge conflicts here (though there are surprisingly few conflicts)
Changes to API graphs
getReturn("foo")
edge is now split intogetMethod("foo").getReturn()
through an intermediate node representing the method access itself. This node is always unique to the call site, so it can also also be used as a representative for the call itself (this differs from JS, where we needAPI::CallNode
to work with calls in the API graph).getParameter(n)
for then
th positional parametergetKeywordParameter(name)
for the keyword parameter with thatname
getBlock()
for the block parameterinstance
edge label has been replaced with the pathgetMethod("new").getReturn()
. We need a way to refer to the arguments of a constructor call, just like any other call, so we need the intermediate method/constructor access node to exist, and here it seemed better to unify the two representations rather than invent a "constructor access node". ThegetInstance()
method in QL is still there, it's just the underlying graph that has changed.NOTE: API graphs labels do not distinguish between "argument" and "parameter". This might look a bit confusing at first, and possibly made more sense when consuming an API graph from a library, where access paths in the library needed to match up with those in the application. Just mentioning it up front that this was deliberate and matches how we do it for JS.
Example
You can get to
x.bar
via the pathgetMember("X").getMethod("foo").getBlock().getParameter(0).getMethod("bar").getReturn()
.Explaining each step of the path:
getMember("X")
gets theX
.getMethod("foo")
gets theX.foo
call (the call itself, not its return value).getBlock()
gets the entire block argument (|x| ... end
).getParameter(0)
gets the first parameter of the block (x
).getMethod("bar")
gets thex.bar
call.getReturn()
get the return value of thex.bar
callDrive-by fixes
There are two fixes to the data-flow libraries, found while debugging/pairing with @hvitved. I'm happy to split them out into a separate PR if necessary.