Skip to content

Ruby: add def-nodes and separate method/return steps to API graphs #7819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Feb 10, 2022

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Feb 2, 2022

Following up on a discussion with @hvitved and @aibaars, this makes some API graph changes in preparation for MaD.

Happy to wait until after #7663 and fix the merge conflicts here (though there are surprisingly few conflicts)

Changes to API graphs

  • The getReturn("foo") edge is now split into getMethod("foo").getReturn() through an intermediate node representing the method access itself. This node is always unique to the call site, so it can also also be used as a representative for the call itself (this differs from JS, where we need API::CallNode to work with calls in the API graph).
  • Adds edges going from the method access to its parameters. These have labels:
    • getParameter(n) for the nth positional parameter
    • getKeywordParameter(name) for the keyword parameter with that name
    • getBlock() for the block parameter
  • The instance edge label has been replaced with the path getMethod("new").getReturn(). We need a way to refer to the arguments of a constructor call, just like any other call, so we need the intermediate method/constructor access node to exist, and here it seemed better to unify the two representations rather than invent a "constructor access node". The getInstance() method in QL is still there, it's just the underlying graph that has changed.

NOTE: API graphs labels do not distinguish between "argument" and "parameter". This might look a bit confusing at first, and possibly made more sense when consuming an API graph from a library, where access paths in the library needed to match up with those in the application. Just mentioning it up front that this was deliberate and matches how we do it for JS.

Example

X.foo do |x|
  x.bar
end

You can get to x.bar via the path getMember("X").getMethod("foo").getBlock().getParameter(0).getMethod("bar").getReturn().

Explaining each step of the path:

  • getMember("X") gets the X
  • .getMethod("foo") gets the X.foo call (the call itself, not its return value)
  • .getBlock() gets the entire block argument (|x| ... end)
  • .getParameter(0) gets the first parameter of the block (x)
  • .getMethod("bar") gets the x.bar call
  • .getReturn() get the return value of the x.bar call

Drive-by fixes

There are two fixes to the data-flow libraries, found while debugging/pairing with @hvitved. I'm happy to split them out into a separate PR if necessary.

@asgerf asgerf added Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish Ruby labels Feb 2, 2022
/**
* Gets a data-flow node corresponding the value flowing into this API component.
*/
DataFlow::Node getARhs() { Impl::def(this, result) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you also add getAValueReachingRhs()?

@asgerf asgerf force-pushed the asgerf/ruby-def-nodes branch from dadfbcb to dbd4007 Compare February 3, 2022 13:28
MkUse(DataFlow::Node nd) { isUse(nd) }
MkUse(DataFlow::Node nd) { isUse(nd) } or
/** A value that escapes into an API at the node `nd` */
MkDef(DataFlow::Node nd) { isDef(nd) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help me to understand the meaning of MkDef? I expected a Def to be a node representing a class or method definition, like def foo(x) or class Foo. But it looks like Def nodes are arguments to method calls. In what sense are these definitions? Are we planning to expand MkDef to cover more things in the future? What does it mean for a value to escape into an API as the comment says?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that "def node" is a confusing name without more context, so I've opened a discussion internally about whether to rename it.

But briefly: it's a place where a value flows from this codebase into another codebase.

What does it mean for a value to escape into an API as the comment says?

I've rephrased this as: A value that escapes into an external library at the node nd. Does that make more sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This (plus the internal discussion) really helps me to get my head around it.

@asgerf asgerf force-pushed the asgerf/ruby-def-nodes branch from dbd4007 to 87c62db Compare February 4, 2022 10:27
@asgerf
Copy link
Contributor Author

asgerf commented Feb 4, 2022

Force-pushed to resolve conflicts with #7663.

The getMethod(name) predicate now performs getASubclass(). This affects both getReturn(name) and getInstance() so both are subclass-aware, preserving the behaviour from the previous PR.

@asgerf asgerf marked this pull request as ready for review February 4, 2022 11:08
@asgerf asgerf requested a review from a team as a code owner February 4, 2022 11:08
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really great! Thanks for helping out with this Asger.


/** Gets an API node representing the `n`th positional parameter. */
Node getParameter(int n) {
result = this.getASuccessor(Label::parameter(n)) and this.hasParameterIndex(n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of hasParameterIndex if we instead update Label::parameter to

  bindingset[n]
  bindingset[result]
  string parameter(int n) {
    exists(string s |
      result = parameterByStr(s) and
      s = n.toString() and // bindingset[n]
      n = s.toInt() // bindingset[result]
    )
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

The point was to avoid InverseAppend leaking out to the callers of getParameter and getKeywordParameter, which caused some problems in the JS libraries, so I've also added pragma[nomagic] on the two predicates for now.

@erik-krogh's change to IPA type labels was proven to be the most robust fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erik-krogh's change to IPA type labels was proven to be the most robust fix

I'll get on that once this PR is merged.
(Or I could push it straight onto this PR if you want, it shouldn't take long).

/**
* Gets a data-flow node corresponding the value flowing into this API component.
*/
DataFlow::Node getARhs() { Impl::def(this, result) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Rhs as in right-hand side? Right-hand side of what? This predicate seems to be getInducingNode specialised to just def nodes, but we don't use it in this PR. Is it necessary? Are there later MaD changes that rely on it maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is necessary for MaD, and you are likely going to use it for future library modelling.

You can see some examples of how getARhs() is used in my Python PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. The name makes sense in the case of foo.bar = x as in the example in your PR. For method call parameters it is a bit of a weird name.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a DCA run before merging.

@asgerf
Copy link
Contributor Author

asgerf commented Feb 9, 2022

The evaluation shows OK performance, although it isn't without some cost. I've investigated the performance in opal and didn't see any bad join orders. There are roughly twice as many API nodes to track (2M up from 1M), so there's just more work to do.

The 8 new results were caused by 9a496e6

@asgerf asgerf removed the Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish label Feb 9, 2022
@codeql-ci codeql-ci merged commit a57ee01 into github:main Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants