Skip to content

C++: data flow through partial chi operands where type is known #2676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 29, 2020

Conversation

jbj
Copy link
Contributor

@jbj jbj commented Jan 22, 2020

This PR builds on top of #2667 and #2657, so I've initially opened it as a draft. The main change is in af8bbf4. As evidenced by the diffs, these changes together enable IR data flow through the most common case of definition by reference.

I've attempted to distinguish the unhelpful (for data flow) chi instructions from the helpful ones based on their result type. I thought originally that we'd need to thread some new predicates through the SSA construction, but the type-based approach seems to work. Maybe @rdmarsh2 or @dbartol can point out a case where it won't work.

jbj added 2 commits January 22, 2020 16:21
This allows adding a multi-line case without the auto-formatting changes
becoming too disruptive.
@jbj jbj added the C++ label Jan 22, 2020
@jbj jbj requested a review from rdmarsh2 January 22, 2020 16:12
@jbj jbj force-pushed the dataflow-partial-chi branch from af8bbf4 to 7376daf Compare January 22, 2020 16:13
@dbartol
Copy link
Contributor

dbartol commented Jan 22, 2020

The result type of the Chi is actually a pretty good way to distinguish these, since it will always be the type of the virtual variable. A virtual variable with a known type should always represent a single known memory location, and a virtual variable with an unknown type would represent a group of disjoint memory locations. There are probably some cases where the known memory location is an array or struct, where a write to a single member will result in a Chi for the whole array/struct, that might confuse this a bit, but I expect that isn't common enough to worry about.

@rdmarsh2
Copy link
Contributor

Won't stack-allocated structs normally be their own virtual variable? That makes this seem much more common to me.

@jbj
Copy link
Contributor Author

jbj commented Jan 22, 2020

Then a write to a field of a struct will cause data flow to the whole struct? That doesn't seem great. On the other hand, it's exactly what I'm trying to achieve for strcpy: a write to a portion of the output buffer should taint the whole buffer. So perhaps the code I'm introducing here belongs in TaintTrackingUtil rather than DataFlowUtil?

For data flow, perhaps we could require that the size of the partial operand equals the size of the chi result? I'm hoping that would support common cases of memcpy.

@jbj jbj marked this pull request as ready for review January 29, 2020 11:47
@jbj jbj requested review from a team as code owners January 29, 2020 11:47
Conflicts:
	cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
	cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected
@jbj
Copy link
Contributor Author

jbj commented Jan 29, 2020

I've fixed merge conflicts and taken this PR out of draft state. There's still the open question of whether the flow should be data flow, taint flow, or maybe sometimes one and sometimes the other. I don't think that matters for the January deadline, though.

@jbj
Copy link
Contributor Author

jbj commented Jan 29, 2020

It turns out this PR doesn't go far enough to include everything we'll need after #2696. See https://github.slack.com/archives/CP0LHP150/p1580312043150500.

This changes the flow to be taint rather than data flow, and it extends
it to include chi instructions with unknown type as long as they're not
for the `AliasedVirtualVariable`.

We're losing three good test results because these tests are not
affected by `DefaultTaintTracking.qll`. The taint step added here can
later be ported to `TaintTrackingUtil.qll` to recover these results, but
we probably want a better API than transitive-closure search through
instructions before doing that.
@jbj
Copy link
Contributor Author

jbj commented Jan 29, 2020

Solution pushed.

@jbj
Copy link
Contributor Author

jbj commented Jan 29, 2020

The last commit, 52d2beb, had no effect on the security qltests.

Copy link
Contributor

@rdmarsh2 rdmarsh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me for now. We'll need to come back to where it fits in the core dataflow library

@rdmarsh2 rdmarsh2 merged commit 37570c7 into github:master Jan 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants