Skip to content

C++/C#: Alias analysis of indirect parameters #2696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jan 30, 2020

Conversation

dbartol
Copy link
Contributor

@dbartol dbartol commented Jan 27, 2020

This change enables more precise alias analysis the memory pointed to by pointer and reference parameters. At a high level, we now treat the memory pointed to by the value of a pointer/reference parameter to be its own unaliased allocation, independent of all other allocations. This is unsound, but produces much better analysis results in practice.

Overall, IR diffs (in aliased SSA) are what I would expect. Stores to an indirect parameter no longer pollute the all aliased memory virtual variable. Even though we don't know how big any given indirect parameter buffer is, if all accesses to that buffer are just scalars at offset zero, we'll still hook up uses to exact definitions just as if we knew the size all along.

The memory accessed by the InitializeIndirection and ReturnIndirection instructions now has a new MemoryAccessKind: EntireAllocationMemoryAccess. This kind represents an access to the entire contiguous allocation, without knowing exactly how big that allocation is.

Alias analysis now operates on Allocations, which are provided as module arguments to represent the memory locations to be tracked. In unaliased SSA, an Allocation is just an IRAutomaticVariable. In aliased SSA, there is an Allocation for each IRVariable, plus one for each indirect parameter.

Aliased SSA still works mostly the same, except that a new type of MemoryLocation has been added to represent an access to the entire Allocation. The overlap computation has been updated accordingly.

The existing AliasAnalysis.qll was shared between IR stages in C++, but was not automatically shared with C#. I brought the two languages in sync, so that now they will be kept up to date as they change in the future. This required adding a couple more predicates to the language interface. C# did not already have the same "models" system as C++, so for now, I implemented a trivial AliasFunction class that never actually has any object instances.

I've implemented a new module for writing analysis tests called InlineExpectationTests.qll. You extend InlineExpectationsTest to specify what results you found on each line. Each result has a tag and a value, both strings. The tags and values are automatically compared to comments in the code under test to determine which results are expected, which are unexpected, and which are missing. The output file only shows missing and unexpected results, so a passing test should have an empty .expected file. There's also a way to annotate a particular result as being either a false positive or false negative. This results in a different message if the actual result doesn't match the expected. For example, if the actual output is missing a result that was declared as a false positive, the output message reports that a false positive was fixed.

The new points_to.ql test uses the above module.

@dbartol dbartol marked this pull request as ready for review January 28, 2020 00:58
@dbartol dbartol requested review from a team as code owners January 28, 2020 00:58
@dbartol dbartol requested review from jbj and rdmarsh2 January 28, 2020 00:58
Instead of tracking `IRVariable`s directly, alias analysis now tracks instances of the `Allocation` type provided by its `Configuration` parameter. For unaliased SSA, an `Allocation` is just an `IRAutomaticVariable`. For aliased SSA, an `Allocation` is either an `IRVariable` or the memory pointed to by an indirect parameter.
This introduces a new type of `MemoryLocation`: `EntireAllocationMemoryLocation`, representing an entire contiguous allocation whose size is not known. This is used to model the memory accesses on `InitializeIndirection` and `ReturnIndirection`.
@dbartol dbartol force-pushed the dbartol/Indirections branch from 5f585c5 to 542579d Compare January 28, 2020 18:35
@dbartol
Copy link
Contributor Author

dbartol commented Jan 28, 2020

I've rebased onto master, and curated the PR into several commits. The commits aren't expected to work correctly on their own, but they should at least separate all the changes into related chunks for easier review.

Copy link
Contributor

@rdmarsh2 rdmarsh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main code changes and the IR dump and dataflow test changes look good to me. I'm not sure I fully understand the new test framework.

Comment on lines 46 to 50
private string expectationCommentPattern() { result = "//\\s*(\\$(?:[^/]|/[^/])*)(?://.*)?" }

private string expectationPattern() {
result = "(?:(f(?:\\+|-)):)?((?:[A-Za-z-_]+)(?:\\s*,\\s*[A-Za-z-_]+)*)(?:=(.*))?"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These regexes need explanations, maybe in the class header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've documented each of them individually (with examples in the module header as part of the "how to use this" docs).

@@ -0,0 +1,168 @@
import cpp

abstract class InlineExpectationsTest extends string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs some qldoc explaining what this class is for and how to override it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a huge module comment with details and examples.

@dbartol
Copy link
Contributor Author

dbartol commented Jan 29, 2020

@rdmarsh2 I think I've addressed all of your feedback.

@rdmarsh2 rdmarsh2 self-requested a review January 29, 2020 20:34
@rdmarsh2
Copy link
Contributor

sync-identical-files failed because #2713 touched the C++ copies of AliasAnalysis.qll

rdmarsh2
rdmarsh2 previously approved these changes Jan 30, 2020
Copy link
Contributor

@rdmarsh2 rdmarsh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Conflicts:
	cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/internal/AliasedSSA.qll
	csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/internal/AliasAnalysis.qll
@dbartol dbartol requested a review from rdmarsh2 January 30, 2020 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants