C++: Add a new MemoryLocation to represent sets of Allocations #16907

Merged Jul 17, 2024 (34 commits)

MathiasVP
Contributor

@MathiasVP MathiasVP commented Jul 4, 2024

The what

This PR is an alternative to #16465 that:

  • Fixes the problem of incorrect alias analysis that's causing FPs on the cpp/uninitialized-local query, and
  • Doesn't sacrifice analysis precision by making the alias analysis more conservative

The problem, as Dave explained very well in #16465, can be seen in this example:

int x;
int y;
int* p;
if (a < b) {
  p = &x;
} else {
  p = &y;
}
*p = 5;
use(x);

The question is: what are the possible values of x once we reach use(x)? The current (incorrect) alias analysis on main reports that the only possible value of x is Uninitialized, which is certainly wrong! The value may be uninitialized, but it may also be 5.
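
To make the two outcomes concrete, here is a minimal compilable variant of the snippet above. It is only a sketch and not part of the PR: the function name value_of_x_at_use and the sentinel kUninit are hypothetical, and the sentinel stands in for an uninitialized value so that reading x is well-defined in the demo.

```cpp
#include <cassert>

// Hypothetical sentinel standing in for "uninitialized". Reading a truly
// uninitialized int would be undefined behavior, so the demo avoids it.
constexpr int kUninit = -1;

// Mirrors the snippet above and returns the value that use(x) would observe.
int value_of_x_at_use(int a, int b) {
  int x = kUninit;
  int y = kUninit;
  int* p;
  if (a < b) {
    p = &x;
  } else {
    p = &y;
  }
  assert(p == &x || p == &y);  // p targets exactly one of the two locals
  *p = 5;                      // writes to x on one path and to y on the other
  return x;
}
```

With a < b the function returns 5, and otherwise it returns kUninit, so a sound analysis has to report both possibilities for x at use(x).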

The how

In #16465 this was fixed by removing alias analysis support for Phi instructions. This meant that we would conflate any value that flows into a Phi instruction with all aliased memory, which effectively means that we couldn't say anything about that memory. This made some of our tests unhappy in #16465.

Instead of removing alias analysis support for Phi instructions, this PR adds a new MemoryLocation that represents a set of Allocations. That is, instead of saying that *p = 5 writes to all aliased memory, we now have a memory location that represents a set of possible allocations. In the above case, that set is {x, y}.
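
The effect of writing to a set of allocations can be sketched with a toy abstract interpreter. This is an illustration under stated assumptions, not the QL implementation in this PR: the types AbstractValue, Allocation, AllocationSet and State, and the functions store and example_state, are all hypothetical names. The idea shown is that a store to a set with several members may or may not hit each member, so each member keeps its old values and gains the new one.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy representation, not CodeQL's.
using AbstractValue = std::string;           // e.g. "Uninitialized", "5"
using Allocation = std::string;              // e.g. "x", "y"
using AllocationSet = std::set<Allocation>;  // location that may be any of these
using State = std::map<Allocation, std::set<AbstractValue>>;

// Models a store through a pointer whose target is one of `targets`.
void store(State& state, const AllocationSet& targets, const AbstractValue& v) {
  assert(!targets.empty());
  if (targets.size() == 1) {
    // Exactly one possible target: the old value is definitely overwritten.
    state[*targets.begin()] = std::set<AbstractValue>{v};
  } else {
    // Several possible targets: each may keep its old value or receive v.
    for (const Allocation& a : targets) {
      state[a].insert(v);
    }
  }
}

// The example from the description: x and y start out uninitialized,
// then *p = 5 is executed with p pointing into the set {x, y}.
State example_state() {
  State s;
  s["x"] = std::set<AbstractValue>{"Uninitialized"};
  s["y"] = std::set<AbstractValue>{"Uninitialized"};
  store(s, AllocationSet{"x", "y"}, "5");
  return s;
}
```

In this sketch, example_state() leaves both x and y with the values Uninitialized and 5, which matches the answer expected at use(x). Variables outside the set are untouched, which is the precision that conflating with all aliased memory would give up.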

Reviewing this PR

This PR modifies code that we haven't touched in years, so I've tried my best to split the changes into commits that can be reviewed independently.

I'm keeping this PR in draft for now as it depends on a yet-to-be-released feature of QL (i.e., the QlBuiltins::InternSets module; see 75c5d8f). Once 2.18.0 is out we can safely merge this PR onto main.

Analyzing the results

On samate there are some new inconsistencies arising from the missingPhiOperand check. In the first commit I've added a testcase that represents what's going on:

void use_int(int);

static void phi_with_single_input_at_merge(bool b)
{
  int *data = nullptr;
  if(b) {
    int intBuffer = 8;
    data = &intBuffer;
  }
  use_int(*data);
}

On main, the value of *data has been merged into all aliased memory.

After this PR, we track the flow from &intBuffer into data and on into the phi instruction at the merge point. But because we don't know what memory data points to initially, that phi instruction has only one input at the merge point, which is what the missingPhiOperand check reports.
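
The shape of the problem in the test case above can be sketched as follows. This is a simplified model, not the IR consistency query itself: it assumes the check flags a phi that lacks an operand for one of its predecessor blocks, and the Phi struct, the block names "then" and "fallthrough", and the functions missing_operands and example_phi are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified phi node: one operand is expected per predecessor.
struct Phi {
  std::vector<std::string> predecessors;        // blocks flowing into the merge
  std::map<std::string, std::string> operands;  // predecessor -> incoming definition
};

// Returns the predecessors for which the phi has no operand.
std::vector<std::string> missing_operands(const Phi& phi) {
  assert(!phi.predecessors.empty());
  std::vector<std::string> missing;
  for (const std::string& pred : phi.predecessors) {
    if (phi.operands.find(pred) == phi.operands.end()) {
      missing.push_back(pred);
    }
  }
  return missing;
}

// The merge point after `if (b) { ... }`: only the path through the branch
// supplies a definition for the memory that data may point to.
Phi example_phi() {
  Phi phi;
  phi.predecessors = {"then", "fallthrough"};
  phi.operands["then"] = "store to intBuffer";
  return phi;
}
```

Here missing_operands(example_phi()) contains only "fallthrough", the path on which data still holds its initial nullptr and no tracked allocation flows into the merge.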

I plan on fixing this unsoundness as a follow-up PR since I think this PR is large enough as-is. (I already have something for this locally.)

@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Jul 4, 2024
@github-actions github-actions bot added C++ and removed no-change-note-required This PR does not need a change note labels Jul 4, 2024
@MathiasVP MathiasVP marked this pull request as ready for review July 15, 2024 09:10
@MathiasVP MathiasVP requested a review from a team as a code owner July 15, 2024 09:10
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Jul 15, 2024
@MathiasVP
Contributor Author

@geoffw0 950d70f renames the new instruction as we discussed at the sync. Hopefully that's less confusing now 🤞

Contributor

@geoffw0 geoffw0 left a comment

Thanks for going over this with us in Zoom. I've added a few more comments here.

or
not isFirstInstructionBeforeUninitializedGroup(instruction, _) and
result = getInstructionSuccessorAfterUninitializedGroup0(instruction, kind)
}
Contributor

The instruction ordering code has accumulated a very high level of complexity for what it does, and I feel this makes it error-prone and difficult to maintain. I think we should discuss ways to improve this in the future. In particular, the simple operation of appending one variable-length block of instructions to another currently requires quite a lot of code and complexity.

Contributor Author

Yeah, it would be good if we could do something prettier here in the future. It's doable for now since we're only injecting new UninitializedGroup instructions and ChiInstructions (and because we know exactly the structure of these), but I agree that this could get unmanageable.

Contributor

I would actually say it's already unmanageable, at least to my tolerance. I'm not suggesting we fix it here, I'll create an issue for discussion...

Contributor Author

I'll get started on figuring out a more sensible way to structure this as a follow-up PR

Contributor

(I didn't actually create the issue, but we discussed stuff on another channel)

Contributor

@geoffw0 geoffw0 left a comment

I'm happy with the changes, I'd like to know your thoughts about the DCA run:

  • 16 lost results for cpp/uninitialized-local.
  • possible 5% analysis slowdown.
  • changes in CPP IR inconsistencies, e.g. new missingPhiOperand inconsistencies.

@MathiasVP
Contributor Author

MathiasVP commented Jul 17, 2024

> I'm happy with the changes, I'd like to know your thoughts about the DCA run:
>
>   • 16 lost results for cpp/uninitialized-local.
>   • possible 5% analysis slowdown.
>   • changes in CPP IR inconsistencies, e.g. new missingPhiOperand inconsistencies.

  • I've verified that the 16 results are all FPs. They're exactly the kind of FP I expected to be removed by this PR, and 5d58cf6 is a perfect representation of the kinds of FPs that are removed 🎉

  • I checked the project with the largest slowdown (nlohmann__json), and I couldn't see anything in the log that points to anything going wrong. This project simply has some very bad aliasing behavior for the current analysis. This is also the project that was getting a lot slower when we made alias analysis sound a couple of months ago. I don't think it's a blocker, but we should figure out how to mitigate this worst-case situation at some point.

  • I added a test case that represents the new missingPhiOperand inconsistencies in the first commit. See the Analyzing the results section of the PR description for an explanation. You can also see the inconsistency in the first commit that accepts test changes. I've got a PR in draft that solves these problems, but I didn't want to add it here since the changes were already fairly large as-is 😄 The new inconsistency won't have any effect on analysis quality right now anyway.

@jketema
Contributor

jketema commented Jul 17, 2024

I had a brief chat with Mathias about b185c67 and 72b52cc. I'm also happy with the PR, and agree that the instruction ordering bits are rather complex.

@jketema
Contributor

jketema commented Jul 17, 2024

I'm assuming @geoffw0 will be the one that approves the PR once happy.

@MathiasVP
Contributor Author

@geoffw0 based on our conversation I've added some more QLDoc in d5ccb2e that documents the control-flow transformations added in this PR. I hope that clears up things!

Contributor

@geoffw0 geoffw0 left a comment

I'm happy with this PR and the changes / new explanatory comment. 👍

Perhaps we should discuss analysis performance at one of the upcoming meetings; there have been a number of small regressions and improvements lately.

@MathiasVP MathiasVP merged commit 45ba0c3 into github:main Jul 17, 2024
15 checks passed
MathiasVP added a commit that referenced this pull request Jul 25, 2024
Labels
C++ no-change-note-required This PR does not need a change note
3 participants