Skip to content

C++: Fix expensive getWideCharType(). #8479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 23, 2022
Merged

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Mar 17, 2022

Fix a potential issue with FormattingFunction::getWideCharType that was identified in the 'Join-order badness' section of a nightly DCA run. The goals here is to fix the potential issue - I have not found or looked for projects where this problem blows up enough to cause a significant overall slowdown in practice.

The issue appears to come from getAFormatterWideTypeOrDefault, specifically not exists(getAFormatterWideType()) within that which creates a surprising amount of computation.

Before (wireshark, NoSpaceForZeroTerminator.ql):

[2022-03-17 14:58:13] (772s) Tuple counts for FormattingFunction::FormattingFunction::getWideCharType_dispred#bb/2@346c7dg5 after 74ms:
                      81      ~0%     {1} r1 = FormattingFunction::FormattingFunction::getDefaultCharType_dispred#bf#shared AND NOT FormattingFunction::FormattingFunction::getWideCharType_dispred#bb#antijoin_rhs(Lhs.0 'this')
                      3315492 ~0%     {2} r2 = JOIN r1 WITH m#FormattingFunction::FormattingFunction::getWideCharType_dispred#bb#mcpe_result CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'this'
                      162     ~1%     {2} r3 = JOIN r2 WITH FormattingFunction::getAFormatterWideTypeOrDefault#b ON FIRST 1 OUTPUT Lhs.1 'this', Lhs.0 'result'
                      
                      81      ~1%     {2} r4 = JOIN FormattingFunction::FormattingFunction::getWideCharType_dispred#bb#shared WITH m#FormattingFunction::FormattingFunction::getWideCharType_dispred#bb#mcpe_result ON FIRST 1 OUTPUT Lhs.0 'result', Lhs.1 'this'
                      81      ~3%     {3} r5 = JOIN r4 WITH Type::Type::getSize_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'this', Lhs.0 'result', Rhs.1
                      0       ~0%     {3} r6 = SELECT r5 ON In.2 > 1
                      0       ~0%     {2} r7 = SCAN r6 OUTPUT In.0 'this', In.1 'result'
                      
                      162     ~1%     {2} r8 = r3 UNION r7
                                      return r8

After (wireshark, NoSpaceForZeroTerminator.ql):

[2022-03-17 15:20:20] (960s) Tuple counts for FormattingFunction::FormattingFunction::getWideCharType_dispred#bb/2@8e3a1db0 after 0ms:
                      162 ~0%       {2} r1 = JOIN FormattingFunction::FormattingFunction::getWideCharType_dispred#bb#shared WITH FormattingFunction::FormattingFunction#class#f ON FIRST 1 OUTPUT Lhs.1 'result', Lhs.0 'this'
                      
                      162 ~0%       {2} r2 = r1 AND NOT FormattingFunction::FormattingFunction::getWideCharType_dispred#bb#antijoin_rhs(Lhs.0 'result', Lhs.1 'this')
                      162 ~1%       {2} r3 = SCAN r2 OUTPUT In.1 'this', In.0 'result'
                      
                      162 ~0%       {2} r4 = r1 AND NOT FormattingFunction::FormattingFunction::getWideCharType_dispred#bb#antijoin_rhs(Lhs.0 'result', Lhs.1 'this')
                      162 ~1%       {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'
                      
                      324 ~103%     {2} r6 = r3 UNION r5
                      
                      81  ~1%       {2} r7 = JOIN FormattingFunction::FormattingFunction::getDefaultCharType_dispred#bf#shared WITH FormattingFunction::FormattingFunction::getFormatCharType_dispred#ff ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.0 'this'
                      81  ~1%       {2} r8 = JOIN r7 WITH m#Printf::FormatLiteral::getDefaultCharType_dispred#fb ON FIRST 1 OUTPUT Lhs.0 'result', Lhs.1 'this'
                      81  ~3%       {3} r9 = JOIN r8 WITH Type::Type::getSize_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'this', Lhs.0 'result', Rhs.1
                      
                      0   ~0%       {3} r10 = SELECT r9 ON In.2 > 1
                      0   ~0%       {2} r11 = SCAN r10 OUTPUT In.0 'this', In.1 'result'
                      
                      0   ~0%       {3} r12 = SELECT r9 ON In.2 > 1
                      0   ~0%       {2} r13 = SCAN r12 OUTPUT In.0 'this', In.1 'result'
                      
                      0   ~0%       {2} r14 = r11 UNION r13
                      324 ~103%     {2} r15 = r6 UNION r14
                                    return r15

I saw similar results on vim, and in some brief tests using other queries. I'll also start a DCA run on this PR shortly, and we'll see what the 'Join-order badness' section says now.

@geoffw0 geoffw0 added C++ no-change-note-required This PR does not need a change note labels Mar 17, 2022
@geoffw0 geoffw0 requested a review from a team as a code owner March 17, 2022 17:52
@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 18, 2022

DCA results:

  • there's a new alert; I'm not prepared to take this especially seriously given the QL changes in this PR, but perhaps I should re-run DCA to confirm its just a glitch?
  • analysis time is faster by a probably insignificant amount
  • nothing is listed in "Join-order badness" any more ... which is strange, I was expecting other results there to persist.

Conclusion: running DCA again.

@MathiasVP
Copy link
Contributor

MathiasVP commented Mar 18, 2022

  • there's a new alert; I'm not prepared to take this especially seriously given the QL changes in this PR, but perhaps I should re-run DCA to confirm its just a glitch?

That's without a doubt the variables-without-types issue. Nothing in this PR should be able to cause query changes, and this is the exact kind of spurious result change that this issue is causing.

I think we should disable the abseil project from DCA. This has been annoying us for a while.

  • nothing is listed in "Join-order badness" any more ... which is strange, I was expecting other results there to persist.

Did you start the DCA run with the --tuple-counting argument? It's available through the wizard. If you don't include this join-order analysis won't run.

MathiasVP
MathiasVP previously approved these changes Mar 18, 2022
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, the change LGTM! Feel free to merge it once the DCA results come back.

@MathiasVP
Copy link
Contributor

Hm. On second thought. Isn't this just a case of bad magic? Would pragma[nomagic] also fix this issue?

@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 18, 2022

I think we should disable the abseil project from DCA. This has been annoying us for a while.

Sounds good to me. We're at the point where we have a good number of projects on there, but too many spurious results mostly from the same suspects.

Did you start the DCA run with the --tuple-counting argument? It's available through the wizard. If you don't include this join-order analysis won't run.

Thanks, I didn't realize that was necessary!

Hm. On second thought. Isn't this just a case of bad magic? Would pragma[nomagic] also fix this issue?

I'll try it now.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 18, 2022

Hm. On second thought. Isn't this just a case of bad magic? Would pragma[nomagic] also fix this issue?

Assuming you mean on the getAFormatterWideTypeOrDefault predicate, it does not appear to do the job.

@MathiasVP
Copy link
Contributor

Assuming you mean on the getAFormatterWideTypeOrDefault predicate, it does not appear to do the job.

Sorry. I mean on getWideCharType.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 21, 2022

Done - with that change FormattingFunction::FormattingFunction::getWideCharType_dispred#bb disappears from the log entirely (as we might expect) and I definitely feel the need to start a fresh DCA run...

@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 22, 2022

DCA:

  • no interesting results
  • analysis time is 1% slower, which is probably not significant but it would be good to confirm...

@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 22, 2022

Local testing: running cpp/too-many-format-arguments, cpp/no-space-for-terminator, cpp/wrong-number-format-arguments in sequence, I saw a 4% speedup for Wireshark, 3% slowdown for Linux. I think I've had enough of testing now.

@MathiasVP I'm not confident of my interpretation of the "Join order badness" on the DCA run (the latest one I ran on this PR). In the original run FormattingFunction::getWideCharType appeared multiple times in the third join order badness table, now it only appears in the first table. This might be because it only appears as a bad join order in the a (before) build of the new run and is fixed in the b (after) build???

@MathiasVP
Copy link
Contributor

MathiasVP commented Mar 23, 2022

@MathiasVP I'm not confident of my interpretation of the "Join order badness" on the DCA run (the latest one I ran on this PR). In the original run FormattingFunction::getWideCharType appeared multiple times in the third join order badness table, now it only appears in the first table. This might be because it only appears as a bad join order in the a (before) build of the new run and is fixed in the b (after) build???

I think your interpretation is correct, yes. I'm happy with this PR as soon as it's autoformatted 👍.

I'm kinda confused why there are three tables in the summary view. But that's not blocking this PR at all. The tuple counts look convincing enough to merge this, and I'm happy DCA helped us spot this!

@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 23, 2022

Ah, I hadn't seen the format test failure. Fixing that now...

@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 23, 2022

Done.

@MathiasVP MathiasVP merged commit 8b8f0ca into github:main Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants