cmd/compile: feedback-guided optimization #28262

josharian · 2018-10-17T23:34:57Z

This is an umbrella issue for supporting feedback-guided optimization. This has been discussed offhand in a few other contexts (inlining decisions: #17566, code layout decisions: #20356, language changes: #24204, stack growth: #18138).

It is not clear what the design for FGO/PGO support might look like. In particular, what do the feedback/profile files look like?

@aclements observed during conversation at GopherCon 2018 that in order to preserve reproducible builds and efficiency, the feedback file needs to be hashed by cmd/go along with all other inputs, which suggests that it ought to be committed to version control.

cc @randall77 @ianlancetaylor @aclements @crawshaw @bradfitz @CAFxX (and the list could go on, but I'm sure everyone will find it)

seebs · 2019-08-29T18:41:39Z

Oh, hey, I have a concrete example of a potentially-relevant case for this, which is a recurring thing that bites people with microbenchmarks, but may also affect real code:

In rare cases, trivial questions of code alignment can produce very large performance differences. I had a test case, at one point, where I had two functions which were getting roughly a 2x performance difference in benchmarking. (As in, one took twice as long as the other.) But the more I experimented, the more confused I became, because it turned out the functions were identical, and I could change which one was faster by reordering them in source.

The microbenchmark case probably doesn't matter, but there are real-world cases where a couple of likely-concurrent hot paths end up with some kind of problem which is entirely a function of the exact location of their code -- I'm guessing either "aligned or not aligned with cache line" or "in same cache-associativity group as other hot paths", but I don't have the expertise to dive into it far enough to be sure.

My guess is that there's some actual specialized code out there where inserting a handful of nops before some specific functions could produce a +/-20% performance change in the program as a whole. Figuring out which ones seems likely to be arbitrarily hard; you'd need some kind of framework for figuring out how to benchmark the functions, and how to benchmark different sets of functions in parallel, and so on.

This is the sort of thing which is a huge amount of work, and which doesn't provide any noticeable benefit until it does, and then it can be inordinately significant. (The big question is whether you end up perceiving it as "sometimes we get lucky and our code runs faster" or "sometimes we get unlucky and our code runs slower").

dangit815 · 2021-10-18T13:53:37Z

More than two years have passed since the last discussion, is there any progress or plan on FGO recently?

aclements · 2021-10-18T14:31:00Z

I hadn't remembered we even had an open issue for this!

@cherrymui and @prattmic are actively doing prototype work right now to help us understand the potential wins from PGO/FGO/FDO and how to best integrate it into the build process. It's early stages, but we're tentatively hoping to have initial PGO optimizations available in 1.19.

josharian · 2021-10-18T19:19:56Z

Here's a related idea I've been kicking around.

Instead of doing running program -> profile -> compiler, add an extra intermediate step: running program -> profile -> optimization file -> compiler. The optimization file consists of optimization information and directives to the compiler.

There are a few advantages to this:

the compiler can stay simpler
being able to inspect and edit the optimizations file directly (or via a tool) will make it easier to understand what is happening, bisect, test, and minimize issues
it opens the door for other mechanisms/tooling to add information and optimizations, such as: offline static analysis, offline human analysis, bespoke profiling techniques, or combining profiles taken across a fleet of machines

josharian · 2021-10-18T19:21:46Z

@cherrymui and @prattmic are actively doing prototype work right now to help us understand the potential wins from PGO/FGO/FDO and how to best integrate it into the build process

Though I am delighted to hear this, I feel compelled to say yet again: I wish we had a more predictable, repeatable mechanism to learn things like this than someone happening to ping a relevant issue.

thepudds · 2021-10-18T19:26:49Z

Though I am delighted to hear this, I feel compelled to say yet again: I wish we had a more predictable, repeatable mechanism to learn things like this than someone happening to ping a relevant issue.

Is this the type of thing that could get covered in the “Compiler and Runtime Meeting Notes“ tracking issue #43930 ?

That did seem useful, although I know it takes some work.

prattmic · 2021-10-18T19:42:13Z

Instead of doing running program -> profile -> compiler, add an extra intermediate step: running program -> profile -> optimization file -> compiler. The optimization file consists of optimization information and directives to the compiler.

I have received similar feedback in discussing this with one user: that they would like a way to manually specify optimizations that a PGO profile might indicate if capturing actual profiles is not possible/representative or via, as you say "offline human analysis". They specifically mentioned //go: directives, but I think a separate edit-able file would work as well.

jeremyfaller · 2021-11-05T17:44:38Z

Is this the type of thing that could get covered in the “Compiler and Runtime Meeting Notes“ tracking issue #43930 ?

That did seem useful, although I know it takes some work.

CC @josharian

Yeah. I haven't been maintaining that of late. My apologies. (It's kinda fallen on the pile of "stuff I should get to".) I'll try to do better. :)

Having said that it's unlikely that PGO would have made that tracking issue. PGO started in experimental form, and we were only trying to see if it was worth pursuing. Our experimentation and the results from some external groups lead us to believe PGO might make the 2022 roadmap, but we're still trying to figure out what form it will take.

cristaloleg · 2021-11-08T21:43:56Z

Regarding PGO in the latest .NET 6 release https://devblogs.microsoft.com/dotnet/announcing-net-6/#dynamic-pgo and https://devblogs.microsoft.com/dotnet/announcing-net-6/#dynamic-pgo (UPD: there is a 2nd Dynamic PGO paragraph closer to the end of the article)

mvdan · 2021-11-22T11:15:58Z

To echo #28262 (comment), #49688 came as a surprise to seemingly everyone except the compiler team. If #43930 isn't the right medium to keep everyone else in the loop, perhaps consider trying something else. My gut feeling is that those public weekly notes are a good starting point, and I'd like to see them resume - especially if they included a very brief summary of the main projects currently in progress.

To hopefully give a bit of perspective: the lack of transparency into what is being actively considered or experimented on discourages external contributions. Using this thread as an example, it seems like Josh has been thinking about this idea for a while, and I'd bet he would contribute towards the design, testing, and perhaps even coding/reviews. That kind of external involvement simply isn't going to happen if there isn't a public way to be kept in the loop, though.

ianlancetaylor · 2021-11-22T19:21:51Z

@mvdan I'm sorry it seems that there is a lack of transparency by the Go team. The compiler team at Google hasn't done any work on PGO that is not public. The code in #49688 was developed entirely independently at Uber. The compiler team only became aware of it relatively recently when Uber reached out privately. The public #49688 is a step toward transparency and keeping everybody in the loop.

It would have been better to make a public announcement about it first. Still, we would now be in the same position anyhow. The communication on this is a lesson to bear in mind the next time there is a substantial privately developed contribution. Sorry for the confusion and angst.

mvdan · 2021-11-23T16:39:53Z

Perhaps I'm not seeing the full picture - in #28262 (comment) it was described that some prototype work was already taking place, and in #49688 (comment) it seems like the design and coordination with Uber to upstream their work is currently underway.

These updates are certainly welcome, and I'm really excited to see some progress in this space. However, I'd like to clarify that I don't think we'd have arrived at the same position anyway. To once again take Josh as an example, he's only learning about developments around this proposal via comments that feel like after-thoughts, leaving him little room to participate or even be kept in the loop with some detail. Which is unfortunate, given he's the author of the proposal and some of its design ideas, and has been one of the most reliable compiler contributors over the years :)

Maybe the way the compiler team functions is indeed to work on this issue privately, and then come out publicly with a proposal document and possibly even a prototype implementation. I'm trying to argue, hopefully in a constructive way, that we should aim to be more transparent and welcoming to external contributors. A good first step would be to post regular and early updates somewhere like #43930, making it easy for others to be kept up to date in a timely manner. Ideally some of the team's communication channels would be public too, much like golang-tools has been doing for years, but at least weekly written updates are an easier incremental step.

jeremyfaller · 2021-11-23T18:49:38Z

@mvdan I don't think you're missing the picture – there are a few details that could help fill you in, but most of it is all current events.

A bit of background:
The Go Compiler and Runtime team is split between two locations, Cambridge MA (with a small contingent in NYC), and Sunnyvale. The majority of the generics work has been handled in Sunnyvale, and while Go focused on 1.18, Cambridge decided to minimize instability in the compiler and runtime. So, we decided to do some experimentation. While lots of times work is internally motivated by things Google would like to see from Go (like the register ABI in 1.16/1.17/1.18), Google's happy if we just focus on generics right now. CAM decided to spend a small amount of resources to characterize what PGO would give us, but we didn't know if we would go further than experimentation. (and we still don't)

The present day:
About when Cherry started to get numbers from PGO inlining, Uber reached out about a PGO implementation they have. They have been thinking about it for longer than we have, and implemented more than we have with our recent experiments. Uber has expressed that they want to upstream their work, but we don't know it's the right thing for the Go ecosystem as a whole, so we need to take a look at it. I will be honest and say that there are parts of it weren't not sure are a good fit for cloud Go users and embedded Go users, but at first blush lots of decisions they've made sound good. We still need to see what they've got before we make any decisions. I doubt their work will be upstreamed without modification.

That is where we are. Uber wants us to consider their inlining implementation, and we need to take a look at it.

The feedback that the compiler and runtime team could do a better job is correct, but in this instance, I feel like we communicated when we knew things. The PGO work we undertook in 2021H2 was truly an experiment. We do experimentation all the time, and when we can share results (ie, it isn't based on internal Google data), we do.

I hope this explanation helps, and I'm happy to chat at any time.

mvdan · 2021-11-28T10:01:45Z

Thanks, Jeremy - I appreciate the extra updates and context on PGO in this thread, as well as all the work that is happening here.

The Go Compiler and Runtime team

Just so we're clear, this implies only Google, correct? I think that's what I'm trying to get at. People like Josh certainly deserve being considered part of the compiler team, but are repeatedly kept out of relevant threads and video calls simply because they are not employed by Google :)

I admit this is getting off-topic, though, so this will be my last comment on the topic here. I've already outlined how I think you could improve transparency and encourage more significant contributions from non-Googlers, and I'd rather not repeat myself as I don't have much else to add.

I hope it's clear this is not an anti-Google sentiment in any way - but rather, how other companies (like Josh's Tailscale or this thread's Uber) would likely participate more regularly and actively in those teams if they were given an honest chance.

jeremyfaller · 2021-11-29T16:46:29Z

Just so we're clear, this implies only Google, correct? I think that's what I'm trying to get at. People like Josh certainly deserve being considered part of the compiler team, but are repeatedly kept out of relevant threads and video calls simply because they are not employed by Google :)

Sorry, yes. That's how we split things up within google. These are the people within Google who think about the C&RT, don't mean to sound like we're alone in this fight. :)

gopherbot · 2022-03-24T20:39:47Z

Change https://go.dev/cl/357330 mentions this issue: cmd/compile: implement jump tables

Performance is kind of hard to exactly quantify. One big difference between jump tables and the old binary search scheme is that there's only 1 branch statement instead of O(n) of them. That can be both a blessing and a curse, and can make evaluating jump tables very hard to do. The single branch can become a choke point for the hardware branch predictor. A branch table jump must fit all of its state in a single branch predictor entry (technically, a branch target predictor entry). With binary search that predictor state can be spread among lots of entries. In cases where the case selection is repetitive and thus predictable, binary search can perform better. The big win for a jump table is that it doesn't consume so much of the branch predictor's resources. But that benefit is essentially never observed in microbenchmarks, because the branch predictor can easily keep state for all the binary search branches in a microbenchmark. So that benefit is really hard to measure. So predictable switch microbenchmarks are ~useless - they will almost always favor the binary search scheme. Fully unpredictable switch microbenchmarks are better, as they aren't lying to us quite so much. In a perfectly unpredictable situation, a jump table will expect to incur 1-1/N branch mispredicts, where a binary search would incur lg(N)/2 of them. That makes the crossover point at about N=4. But of course switches in real programs are seldom fully unpredictable, so we'll use a higher crossover point. Beyond the branch predictor, jump tables tend to execute more instructions per switch but have no additional instructions per case, which also argues for a larger crossover. As far as code size goes, with this CL cmd/go has a slightly smaller code segment and a slightly larger overall size (from the jump tables themselves which live in the data segment). This is a case where some FDO (feedback-directed optimization) would be really nice to have. #28262 Some large-program benchmarks might help make the case for this CL. Especially if we can turn on branch mispredict counters so we can see how much using jump tables can free up branch prediction resources that can be gainfully used elsewhere in the program. name old time/op new time/op delta Switch8Predictable 1.89ns ± 2% 1.27ns ± 3% -32.58% (p=0.000 n=9+10) Switch8Unpredictable 9.33ns ± 1% 7.50ns ± 1% -19.60% (p=0.000 n=10+9) Switch32Predictable 2.20ns ± 2% 1.64ns ± 1% -25.39% (p=0.000 n=10+9) Switch32Unpredictable 10.0ns ± 2% 7.6ns ± 2% -24.04% (p=0.000 n=10+10) Fixes #5496 Update #34381 Change-Id: I3ff56011d02be53f605ca5fd3fb96b905517c34f Reviewed-on: https://go-review.googlesource.com/c/go/+/357330 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>

prattmic · 2022-08-24T19:54:27Z

@mvdan posted a comment at #49688 (comment) asking for a tracking bug, and updates on what's going on. This is the right tracking bug.

Apologies for lack of updates, our progress has been very on-and-off with other things going on. The current status here is:

@rajbarik and others at Uber have been prototyping compiler internals of a few different types of optimizations that we could do with profiling data. Their prototype is in pgo: profile-guided optimization for golang v1.17 #49688.
@cherrymui, myself, and @aclements have been thinking about what an interface to PGO could look like in terms of profile collection and use via the go tool. We hope to have a concrete initial proposal to post in the next few weeks, which will be a good discussion point and hopefully can accelerate progress around collaboration on optimizations, etc.

cherrymui · 2022-09-12T16:29:53Z

The proposal @prattmic mentioned above is filed at #55022 . (Filing a new issue to make it clearer what we are proposing.)
Thanks.

qiulaidongfeng · 2023-10-03T02:03:15Z

Go already supports PGO, why not close this issue?

cherrymui · 2023-10-03T14:47:38Z

The initial PGO work is done at #55022, and some followup works are tracked at #62463. We can close this issue.

josharian added the Performance label Oct 17, 2018

josharian added this to the Unplanned milestone Oct 17, 2018

josharian mentioned this issue Feb 2, 2020

proposal: Go 2: compiler chosen constants - make(const, <min>, <max>) #36972

Closed

seankhliao added the umbrella label Jan 25, 2021

mvdan mentioned this issue Nov 20, 2021

pgo: profile-guided optimization for golang v1.17 #49688

Closed

heschi added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 22, 2021

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022

mknyszek added this to Go Compiler / Runtime Jul 13, 2022

mknyszek removed this from Go Compiler / Runtime Jul 13, 2022

cherrymui mentioned this issue Sep 12, 2022

cmd/compile: profile-guided optimization #55022

Closed

cherrymui closed this as completed Oct 3, 2023

qiulaidongfeng mentioned this issue Jan 11, 2024

access: go-assignees and go-editors #65058

Closed

golang locked and limited conversation to collaborators Oct 2, 2024

gopherbot added the FrozenDueToAge label Oct 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: feedback-guided optimization #28262

cmd/compile: feedback-guided optimization #28262

josharian commented Oct 17, 2018

seebs commented Aug 29, 2019

dangit815 commented Oct 18, 2021

aclements commented Oct 18, 2021

josharian commented Oct 18, 2021

josharian commented Oct 18, 2021

thepudds commented Oct 18, 2021 •

edited

Loading

prattmic commented Oct 18, 2021

jeremyfaller commented Nov 5, 2021

cristaloleg commented Nov 8, 2021 •

edited

Loading

mvdan commented Nov 22, 2021

ianlancetaylor commented Nov 22, 2021 •

edited

Loading

mvdan commented Nov 23, 2021

jeremyfaller commented Nov 23, 2021

mvdan commented Nov 28, 2021

jeremyfaller commented Nov 29, 2021

gopherbot commented Mar 24, 2022

prattmic commented Aug 24, 2022

cherrymui commented Sep 12, 2022

qiulaidongfeng commented Oct 3, 2023

cherrymui commented Oct 3, 2023

cmd/compile: feedback-guided optimization #28262

cmd/compile: feedback-guided optimization #28262

Comments

josharian commented Oct 17, 2018

seebs commented Aug 29, 2019

dangit815 commented Oct 18, 2021

aclements commented Oct 18, 2021

josharian commented Oct 18, 2021

josharian commented Oct 18, 2021

thepudds commented Oct 18, 2021 • edited Loading

prattmic commented Oct 18, 2021

jeremyfaller commented Nov 5, 2021

cristaloleg commented Nov 8, 2021 • edited Loading

mvdan commented Nov 22, 2021

ianlancetaylor commented Nov 22, 2021 • edited Loading

mvdan commented Nov 23, 2021

jeremyfaller commented Nov 23, 2021

mvdan commented Nov 28, 2021

jeremyfaller commented Nov 29, 2021

gopherbot commented Mar 24, 2022

prattmic commented Aug 24, 2022

cherrymui commented Sep 12, 2022

qiulaidongfeng commented Oct 3, 2023

cherrymui commented Oct 3, 2023

thepudds commented Oct 18, 2021 •

edited

Loading

cristaloleg commented Nov 8, 2021 •

edited

Loading

ianlancetaylor commented Nov 22, 2021 •

edited

Loading