Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: stack reuse for aggregate data #62077

Closed
jinlin-bayarea opened this issue Aug 16, 2023 · 7 comments
Closed

cmd/compile: stack reuse for aggregate data #62077

jinlin-bayarea opened this issue Aug 16, 2023 · 7 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@jinlin-bayarea
Copy link
Contributor

We propose adding support for aggregate data in the stack reuse to the Go compiler. The Go 1.20 compiler cannot handle the local array effectively in the stack allocation. Consider an example as follows.

package simple

type Field struct {
	Integer   int64
	String    string
	Interface interface{}
}

func a(i int8) Field {
	return Field{}
}

func b(i int16) Field {
	return Field{}
}

func c(i int32) Field {
	return Field{}
}

//Simple does
func Simple(i interface{}) Field {
	switch val := i.(type) {
	case int8:
		return a(val)
	case int16:
		return b(val)
	case int32:
		return c(val)
	default:
		return Field{}
	}
}

The corresponding disassembly is as below.

        0x01a5 00421 (simple.go:29)	MOVQ	AX, command-line-arguments..autotmp_12+48(SP)
	0x01aa 00426 (simple.go:29)	MOVQ	BX, command-line-arguments..autotmp_12+56(SP)
	0x01af 00431 (simple.go:29)	MOVQ	CX, command-line-arguments..autotmp_12+64(SP)
	0x01b4 00436 (simple.go:29)	MOVQ	DI, command-line-arguments..autotmp_12+72(SP)
	0x01b9 00441 (simple.go:29)	MOVQ	SI, command-line-arguments..autotmp_12+80(SP)

In the example provided, it is evident that a distinct temporary stack variable is allocated to store the returned Field from the callees, ensuring that they do not share the same stack memory. However, this approach is suboptimal due to the live ranges of these variables are not overlapped. Each temporary occupies its own exclusive space, leading to the necessity of sharing the stack slot. To achieve optimal code, the allocation should only cater to the maximum size among all possible temporaries across all switch cases, rather than summing up their individual sizes.

Multiple compilers for C/C++ and other programming languages have the capability to reuse the stack for scalar variables and/or arrays. For instance, LLVM's stack coloring pass allows for the reuse of the stack memory for local arrays.

In summary, we propose adding stack reuse for aggregate data in the go compiler. Inputs are welcome.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 16, 2023
@cherrymui
Copy link
Member

Yeah, currently the compiler doesn't reuse stack slots for named variables and some temporaries. It might be a little trickier for data containing pointers, due to GC metadata. But (as you mentioned) for scalars, it should be fine.

for aggregate data

I don't think whether it is aggregate matters. We could reuse stack slots for a local variable of type int (non-aggregate), as well as [3]int or struct{ a, b int }.

Do you have a sense how much it helps in real code? It will reduce some stack sizes, but I'm not sure how much it matters in practice.

@cherrymui cherrymui added this to the Unplanned milestone Aug 16, 2023
@rabbbit
Copy link

rabbbit commented Aug 16, 2023

uber-go/zap#1310 implemented a fix for this that reduced stack usage for zap.Any from 4856 to 192 bytes.

Due to stack goroutine stack growth/copying this was enough to notice a visible impact on profiles (~5% of total CPU in some workloads)

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 17, 2023
@rabbbit
Copy link

rabbbit commented Aug 18, 2023

yarpc/yarpc-go#2220 also hit a similar issue:

TLDR; Reduce stack usage from the rpc handler function from 2520 bytes to 608 bytes.

call object was allocated 3 times on the stack.

A large refactoring was initially proposed to avoid this, but in the end the fix was to reduce the size of call object.

@thanm
Copy link
Contributor

thanm commented Apr 11, 2024

Hello again, circling back to take a look at this issue in light of current work that we've been doing to address the problem of stack space consumption by local vars.

You mention this assembly code:

        0x01a5 00421 (simple.go:29)	MOVQ	AX, command-line-arguments..autotmp_12+48(SP)
	0x01aa 00426 (simple.go:29)	MOVQ	BX, command-line-arguments..autotmp_12+56(SP)
	0x01af 00431 (simple.go:29)	MOVQ	CX, command-line-arguments..autotmp_12+64(SP)
	0x01b4 00436 (simple.go:29)	MOVQ	DI, command-line-arguments..autotmp_12+72(SP)
	0x01b9 00441 (simple.go:29)	MOVQ	SI, command-line-arguments..autotmp_12+80(SP)

I don't seem to see anything like this when I compile your example with Go 1.20:

$ which go
/w/go1.20/bin/go
$ go build -gcflags=-S ./p.go 2>&1 | egrep '(autotmp|STEXT)'
command-line-arguments.a STEXT nosplit size=55 args=0x8 locals=0x30 funcid=0x0 align=0x0
command-line-arguments.b STEXT nosplit size=55 args=0x8 locals=0x30 funcid=0x0 align=0x0
command-line-arguments.c STEXT nosplit size=55 args=0x8 locals=0x30 funcid=0x0 align=0x0
command-line-arguments.Simple STEXT nosplit size=266 args=0x10 locals=0x30 funcid=0x0 align=0x0
type:.eq.command-line-arguments.Field STEXT dupok size=151 args=0x10 locals=0x20 funcid=0x0 align=0x0
$

There is a local var named .autotmp_12 at one point but it gets deleted early on in the back end. As far as I can tell the only thing on the stack is a single Field sized blob used for the return pseudo-var ~r0.

The most recent release Go as well as tip also look good in this respect.

I am going to close this out now given that I don't see a problem; if there are other more specific cases you want to point out, please feel free to reopen.

Also note that there are some CL's going into Go tip that will help with merging the stack slots of disjointly accessed local variables, specifically the stack rooted at CL 577615. Once all of these are in we will have a better story to tell about stack slot reuse.

@rabbbit
Copy link

rabbbit commented Apr 11, 2024

Hey @thanm, thanks!

The easiest "real" reproduction might be uber-go/zap@249507a.

Gotip shows a 20* locals reduction on gotip - very nice, thanks! The workaround is still 50% smaller, but that's much less impactful now.

❯ go version
go version go1.22.1 darwin/arm64
❯ gotip version
go version devel go1.23-890179d9 2024-04-11 10:09:10 -0700 darwin/arm64

(in the zap repo)

❯ git checkout 50b2db4fdecf067666874bd2e323ae601cf32559        <<<<====== commit just before the fix
HEAD is now at 50b2db4 zap.Any add benchmarks
❯ go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
go.uber.org/zap.Any STEXT size=17552 args=0x20 locals=0x12e8 funcid=0x0 align=0x0         <<<==== locals are 4840
❯ gotip build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
go.uber.org/zap.Any STEXT size=11808 args=0x20 locals=0x128 funcid=0x0 align=0x0.  <<<<<==== locals are 296 bytes

If I checkout the actual fix, I get:

❯ git checkout 249507a8cf0294b844ad52d0707c83997d931205
HEAD is now at 249507a zap.Any: Reduce stack size with generics (#1310)
❯ go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
go.uber.org/zap.Any STEXT size=3712 args=0x20 locals=0xb8 funcid=0x0 align=0x0.   <<<<<<<===== 184 bytes
❯ gotip build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
go.uber.org/zap.Any STEXT size=3712 args=0x20 locals=0xb8 funcid=0x0 align=0x0    <<<<<<<====== 184 bytes

So the difference is no longer as huge (4840 to 184 vs 294 vs 184)

I don't currently understand what the implication of large size is and whether it affects anything in practice (does it?)

@thanm
Copy link
Contributor

thanm commented Apr 12, 2024

I don't currently understand what the implication of large size is and whether it affects anything in practice (does it?)

Not sure I understand your question. Could you expand please?

@rabbbit
Copy link

rabbbit commented Apr 12, 2024

I was unclear on the impact of the size field below.

go.uber.org/zap.Any STEXT size=11808 args=0x20 locals=0x128

vs

go.uber.org/zap.Any STEXT size=3712 args=0x20 locals=0xb8

But I was since updated that size is just a function size and shouldn't have impact on anything. And that locals is what we care about here :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

6 participants