runtime: fpTracebackPartialExpand SIGSEGV on amd64 with deep inlining #69629

Closed
lizthegrey opened this issue Sep 26, 2024 · 31 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@lizthegrey

Go version

go version go1.23.1 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/lizf/.cache/go-build'
GOENV='/home/lizf/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/lizf/hny/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/lizf/hny/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/snap/go/current'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/snap/go/current/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/lizf/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/lizf/hny/go/src/github.com/honeycombio/hound/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1515630055=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Compiled using PGO, which inlined this method (not normally inlinable):

func (r *Readahead) Read(p []byte) (n int, err error) {
	for r.out != nil {
		got, _ := r.out.Read(p[n:])
		n += got
		if n == len(p) {
			return n, nil
		}

		// recycle our block
		r.returnBuffer <- r.out

		// get another block
		r.out = <-r.getNextOutput // Crash happens here, when reporting blocking profiles.
	}

	err = <-r.err
	if err == nil {
		err = io.EOF
	}
	return n, err
}

into this method:

// Returns the requested number of bytes from the internal buffer, without
// copying. Results become invalid next time you do anything.
// Best used for small values.
func (r *Readahead) Next(length int) (buf []byte, err error) {
	if r.out == nil {
		return nil, io.EOF
	}
	got := r.out.Next(length)

	// Common case - we got everything we need
	if len(got) == length {
		return got, nil
	}

	// Uncommon case - partial read across a block boundary
	if cap(r.nextBuf) < length {
		r.nextBuf = make([]byte, 0, length)
	}
	r.nextBuf = append(r.nextBuf[:0], got...)

	n, err := r.Read(r.nextBuf[len(r.nextBuf):length]) // This call to Read is inlined.
	return r.nextBuf[:len(r.nextBuf)+n], err
}

and for completeness, the struct:

type Readahead struct {
	frame *frame
	out   *bytes.Buffer

	getNextOutput chan *bytes.Buffer
	returnBuffer  chan *bytes.Buffer
	err           chan error

	// scratch space for Next results which span block boundaries.
	nextBuf []byte
}
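The Read/Next pair can be exercised outside the service with a small harness. This is a sketch under assumptions: the frame field is dropped because its type isn't shown, and newReadahead is a hypothetical producer that feeds fixed in-memory blocks (in the real code the blocks presumably come from LZ4 decompression). Read and Next are copied from above with their comments shortened.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Readahead mirrors the struct in the report; the frame field is omitted
// because its type is not shown.
type Readahead struct {
	out           *bytes.Buffer
	getNextOutput chan *bytes.Buffer
	returnBuffer  chan *bytes.Buffer
	err           chan error
	nextBuf       []byte // scratch space for Next results spanning blocks
}

// newReadahead is a hypothetical producer: the first block becomes r.out and
// the remaining blocks are handed over one at a time on getNextOutput.
func newReadahead(blocks [][]byte) *Readahead {
	r := &Readahead{
		getNextOutput: make(chan *bytes.Buffer),
		returnBuffer:  make(chan *bytes.Buffer, len(blocks)), // buffered so recycling never blocks
		err:           make(chan error, 1),
	}
	if len(blocks) == 0 {
		return r // r.out stays nil, so Next reports io.EOF straight away
	}
	r.out = bytes.NewBuffer(blocks[0])
	go func() {
		for _, b := range blocks[1:] {
			r.getNextOutput <- bytes.NewBuffer(b) // hand over the next block
		}
		r.err <- nil           // Read turns a nil error into io.EOF
		close(r.getNextOutput) // receiving nil from the closed channel ends Read's loop
	}()
	return r
}

func (r *Readahead) Read(p []byte) (n int, err error) {
	for r.out != nil {
		got, _ := r.out.Read(p[n:])
		n += got
		if n == len(p) {
			return n, nil
		}
		r.returnBuffer <- r.out   // recycle our block
		r.out = <-r.getNextOutput // get another block (the reported crash site)
	}
	err = <-r.err
	if err == nil {
		err = io.EOF
	}
	return n, err
}

func (r *Readahead) Next(length int) (buf []byte, err error) {
	if r.out == nil {
		return nil, io.EOF
	}
	got := r.out.Next(length)
	if len(got) == length {
		return got, nil // common case: the current block had enough bytes
	}
	if cap(r.nextBuf) < length {
		r.nextBuf = make([]byte, 0, length)
	}
	r.nextBuf = append(r.nextBuf[:0], got...)
	n, err := r.Read(r.nextBuf[len(r.nextBuf):length]) // partial read across a block boundary
	return r.nextBuf[:len(r.nextBuf)+n], err
}

func main() {
	r := newReadahead([][]byte{[]byte("hello"), []byte("world")})
	for _, n := range []int{3, 4, 3, 1} {
		b, err := r.Next(n)
		fmt.Printf("Next(%d) = %q, err = %v\n", n, b, err)
	}
}
```

Next(4) spans the boundary between the two blocks, so it goes through Read and its receive on r.getNextOutput; the final Next(1) finds the channel closed and returns io.EOF.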

and also our runtime profiling config:

	runtime.SetBlockProfileRate(1000000)
	runtime.SetMutexProfileFraction(1000)
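For reference, SetBlockProfileRate takes its rate in nanoseconds, so 1000000 aims for one sample per 1ms spent blocked, and SetMutexProfileFraction(1000) reports on average 1 in 1000 contention events. The sketch below is an approximate model of that sampling decision (blockSampleProbability is a hypothetical helper, not runtime code): blocking events at least as long as the rate are always recorded, and shorter ones roughly with probability blocked/rate.

```go
package main

import (
	"fmt"
	"time"
)

// blockSampleProbability approximates the chance that one blocking event of
// the given duration is recorded at the given block-profile rate.
// Model only: events >= rate are always kept, shorter ones ~blocked/rate.
func blockSampleProbability(blocked, rate time.Duration) float64 {
	if rate <= 0 {
		return 0 // a rate <= 0 turns block profiling off
	}
	if blocked >= rate {
		return 1
	}
	return float64(blocked) / float64(rate)
}

func main() {
	// runtime.SetBlockProfileRate(1000000): the rate is in nanoseconds, i.e. 1ms.
	rate := time.Duration(1000000)
	for _, d := range []time.Duration{100 * time.Microsecond, 500 * time.Microsecond, time.Second} {
		fmt.Printf("blocked %v -> P(sampled) = %.2f\n", d, blockSampleProbability(d, rate))
	}
}
```

Under this model, a receive that blocks for a second at this rate is always sampled, so it always reaches the stack-recording path.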

What did you see happen?

Rarely, when a receive from the channel blocks for longer than a second:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30bc5b31 pc=0x43b1da]
goroutine 1093 gp=0xc004afe700 m=5 mp=0xc0006e8708 [running]:
runtime/mprof.go:592
runtime.chanrecv(0xc00068cd90, 0xc000712788, 0x1)
runtime/chan.go:651 +0x72f fp=0xc000712720 sp=0xc000712620 pc=0x40abaf
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read(0xc00079c780, {0xc00836caa7, 0x1, 0x1})
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next(...)
github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:706 +0x251 fp=0xc000712818 sp=0xc0007127a0 pc=0x1bad251
github.com/honeycombio/hound/lib/retriever/cstorage.(*int64CompactReader).read(...)
github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2.5(_, {{_, _}, _, {_, _}}, _, {0x0, 0x0, 0x0}, ...)
runtime.fpTracebackPartialExpand(...)
runtime/mprof.go:563 +0x35a fp=0xc000712620 sp=0xc000712578 pc=0x43b1da
runtime.chanrecv1(0xc000ba1840?, 0x2?)
github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:110 +0x16f fp=0xc0007127a0 sp=0xc000712748 pc=0x1b7322f
github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:155
github.com/honeycombio/hound/lib/retriever/cstorage.(*compactFileReader[...]).readRecord(0x1d2f640?, 0x22691a8?)
runtime.throw({0x217ada7?, 0xc000712568?})
runtime/panic.go:1067 +0x48 fp=0xc000712518 sp=0xc0007124e8 pc=0x48af68
runtime.sigpanic()
runtime/signal_unix.go:884 +0x3c9 fp=0xc000712578 sp=0xc000712518 pc=0x48dbc9
runtime.saveblockevent(0x3fbfec, 0x262594, 0x3, 0x2)
runtime.blockevent(...)
runtime/mprof.go:513
runtime/chan.go:489 +0x12 fp=0xc000712748 sp=0xc000712720 pc=0x40a452
github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:433
net.(*netFD).Read(...)
crypto/tls.(*atLeastReader).Read(0xc007ddc168, {0xc000eea000?, 0xc000cf83c0?, 0xc000eee431?})
bytes.(*Buffer).ReadFrom(0xc000865eb8, {0x257fa60, 0xc007ddc168})
crypto/tls/conn.go:831
crypto/tls/conn.go:629 +0x49e fp=0xc0029b6b80 sp=0xc0029b67e8 pc=0x6b9e1e
crypto/tls.(*Conn).Read(0xc000865c08, {0xc000c41fc9, 0x3a42, 0x2b5c837?})
crypto/tls/conn.go:1385 +0x150 fp=0xc0029b6bf0 sp=0xc0029b6b80 pc=0x6c33d0
net/http.(*persistConn).Read(0xc0005d8c60, {0xc000c41fc9?, 0x4b4559?, 0x4b4559?})
io.(*LimitedReader).Read(0xc008c0eaf8, {0xc000c41fc9?, 0xc00014c4b0?, 0xc0001c9530?})
github.com/sourcegraph/conc/pool.(*Pool).Wait(0xc008ac8200)
github.com/honeycombio/hound/lib/retriever/retriever.go:1623 +0x1f33 fp=0xc000539528 sp=0xc000538bd8 pc=0x1c20a13
github.com/honeycombio/hound/cmd/retriever-lambda/app.(*App).Invoke(0xc0001af7c0, {0x25a0998, 0xc000869170}, {0xc004b70000, 0x58780, 0x80000})
github.com/aws/aws-lambda-go@v1.47.0/lambda/handler.go:298 +0x43 fp=0xc0005398d0 sp=0xc000539878 pc=0x838123
github.com/aws/aws-lambda-go@v1.47.0/lambda/invoke_loop.go:75 +0x2b2 fp=0xc000539ab8 sp=0xc000539940 pc=0x8386f2
github.com/aws/aws-lambda-go/lambda.startRuntimeAPILoop({0xc000058107, 0xe}, {0x257e140, 0xc0001d4d90})

The problem appears to be at https://github.com/golang/go/blob/go1.23.1/src/runtime/mprof.go#L592, when unwinding the inlined calls on the stack.
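To make the failure mode concrete: on amd64, a frame-pointer unwinder reads the caller's saved frame pointer at [fp] and the return address at [fp+8], and simply follows that chain. The toy simulation below models this; the memory map, addresses and bounds check are illustrative assumptions, not runtime code. The sketch stops with an error when a saved FP points outside the stack, whereas an unwinder that trusts the chain would dereference the bad pointer and could fault.

```go
package main

import (
	"errors"
	"fmt"
)

// memory is a toy model of stack memory: address -> 8-byte word.
type memory map[uint64]uint64

var errBadFP = errors.New("frame pointer outside the stack")

// fpWalk follows a frame-pointer chain using the amd64 layout:
// [fp] = caller's saved frame pointer, [fp+8] = return address.
// Unlike a real unwinder, it bounds-checks fp against [lo, hi).
func fpWalk(mem memory, fp, lo, hi uint64, limit int) ([]uint64, error) {
	var pcs []uint64
	for fp != 0 && len(pcs) < limit {
		if fp < lo || fp+16 > hi {
			return pcs, errBadFP // a real unwinder would dereference fp here
		}
		pcs = append(pcs, mem[fp+8]) // return address into the caller
		fp = mem[fp]                 // step to the caller's frame
	}
	return pcs, nil
}

func main() {
	// Three frames at 0x1100 -> 0x1200 -> 0x1300; a zero saved FP ends the chain.
	mem := memory{0x1100: 0x1200, 0x1108: 0x401000, 0x1200: 0x1300, 0x1208: 0x402000, 0x1300: 0, 0x1308: 0x403000}
	pcs, err := fpWalk(mem, 0x1100, 0x1000, 0x2000, 32)
	fmt.Printf("clean: %#x %v\n", pcs, err)

	mem[0x1200] = 0xdeadbeef // corrupt one saved frame pointer
	pcs, err = fpWalk(mem, 0x1100, 0x1000, 0x2000, 32)
	fmt.Printf("corrupt: %#x %v\n", pcs, err)
}
```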

We cannot repro this behaviour on arm64; it only repros on amd64. We also do not believe it would repro on go1.22, based on what we see in the code changed between 1.22 and 1.23: https://go-review.googlesource.com/c/go/+/533258 and 87abb4a are both new additions. However, we've already upgraded our full codebase to 1.23 semantics and don't want to attempt to undo all of that.

What did you expect to see?

No crash.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 26, 2024
@lizthegrey lizthegrey changed the title runtime: fpTracebackPartialExpand SIGSEGV on x86 with deep inlining runtime: fpTracebackPartialExpand SIGSEGV on amd64 with deep inlining Sep 26, 2024
@lizthegrey
Author

What's odd is that this ought to repro it if my theory is right, but it does not:

https://go.dev/play/p/SGNoe38oMwP

go run -gcflags=-m /tmp/foo.go
# command-line-arguments
/tmp/foo.go:8:6: can inline inline
/tmp/foo.go:15:5: can inline main.func1
/tmp/foo.go:19:8: inlining call to inline
/tmp/foo.go:8:13: x does not escape
/tmp/foo.go:15:5: func literal escapes to heap
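The playground program itself isn't reproduced here, so the following is a hypothetical program of a similar shape (recvInline and recordsInlinedFrame are made-up names): a small function that the compiler will likely inline (check with -gcflags=-m) blocks on a channel receive inside a closure, with block profiling at rate 1. It checks that the inlined function still appears in the recorded block-profile stack, which requires the profiler to expand inlined frames. Like the playground attempts, it is not expected to crash; it only exercises the same path.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"time"
)

// recvInline is small enough that it will likely be inlined; it blocks on a receive.
func recvInline(ch chan int) int {
	return <-ch
}

// recordsInlinedFrame blocks once inside recvInline with block profiling on
// and reports whether recvInline appears in any recorded block-profile stack.
func recordsInlinedFrame() bool {
	runtime.SetBlockProfileRate(1) // rate 1: record every blocking event
	defer runtime.SetBlockProfileRate(0)

	ch := make(chan int)
	go func() {
		time.Sleep(100 * time.Millisecond) // give the receiver time to park
		ch <- 1
	}()
	func() { recvInline(ch) }() // a closure, like main.func1 in the -m output

	n, _ := runtime.BlockProfile(nil)
	recs := make([]runtime.BlockProfileRecord, n+16) // headroom for new records
	n, ok := runtime.BlockProfile(recs)
	if !ok {
		return false
	}
	for _, r := range recs[:n] {
		frames := runtime.CallersFrames(r.Stack())
		for {
			f, more := frames.Next()
			if strings.HasSuffix(f.Function, ".recvInline") {
				return true
			}
			if !more {
				break
			}
		}
	}
	return false
}

func main() {
	fmt.Println("recvInline in block profile:", recordsInlinedFrame())
}
```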

@ianlancetaylor
Contributor

CC @golang/runtime

@lizthegrey
Author

lizthegrey commented Sep 26, 2024

Here's the diff of the generated assembly when inlining of that specific function is enabled/disabled via -gcflags=./lib/retriever/cstorage/lz4=-d=pgohash=-000001101101101100001110/:
disasm_diff.txt

and here's the inlining decision:

hot-node enabled increased budget=2000 for func=github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read
hot-node enabled increased budget=2000 for func=github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next
lib/retriever/cstorage/lz4/lz4.go:155:18 (inline) [bisect-match 0xb9c41e339406db0e]
pgohash triggered lib/retriever/cstorage/lz4/lz4.go:155:18 (inline) 000001101101101100001110
hot-budget check allows inlining for call github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read (cost 119) at lib/retriever/cstorage/lz4/lz4.go:155:18 in function github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next
[...]
lib/retriever/cstorage/lz4/lz4.go:155:18 (inline) [bisect-match 0xb9c41e339406db0e]
pgohash triggered lib/retriever/cstorage/lz4/lz4.go:155:18 (inline) 000001101101101100001110
hot-budget check allows inlining for call github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read (cost 119) at lib/retriever/cstorage/lz4/lz4.go:155:18 in function github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next

There's another layer of inlining above though:

lib/retriever/cstorage/compact_column.go:706:33 (inline) [bisect-match 0xe2bf22770bf1bc7b]
pgohash triggered lib/retriever/cstorage/compact_column.go:706:33 (inline) 111100011011110001111011
hot-budget check allows inlining for call github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next (cost 239) at lib/retriever/cstorage/compact_column.go:706:33 in function github.com/honeycombio/hound/lib/retriever/cstorage.(*compactFileReader[go.shape.int64]).readRecord

and another layer on top of that:

hot-node enabled increased budget=2000 for func=github.com/honeycombio/hound/lib/retriever/cstorage.(*int64CompactReader).read
lib/retriever/cstorage/compact_column.go:433:51 (inline) [bisect-match 0xaec5fdc13d20c1f1]
pgohash triggered lib/retriever/cstorage/compact_column.go:433:51 (inline) 001000001100000111110001
hot-budget check allows inlining for call github.com/honeycombio/hound/lib/retriever/cstorage.(*compactFileReader[go.shape.int64]).readRecord (cost 478) at lib/retriever/cstorage/compact_column.go:433:51 in function github.com/honeycombio/hound/lib/retriever/cstorage.(*int64CompactReader).read

and another layer on top of that:

lib/retriever/cstorage/column_manager.go:2101:37 (inline) [bisect-match 0xdadffcd45cb2c504]
pgohash triggered lib/retriever/cstorage/column_manager.go:2101:37 (inline) 101100101100010100000100
hot-budget check allows inlining for call github.com/honeycombio/hound/lib/retriever/cstorage.(*int64CompactReader).read (cost 509) at lib/retriever/cstorage/column_manager.go:2101:37 in function github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2.5

@prattmic
Member

Thanks for narrowing down with the pgohash! Just to be clear, the crash still reproduces with the limited inlining via that pgohash flag, right?

cc @felixge @nsrip-dd for FP unwinding

@lizthegrey
Author

lizthegrey commented Sep 26, 2024

I'll get you an answer tomorrow regarding the pgohash=-000001101101101100001110 build. For obvious reasons, I don't like to push potentially crashing code even to pre-production environments when I'm about to go to bed, and this investigation took me most of the day.

I quickly tested https://go.dev/play/p/NMzamFE6hvu, and it doesn't crash either. Unfortunate!

Unfortunately, if inlining in further upstream calls is implicated, I'm not able to give you the source to anything above the /lz4 node in the package tree.

@lizthegrey
Author

Disabling that individual inlining decision within the package just makes it crash with a different traceback, still from the same inlined location, because I'd need to play whack-a-mole with pgohash across every place the function could possibly be inlined. I'm going to use //go:noinline instead to try to prove it.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30bc5b31 pc=0x43b1da]
runtime.throw({0x217c7a7?, 0xc000b3c568?})
runtime/panic.go:1067 +0x48 fp=0xc000b3c518 sp=0xc000b3c4e8 pc=0x48af68
runtime.sigpanic()
runtime.saveblockevent(0x61304, 0x2625cc, 0x3, 0x2)
runtime/mprof.go:563 +0x35a fp=0xc000b3c620 sp=0xc000b3c578 pc=0x43b1da
runtime.blockevent(...)
runtime/chan.go:651 +0x72f fp=0xc000b3c720 sp=0xc000b3c620 pc=0x40abaf
runtime.chanrecv1(0x2?, 0x0?)
runtime/chan.go:489 +0x12 fp=0xc000b3c748 sp=0xc000b3c720 pc=0x40a452
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next(...)
github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:2101 +0x1e90 fp=0xc000b3d258 sp=0xc000b3c818 pc=0x1b8c1d0
github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2()
golang.org/x/sync@v0.8.0/errgroup/errgroup.go:75 +0x96
runtime.gopark(...)
runtime/proc.go:424
runtime/signal_unix.go:884 +0x3c9 fp=0xc000b3c578 sp=0xc000b3c518 pc=0x48dbc9
runtime.fpTracebackPartialExpand(...)
runtime/mprof.go:513
runtime.chanrecv(0xc0005de460, 0xc000b3c788, 0x1)
github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:110 +0x16f fp=0xc000b3c7a0 sp=0xc000b3c748 pc=0x1b7322f
github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:706 +0x251 fp=0xc000b3c818 sp=0xc000b3c7a0 pc=0x1bad091
github.com/honeycombio/hound/lib/retriever/cstorage.(*int64CompactReader).read(...)
github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:433
github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2.5(_, {{_, _}, _, {_, _}}, _, {0x0, 0x0, 0x0}, ...)
golang.org/x/sync/errgroup.(*Group).Go.func1()
golang.org/x/sync@v0.8.0/errgroup/errgroup.go:78 +0x50 fp=0xc000b3dfe0 sp=0xc000b3df78 pc=0xc45150
runtime.goexit({})
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 53
goroutine 1 gp=0xc0000061c0 m=nil [semacquire]:
goroutine 29 gp=0xc00065e000 m=5 mp=0xc00030e708 [running]:
runtime/mprof.go:592
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read(0xc000199380, {0xc012555460, 0x8, 0x8})
github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:155
github.com/honeycombio/hound/lib/retriever/cstorage.(*compactFileReader[...]).readRecord(0x0?, 0x226ac10?)
github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:2271 +0x1c85 fp=0xc000b3df78 sp=0xc000b3d258 pc=0x1b89845
runtime/asm_amd64.s:1700 +0x1 fp=0xc000b3dfe8 sp=0xc000b3dfe0 pc=0x493e01
runtime.gopark(...)
runtime.gcenable.gowrap1()
runtime/mgc.go:203 +0x66
runtime/proc.go:424
runtime.goparkunlock(0x25747b8?, 0x0?, 0x0?, 0x0?)
runtime.goparkunlock(0x30d8df1604e?, 0x68?, 0x19?, 0xc0000c0780?)
runtime/proc.go:430 +0xcf fp=0xc0015bcae0 sp=0xc0015bcac0 pc=0x449acf
runtime/sema.go:178 +0x219 fp=0xc0015bcb48 sp=0xc0015bcae0 pc=0x4609d9
github.com/sourcegraph/conc.(*WaitGroup).Wait(0xc000665280)
github.com/sourcegraph/conc@v0.3.0/pool/error_pool.go:37
github.com/sourcegraph/conc/pool.(*ContextPool).Wait(0xc000665280)
github.com/honeycombio/hound/cmd/retriever-lambda/app.(*App).Invoke(0xc0007b46c0, {0x25a24f8, 0xc0007ae180}, {0xc000d80000, 0x21d592, 0x400000})
github.com/aws/aws-lambda-go@v1.47.0/lambda/entry.go:106 +0xdc fp=0xc0015bdc10 sp=0xc0015bdb60 pc=0x8351dc
github.com/aws/aws-lambda-go/lambda.StartWithOptions({0x1e53740?, 0xc0007b46c0?}, {0xc000113c88?, 0x0?, 0x0?})
github.com/honeycombio/hound/cmd/retriever-lambda/main.go:143 +0xe53 fp=0xc0015bdf50 sp=0xc0015bdca0 pc=0x1c5d613
runtime.goparkunlock(0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424
runtime.goparkunlock(0x3781c01?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:430 +0xcf fp=0xc000080f58 sp=0xc000080f38 pc=0x449acf
runtime/mgc.go:203 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x41e625
created by runtime.gcenable in goroutine 1
runtime/proc.go:430 +0xcf fp=0xc000081780 sp=0xc000081760 pc=0x449acf
runtime.(*scavengerState).park(0x3773640)
runtime.semacquire1(0xc000665288, 0x0, 0x1, 0x0, 0x12)
sync.(*WaitGroup).Wait(0x10e4ec0?)
github.com/sourcegraph/conc/pool.(*Pool).Wait(0xc000665280)
github.com/sourcegraph/conc@v0.3.0/pool/pool.go:79 +0x30 fp=0xc0015bcbe0 sp=0xc0015bcbc8 pc=0x10e4e30
github.com/sourcegraph/conc@v0.3.0/pool/context_pool.go:57 +0x38 fp=0xc0015bcc18 sp=0xc0015bcbe0 pc=0x10e4a78
github.com/aws/aws-lambda-go/lambda.reflectHandler.func1({0x25a24f8?, 0xc0007ae180?}, {0xc000d80000?, 0x30?, 0x1fa8b60?})
github.com/aws/aws-lambda-go@v1.47.0/lambda/invoke_loop.go:75 +0x2b2 fp=0xc0015bdab8 sp=0xc0015bd940 pc=0x8386f2
github.com/aws/aws-lambda-go/lambda.StartHandlerWithContext({0x25a24f8?, 0xc0007110e0?}, {0x257f8a0?, 0xc0007b46c0?})
runtime.main()
runtime/proc.go:272 +0x28b fp=0xc0015bdfe0 sp=0xc0015bdf50 pc=0x44960b
runtime.goexit({})
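On the //go:noinline approach: the directive must sit directly above the func declaration, written with no space after "//", and it keeps the function out of line even when PGO raises the inlining budget. The sketch below (readahead, Next and callerIsPhysicalFrame are hypothetical names) checks at run time that a function marked this way gets its own physical frame, relying on runtime.CallersFrames leaving Frame.Func nil for inlined frames.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerIsPhysicalFrame reports whether its caller has a real stack frame.
// runtime.CallersFrames sets Frame.Func to nil for inlined frames.
//
//go:noinline
func callerIsPhysicalFrame() bool {
	pc := make([]uintptr, 1)
	// skip 0 = runtime.Callers, 1 = this function, 2 = our caller.
	if runtime.Callers(2, pc) == 0 {
		return false
	}
	f, _ := runtime.CallersFrames(pc).Next()
	return f.Func != nil
}

type readahead struct{}

// Next carries the directive, so the compiler keeps it as a separate
// function with its own frame instead of inlining it into callers.
//
//go:noinline
func (r *readahead) Next() bool { return callerIsPhysicalFrame() }

func main() {
	fmt.Println("Next has its own frame:", (&readahead{}).Next())
}
```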

@lizthegrey
Author

Setting //go:noinline helped rule out the Readahead.Next call as the proximate culprit, but now there's a new culprit:

fatal error: unexpected signal during runtime execution
runtime/chan.go:651 +0x72f fp=0xc0009fa720 sp=0xc0009fa620 pc=0x40abaf
golang.org/x/sync/errgroup.(*Group).Go.func1()
golang.org/x/sync@v0.8.0/errgroup/errgroup.go:75 +0x96
github.com/sourcegraph/conc@v0.3.0/pool/pool.go:79 +0x30 fp=0xc000112be0 sp=0xc000112bc8 pc=0x10e4e30
runtime.blockevent(...)
github.com/honeycombio/hound/lib/retriever.(*ReadOnlyRetriever).FetchPartialFromSegments(0xc000542690, {0x25a24f8?, 0xc000790cf0?}, 0xc000784500)
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read(0xc0007ec7c0, {0xc01017d50a, 0x6, 0x6})
github.com/sourcegraph/conc/pool.(*ErrorPool).Wait(...)
github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2.5(_, {{_, _}, _, {_, _}}, _, {0x0, 0x0, 0x0}, ...)
github.com/sourcegraph/conc/pool.(*ContextPool).Wait(0xc000693180)
runtime.semacquire1(0xc000693188, 0x0, 0x1, 0x0, 0x12)
github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:2271 +0x1c85 fp=0xc0009fbf78 sp=0xc0009fb258 pc=0x1b89845
runtime/proc.go:430 +0xcf fp=0xc000112ae0 sp=0xc000112ac0 pc=0x449acf
goroutine 26 gp=0xc000316e00 m=0 mp=0x3775ea0 [running]:
runtime/mprof.go:513
github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:2101 +0x1e90 fp=0xc0009fb258 sp=0xc0009fa818 pc=0x1b8c1d0
runtime/proc.go:424
github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:706 +0x251 fp=0xc0009fa818 sp=0xc0009fa7a0 pc=0x1bad091
golang.org/x/sync@v0.8.0/errgroup/errgroup.go:78 +0x50 fp=0xc0009fbfe0 sp=0xc0009fbf78 pc=0xc45150
github.com/honeycombio/hound/cmd/retriever-lambda/app.(*App).Invoke(0xc000583300, {0x25a24f8, 0xc000790120}, {0xc000db8000, 0x20ea7e, 0x400000})
github.com/honeycombio/hound/lib/retriever/cstorage.(*int64CompactReader).read(...)
github.com/sourcegraph/conc@v0.3.0/waitgroup.go:39 +0x1a fp=0xc000112bc8 sp=0xc000112ba8 pc=0x10d019a
runtime.throw({0x217c7a7?, 0xc0009fa568?})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0009fbfe8 sp=0xc0009fbfe0 pc=0x493e01
runtime.fpTracebackPartialExpand(...)
goroutine 1 gp=0xc0000061c0 m=nil [semacquire]:
runtime.goexit({})
github.com/honeycombio/hound/lib/retriever/retriever.go:1623 +0x1f33 fp=0xc000113528 sp=0xc000112c18 pc=0x1c217f3
github.com/sourcegraph/conc/pool.(*Pool).Wait(0xc000693180)
runtime/mprof.go:592
runtime.chanrecv(0xc0005d1500, 0xc0009fa788, 0x1)
runtime/mprof.go:563 +0x35a fp=0xc0009fa620 sp=0xc0009fa578 pc=0x43b1da
sync.runtime_Semacquire(0x0?)
sync/waitgroup.go:118 +0x48 fp=0xc000112ba8 sp=0xc000112b80 pc=0x4a3c68
github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:433
github.com/honeycombio/hound/lib/retriever/cstorage.(*compactFileReader[...]).readRecord(0x0?, 0x226ac10?)
github.com/sourcegraph/conc.(*WaitGroup).Wait(0xc000693180)
github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2()
github.com/aws/aws-lambda-go/lambda.reflectHandler.func1({0x25a24f8?, 0xc000790120?}, {0xc000db8000?, 0x30?, 0x1fa8b60?})
runtime/sema.go:71 +0x25 fp=0xc000112b80 sp=0xc000112b48 pc=0x48d0c5
github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:111 +0x16f fp=0xc0009fa7a0 sp=0xc0009fa748 pc=0x1b7322f
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 46
github.com/sourcegraph/conc@v0.3.0/pool/error_pool.go:37
runtime.saveblockevent(0x36e30, 0x262594, 0x3, 0x2)
runtime/signal_unix.go:884 +0x3c9 fp=0xc0009fa578 sp=0xc0009fa518 pc=0x48dbc9
runtime/panic.go:1067 +0x48 fp=0xc0009fa518 sp=0xc0009fa4e8 pc=0x48af68
runtime/chan.go:489 +0x12 fp=0xc0009fa748 sp=0xc0009fa720 pc=0x40a452
github.com/aws/aws-lambda-go@v1.47.0/lambda/handler.go:298 +0x43 fp=0xc0001138d0 sp=0xc000113878 pc=0x838123
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next(...)
github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:156
sync.(*WaitGroup).Wait(0x10e4ec0?)
runtime.gopark(...)
github.com/sourcegraph/conc@v0.3.0/pool/context_pool.go:57 +0x38 fp=0xc000112c18 sp=0xc000112be0 pc=0x10e4a78
runtime.chanrecv1(0xffff?, 0x0?)
runtime/sema.go:178 +0x219 fp=0xc000112b48 sp=0xc000112ae0 pc=0x4609d9
github.com/honeycombio/hound/cmd/retriever-lambda/app/app.go:118 +0xc48 fp=0xc000113878 sp=0xc000113528 pc=0x1c2e008
runtime.sigpanic()
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30bc5b31 pc=0x43b1da]
runtime.goparkunlock(0x1360eb98166?, 0x40?, 0x3a?, 0xc000886720?)
runtime/mfinal.go:193 +0x107 fp=0xc000084fe0 sp=0xc000084e20 pc=0x41d4a7
runtime/mgc.go:1363 +0xe9 fp=0xc000805fc8 sp=0xc000805f38 pc=0x4210a9
runtime/mgc.go:204 +0xa5
github.com/aws/aws-lambda-go@v1.47.0/lambda/entry.go:106 +0xdc fp=0xc000113c10 sp=0xc000113b60 pc=0x8351dc
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x493e01
runtime/proc.go:424 +0xce fp=0xc000805f38 sp=0xc000805f18 pc=0x48b08e
runtime/mfinal.go:163 +0x3d
runtime/mgcscavenge.go:658 +0x59 fp=0xc0000817c8 sp=0xc0000817a8 pc=0x429ef9
runtime.goexit({})
main.main()
github.com/aws/aws-lambda-go/lambda.StartWithOptions({0x1e53740?, 0xc000583300?}, {0xc00071dc88?, 0x0?, 0x0?})

@prattmic
Member

Thanks. It would also help if you could disassemble the function containing the crash PC (pc=0x40abaf in #69629 (comment), which is presumably in saveblockevent). It isn't clear to me exactly which access is crashing. The dereference of the FP would be the obvious thing to crash, but that doesn't seem to match up with the printed line number.

@lizthegrey
Author

0x40abaf

TEXT runtime.chanrecv(SB) runtime/chan.go
[...]                                             
  chan.go:642           0x40ab00                48398a40010000                  CMPQ 0x140(DX), CX                                      
  chan.go:642           0x40ab07                0f853e040000                    JNE 0x40af4b                                            
  chan.go:645           0x40ab0d                488b842400010000                MOVQ 0x100(SP), AX                                      
  chan.go:645           0x40ab15                4883782000                      CMPQ 0x20(AX), $0x0                                     
  chan.go:645           0x40ab1a                7419                            JE 0x40ab35                                             
  chan.go:646           0x40ab1c                0f1f4000                        NOPL 0(AX)                                              
  chan.go:646           0x40ab20                e8fb540600                      CALL runtime.unblockTimerChan(SB)                       
  chan.go:650           0x40ab25                488b8c2498000000                MOVQ 0x98(SP), CX                                       
  chan.go:648           0x40ab2d                488b9424b8000000                MOVQ 0xb8(SP), DX                                       
  chan.go:648           0x40ab35                833df4e6380300                  CMPL runtime.writeBarrier(SB), $0x0
  chan.go:648           0x40ab3c                740f                            JE 0x40ab4d                                             
  chan.go:648           0x40ab3e                4c8b8240010000                  MOVQ 0x140(DX), R8                                      
  chan.go:648           0x40ab45                e8f6920800                      CALL runtime.gcWriteBarrier1(SB)                        
  chan.go:648           0x40ab4a                4d8903                          MOVQ R8, 0(R11)                                         
  chan.go:648           0x40ab4d                48c7824001000000000000          MOVQ $0x0, 0x140(DX)                                    
  chan.go:649           0x40ab58                c682c000000000                  MOVB $0x0, 0xc0(DX)                                     
  chan.go:650           0x40ab5f                488b4128                        MOVQ 0x28(CX), AX                                       
  chan.go:650           0x40ab63                4885c0                          TESTQ AX, AX                                            
  chan.go:650           0x40ab66                7e57                            JLE 0x40abbf                                            
  chan.go:651           0x40ab68                488b4c2448                      MOVQ 0x48(SP), CX                                       
  chan.go:651           0x40ab6d                4829c8                          SUBQ CX, AX                                             
  mprof.go:511          0x40ab70                488b1db9dd3803                  MOVQ runtime.blockprofilerate(SB), BX                   
  mprof.go:511          0x40ab77                48895c2458                      MOVQ BX, 0x58(SP)                                       
  mprof.go:507          0x40ab7c                4885c0                          TESTQ AX, AX                                            
  mprof.go:512          0x40ab7f                b901000000                      MOVL $0x1, CX                                           
  mprof.go:512          0x40ab84                480f4ec1                        CMOVLE CX, AX                                           
  mprof.go:512          0x40ab88                4889442450                      MOVQ AX, 0x50(SP)                                       
  mprof.go:512          0x40ab8d                e80e020300                      CALL runtime.blocksampled(SB)                           
  mprof.go:512          0x40ab92                84c0                            TESTL AL, AL                                            
  mprof.go:507          0x40ab94                7419                            JE 0x40abaf                                             
  mprof.go:513          0x40ab96                488b442450                      MOVQ 0x50(SP), AX                                       
  mprof.go:513          0x40ab9b                488b5c2458                      MOVQ 0x58(SP), BX                                       
  mprof.go:513          0x40aba0                b903000000                      MOVL $0x3, CX                                           
  mprof.go:513          0x40aba5                bf02000000                      MOVL $0x2, DI                                           
  mprof.go:513          0x40abaa                e8d1020300                      CALL runtime.saveblockevent(SB)                         
  chan.go:653           0x40abaf                488b8c2498000000                MOVQ 0x98(SP), CX                                       
  chan.go:654           0x40abb7                488b9424b8000000                MOVQ 0xb8(SP), DX                                       
  chan.go:653           0x40abbf                0fb65935                        MOVZX 0x35(CX), BX                                      
  chan.go:654           0x40abc3                833d66e6380300                  CMPL runtime.writeBarrier(SB), $0x0                     
  chan.go:654           0x40abca                7417                            JE 0x40abe3                                             
  chan.go:654           0x40abcc                4c8b8290000000                  MOVQ 0x90(DX), R8                                       
  chan.go:654           0x40abd3                e888920800                      CALL runtime.gcWriteBarrier2(SB)                        
  chan.go:654           0x40abd8                4d8903                          MOVQ R8, 0(R11)                                         
  chan.go:655           0x40abdb                4c8b4150                        MOVQ 0x50(CX), R8                                       
  chan.go:655           0x40abdf                4d894308                        MOVQ R8, 0x8(R11)                                       
  chan.go:654           0x40abe3                48c7829000000000000000          MOVQ $0x0, 0x90(DX)                                     
  chan.go:655           0x40abee                48c7415000000000                MOVQ $0x0, 0x50(CX)                                     
  chan.go:656           0x40abf6                90                              NOPL

@prattmic
Member

  mprof.go:513          0x40abaa                e8d1020300                      CALL runtime.saveblockevent(SB)                         
  chan.go:653           0x40abaf                488b8c2498000000                MOVQ 0x98(SP), CX     

Oof, SP is completely bogus? The reported sp=0xc0009fa620 doesn't seem completely bogus, but it does differ quite a bit from SP at higher frames (e.g., sp=0xc000112ac0), which it shouldn't unless there are giant stack objects.

Also, 0xc0009fa620+0x98 doesn't cross a page boundary, so presumably this is the first use of SP after the corruption.
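The page-boundary check is plain arithmetic; assuming 4 KiB pages (the usual size on linux/amd64), the sketch below verifies that the MOVQ 0x98(SP) load stays in the same page as the reported SP, and prints how far that SP is from the sp=0xc000112ac0 it is compared against. samePage is a hypothetical helper.

```go
package main

import "fmt"

const pageSize = 4096 // assumption: 4 KiB pages

// samePage reports whether addr and addr+off lie in the same page.
func samePage(addr, off uint64) bool {
	return addr/pageSize == (addr+off)/pageSize
}

func main() {
	sp := uint64(0xc0009fa620) // sp reported for the runtime.chanrecv frame
	fmt.Printf("SP+0x98 = %#x, same page: %v\n", sp+0x98, samePage(sp, 0x98))
	fmt.Printf("distance to sp=0xc000112ac0: %d bytes\n", sp-0xc000112ac0)
}
```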

By the way, do you know what is up with the formatting of your stack traces? Some frames are missing filename/line, and some of the frames don't make much sense:

golang.org/x/sync/errgroup.(*Group).Go.func1()
golang.org/x/sync@v0.8.0/errgroup/errgroup.go:75 +0x96
github.com/sourcegraph/conc@v0.3.0/pool/pool.go:79 +0x30 fp=0xc000112be0 sp=0xc000112bc8 pc=0x10e4e30
runtime.blockevent(...)
github.com/honeycombio/hound/lib/retriever.(*ReadOnlyRetriever).FetchPartialFromSegments(0xc000542690, {0x25a24f8?, 0xc000790cf0?}, 0xc000784500)
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read(0xc0007ec7c0, {0xc01017d50a, 0x6, 0x6})

How did runtime.blockevent end up in the middle of these other frames? Maybe this is a symptom of the stack corruption?

@lizthegrey
Author

lizthegrey commented Sep 26, 2024

By the way, do you know what is up with the formatting of your stack traces? Some frames are missing filename/line, and some of the frames don't make much sense:

Unfortunately, this is being mangled by our Lambda stdout/stderr log agent, because the process is executing within AWS Lambda. It is also, unfortunately, theoretically possible that multiple crashes in close proximity are interleaving their dumps, although I don't think that's the case for any of the dumps I sent you: I only see one "fatal error: unexpected signal during runtime execution" in the time range of the logs I'm looking at.

> How did runtime.blockevent end up in the middle of these other frames? Maybe this is a symptom of the stack corruption?

Quite possibly. Or, see above, it's a result of out-of-order mangling. (Sorry, that shouldn't happen in theory, but... aha, yup, I'm sorry: the order of log lines within the same millisecond may be random, since Lambda's API clips log timestamps to millisecond resolution. URRRRGH I AM SO SORRY. If this were a k8s crash, we'd have it in exact strict order.)

@lizthegrey
Author

For good measure, here is the disassembly around 0x43b1da, which I think is the actual crash address you wanted:

TEXT runtime.saveblockevent(SB) runtime/mprof.go
[...]
  lockrank_off.go:35    0x43b15b                488d05bed73503          LEAQ runtime.profBlockLock(SB), AX              
  lockrank_off.go:35    0x43b162                e8595bfdff              CALL runtime.unlock2(SB)                        
  runtime1.go:612       0x43b167                488b4c2458              MOVQ 0x58(SP), CX                               
  runtime1.go:612       0x43b16c                8b9108010000            MOVL 0x108(CX), DX                              
  runtime1.go:612       0x43b172                8d5aff                  LEAL -0x1(DX), BX                               
  mprof.go:571          0x43b175                90                      NOPL                                            
  runtime1.go:612       0x43b176                899908010000            MOVL BX, 0x108(CX)                              
  runtime1.go:612       0x43b17c                0f1f4000                NOPL 0(AX)                                      
  runtime1.go:613       0x43b180                83fa01                  CMPL DX, $0x1                                   
  runtime1.go:613       0x43b183                7512                    JNE 0x43b197                                    
  runtime1.go:613       0x43b185                4180beb900000000        CMPB 0xb9(R14), $0x0                            
  runtime1.go:613       0x43b18d                7408                    JE 0x43b197                                     
  runtime1.go:615       0x43b18f                49c74610defaffff        MOVQ $-0x522, 0x10(R14)                         
  mprof.go:572          0x43b197                4881c498000000          ADDQ $0x98, SP                                  
  mprof.go:572          0x43b19e                5d                      POPQ BP                                         
  mprof.go:572          0x43b19f                90                      NOPL                                            
  mprof.go:572          0x43b1a0                c3                      RET                                             
  mprof.go:570          0x43b1a1                4889c1                  MOVQ AX, CX                                     
  mprof.go:570          0x43b1a4                e8d7900500              CALL runtime.panicSliceAcap(SB)                 
  mprof.go:570          0x43b1a9                660f1f840000000000      NOPW 0(AX)(AX*1)                                
  mprof.go:570          0x43b1b2                660f1f840000000000      NOPW 0(AX)(AX*1)                                
  mprof.go:570          0x43b1bb                0f1f440000              NOPL 0(AX)(AX*1)                                
  mprof.go:615          0x43b1c0                488b00                  MOVQ 0(AX), AX                                  
  mprof.go:590          0x43b1c3                4c8b4c2448              MOVQ 0x48(SP), R9                               
  mprof.go:590          0x43b1c8                4939f1                  CMPQ R9, SI                                     
  mprof.go:590          0x43b1cb                0f8d87000000            JGE 0x43b258                                    
  mprof.go:590          0x43b1d1                4885c0                  TESTQ AX, AX                                    
  mprof.go:590          0x43b1d4                0f847e000000            JE 0x43b258                                     
  mprof.go:592          0x43b1da                4c8b5008                MOVQ 0x8(AX), R10                               
  mprof.go:594          0x43b1de                48837c243800            CMPQ 0x38(SP), $0x0                             
  mprof.go:594          0x43b1e4                7f14                    JG 0x43b1fa                                     
  mprof.go:590          0x43b1e6                4939f1                  CMPQ R9, SI                                     
  mprof.go:610          0x43b1e9                0f8388010000            JAE 0x43b377                                    
  mprof.go:610          0x43b1ef                4e8914cf                MOVQ R10, 0(DI)(R9*8)                           
  mprof.go:611          0x43b1f3                48ff442448              INCQ 0x48(SP)                                   
  mprof.go:611          0x43b1f8                ebc6                    JMP 0x43b1c0                                    
  mprof.go:600          0x43b1fa                884c2437                MOVB CL, 0x37(SP)                               
  mprof.go:590          0x43b1fe                4889842488000000        MOVQ AX, 0x88(SP)                               
  mprof.go:595          0x43b206                498d42ff                LEAQ -0x1(R10), AX                              
  mprof.go:595          0x43b20a                4889442450              MOVQ AX, 0x50(SP)                               
  mprof.go:596          0x43b20f                e8ac450500              CALL runtime.findfunc(SB)                       
  mprof.go:597          0x43b214                488b4c2450              MOVQ 0x50(SP), CX                               
  mprof.go:597          0x43b219                e8824a0500              CALL runtime.newInlineUnwinder(SB)              
  mprof.go:597          0x43b21e                4889442460              MOVQ AX, 0x60(SP)                               
  mprof.go:597          0x43b223                48895c2468              MOVQ BX, 0x68(SP)                               
  mprof.go:597          0x43b228                48894c2470              MOVQ CX, 0x70(SP)                               
  mprof.go:598          0x43b22d                488b4c2478              MOVQ 0x78(SP), CX                               
  mprof.go:598          0x43b232                488b542440              MOVQ 0x40(SP), DX                               
  mprof.go:598          0x43b237                0fb65c2437              MOVZX 0x37(SP), BX                              
  mprof.go:598          0x43b23c                eb70                    JMP 0x43b2ae

@prattmic
Member

Ah, that's much better. That is the dereference of the FP.
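For readers following along: with frame pointers on amd64, each frame stores the caller's saved BP at [fp] and the return address at [fp+8]. A frame-pointer unwinder loads fp+8 to get a PC and then follows [fp] to the next frame, which is why MOVQ 0x8(AX), R10 at mprof.go:592 faults when AX holds a garbage frame pointer. The sketch below is a toy model of that loop, not the runtime's fpTracebackPartialExpand: stack memory is a Go map, an address missing from the map stands in for an unmapped page, and the addresses are illustrative.

```go
package main

import "fmt"

// stack is a toy model of goroutine stack memory: address -> 8-byte word.
// An address missing from the map stands in for an unmapped page.
type stack map[uint64]uint64

// unwind follows a frame-pointer chain: the return PC lives at fp+8 and the
// caller's saved frame pointer at fp. It fills pcs and returns how many
// entries it wrote; if it reads an unmapped address, it also returns that
// address as the fault and ok=false.
func unwind(mem stack, fp uint64, pcs []uint64) (n int, fault uint64, ok bool) {
	for n < len(pcs) && fp != 0 {
		pc, mapped := mem[fp+8] // load the return address
		if !mapped {
			return n, fp + 8, false // the real runtime would SIGSEGV here
		}
		pcs[n] = pc
		n++
		fp = mem[fp] // follow the saved frame pointer to the caller's frame
	}
	return n, 0, true
}

func main() {
	// Two good frames, then a caller whose saved BP slot holds a garbage value.
	mem := stack{
		0xc000112610: 0xc000112710, 0xc000112618: 0x40abaf, // return PC into chanrecv
		0xc000112710: 0x30bc5b29, 0xc000112718: 0x40a452, // return PC into chanrecv1; bad saved BP
	}
	pcs := make([]uint64, 32)
	n, fault, ok := unwind(mem, 0xc000112610, pcs)
	fmt.Printf("frames=%d ok=%v fault=%#x pcs=%#x\n", n, ok, fault, pcs[:n])
}
```

Run as written, it records two frames and then reports a fault at 0x30bc5b31, i.e. the bogus saved frame pointer 0x30bc5b29 plus 8. The returned count n is the number of frames written before the fault.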

@lizthegrey
Author

I spent half an hour sorting the stacks in the randomly ordered output by fp and sp on my latest build, because of that damn lack of microsecond precision in Lambda-generated log entries. You're welcome ;)
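Because the stack grows downward on amd64, a goroutine's frames are printed innermost first with increasing sp (and fp) values, so frame lines that a log agent shuffled can be put back in order by sorting on sp. A rough sketch of that idea, assuming each frame's file:line line still carries its own fp=/sp= annotation and that only one goroutine's frames are sorted at a time; the function-name lines, which carry no addresses, still have to be matched up by hand. sortFramesBySP is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// spRE extracts the sp=0x... annotation from a traceback frame line.
var spRE = regexp.MustCompile(`\bsp=0x([0-9a-f]+)`)

// sortFramesBySP returns the lines that carry an sp= annotation, ordered by
// increasing stack pointer (innermost frame first, as the runtime prints
// them). Lines without sp= are dropped.
func sortFramesBySP(lines []string) []string {
	type frame struct {
		sp   uint64
		line string
	}
	var frames []frame
	for _, l := range lines {
		m := spRE.FindStringSubmatch(l)
		if m == nil {
			continue
		}
		sp, err := strconv.ParseUint(m[1], 16, 64)
		if err != nil {
			continue
		}
		frames = append(frames, frame{sp, l})
	}
	sort.Slice(frames, func(i, j int) bool { return frames[i].sp < frames[j].sp })
	out := make([]string, len(frames))
	for i, f := range frames {
		out[i] = f.line
	}
	return out
}

func main() {
	shuffled := []string{
		"runtime/chan.go:489 +0x12 fp=0xc000a1c6b8 sp=0xc000a1c690 pc=0x40a452",
		"runtime/panic.go:1067 +0x48 fp=0xc000a1c488 sp=0xc000a1c458 pc=0x48af68",
		"runtime/mprof.go:563 +0x35a fp=0xc000a1c590 sp=0xc000a1c4e8 pc=0x43b1da",
	}
	for _, l := range sortFramesBySP(shuffled) {
		fmt.Println(l)
	}
}
```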

Using pgohash, I have now disabled what I believe to be all of the relevant inlining, and it now breaks at a different spot, which I think means I have to add more to my pgohash and keep playing whack-a-mole.

fatal error: unexpected signal during runtime execution

[signal SIGSEGV: segmentation violation code=0x1 addr=0x30bc5b31 pc=0x43b1da]

goroutine 56 gp=0xc00050ba40 m=5 mp=0xc0003f6708 [running]:

runtime/panic.go:1067 +0x48 fp=0xc000a1c488 sp=0xc000a1c458 pc=0x48af68
runtime/signal_unix.go:884 +0x3c9 fp=0xc000a1c4e8 sp=0xc000a1c488 pc=0x48dbc9
runtime/mprof.go:563 +0x35a fp=0xc000a1c590 sp=0xc000a1c4e8 pc=0x43b1da
runtime/chan.go:651 +0x72f fp=0xc000a1c690 sp=0xc000a1c590 pc=0x40abaf
runtime/chan.go:489 +0x12 fp=0xc000a1c6b8 sp=0xc000a1c690 pc=0x40a452
github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:110 +0x16f fp=0xc000a1c710 sp=0xc000a1c6b8 pc=0x1b7336f
github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:684 (inlined)
  // github.com/honeycombio/hound/lib/retriever/cstorage.(*compactFileReader[...]).read(...)
github.com/honeycombio/hound/lib/retriever/cstorage/varstring.go:585 +0x105 fp=0xc000a1c770 sp=0xc000a1c710 pc=0x1ba6f05
  // github.com/honeycombio/hound/lib/retriever/cstorage.(*varstringCompactReader).read(0xc000949620)
github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:583 +0x43 fp=0xc000a1c7b8 sp=0xc000a1c770 pc=0x1b8fb83
github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:734 +0xf4 fp=0xc000a1c818 sp=0xc000a1c7b8 pc=0x1b82474
github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:2152 +0x34e5 fp=0xc000a1d258 sp=0xc000a1c818 pc=0x1b8d425
github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:2271 +0x1c85 fp=0xc000a1df78 sp=0xc000a1d258 pc=0x1b89445
golang.org/x/sync@v0.8.0/errgroup/errgroup.go:78 +0x50 fp=0xc000a1dfe0 sp=0xc000a1df78 pc=0xc45150
runtime/asm_amd64.s:1700 +0x1 fp=0xc000a1dfe8 sp=0xc000a1dfe0 pc=0x493e01

goroutine 1 gp=0xc0000061c0 m=nil [semacquire]:

runtime/proc.go:430 +0xcf fp=0xc000112ae0 sp=0xc000112ac0 pc=0x449acf
runtime/sema.go:178 +0x219 fp=0xc000112b48 sp=0xc000112ae0 pc=0x4609d9
runtime/sema.go:71 +0x25 fp=0xc000112b80 sp=0xc000112b48 pc=0x48d0c5
sync/waitgroup.go:118 +0x48 fp=0xc000112ba8 sp=0xc000112b80 pc=0x4a3c68
github.com/sourcegraph/conc@v0.3.0/waitgroup.go:39 +0x1a fp=0xc000112bc8 sp=0xc000112ba8 pc=0x10d019a
github.com/sourcegraph/conc@v0.3.0/pool/pool.go:79 +0x30 fp=0xc000112be0 sp=0xc000112bc8 pc=0x10e4e30
github.com/sourcegraph/conc@v0.3.0/pool/context_pool.go:57 +0x38 fp=0xc000112c18 sp=0xc000112be0 pc=0x10e4a78
github.com/honeycombio/hound/lib/retriever/retriever.go:1623 +0x1f33 fp=0xc000113528 sp=0xc000112c18 pc=0x1c20bb3
github.com/honeycombio/hound/cmd/retriever-lambda/app/app.go:118 +0xc48 fp=0xc000113878 sp=0xc000113528 pc=0x1c2d428
github.com/aws/aws-lambda-go@v1.47.0/lambda/handler.go:298 +0x43 fp=0xc0001138d0 sp=0xc000113878 pc=0x838123
github.com/aws/aws-lambda-go@v1.47.0/lambda/invoke_loop.go:125 +0x6d fp=0xc000113940 sp=0xc0001138d0 pc=0x838f8d
github.com/aws/aws-lambda-go@v1.47.0/lambda/invoke_loop.go:75 +0x2b2 fp=0xc000113ab8 sp=0xc000113940 pc=0x8386f2
github.com/aws/aws-lambda-go@v1.47.0/lambda/invoke_loop.go:39 +0x1d2 fp=0xc000113b60 sp=0xc000113ab8 pc=0x8383d2
github.com/aws/aws-lambda-go@v1.47.0/lambda/entry.go:106 +0xdc fp=0xc000113c10 sp=0xc000113b60 pc=0x8351dc
github.com/aws/aws-lambda-go@v1.47.0/lambda/entry.go:69 +0x2a fp=0xc000113c48 sp=0xc000113c10 pc=0x834faa
github.com/aws/aws-lambda-go@v1.47.0/lambda/entry.go:96 +0x65 fp=0xc000113ca0 sp=0xc000113c48 pc=0x835065
github.com/honeycombio/hound/cmd/retriever-lambda/main.go:143 +0xe53 fp=0xc000113f50 sp=0xc000113ca0 pc=0x1c5c7b3
runtime/proc.go:272 +0x28b fp=0xc000113fe0 sp=0xc000113f50 pc=0x44960b

Relevantly:

pgohash0 triggered github.com/honeycombio/hound/lib/retriever/cstorage/varstring.go:576:54 (inline) 110011011010101101001110
github.com/honeycombio/hound/lib/retriever/cstorage/varstring.go:576:54: inlining call to (*compactFileReader[go.shape.int32]).readRecord
github.com/honeycombio/hound/lib/retriever/cstorage/varstring.go:585:36: inlining call to (*compactFileReader[go.shape.int32]).read
github.com/honeycombio/hound/lib/retriever/cstorage/varstring.go:576:54: inlining call to (*bitmap).nextTrue
github.com/honeycombio/hound/lib/retriever/cstorage/varstring.go:576:54: inlining call to int32FromBytes

@lizthegrey
Author

I think this is more in line with what you were expecting to see. I've manually checked everything after unscrambling, I think.

fatal error: unexpected signal during runtime execution

[signal SIGSEGV: segmentation violation code=0x1 addr=0x30bc5b31 pc=0x43b1da]

goroutine 66 gp=0xc001640700 m=4 mp=0xc000100008 [running]:

runtime.throw({0x217884b?, 0xc000994478?})
  runtime/panic.go:1067 +0x48 fp=0xc000994428 sp=0xc0009943f8 pc=0x48af68
runtime.sigpanic()
  runtime/signal_unix.go:884 +0x3c9 fp=0xc000994488 sp=0xc000994428 pc=0x48dbc9
runtime.fpTracebackPartialExpand(...)
  runtime/mprof.go:592
runtime.saveblockevent(0x1d84c, 0x262593, 0x3, 0x2)
  runtime/mprof.go:563 +0x35a fp=0xc000994530 sp=0xc000994488 pc=0x43b1da
runtime.blockevent(...)
  runtime/mprof.go:513
runtime.chanrecv(0xc000677f10, 0xc000994698, 0x1)
  runtime/chan.go:651 +0x72f fp=0xc000994630 sp=0xc000994530 pc=0x40abaf
runtime.chanrecv1(0xc0124ee380?, 0x4?)
  runtime/chan.go:489 +0x12 fp=0xc000994658 sp=0xc000994630 pc=0x40a452
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read(0xc012742c00, {0xc012c73038, 0x8, 0x8})
  github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:110 +0x16f fp=0xc0009946b0 sp=0xc000994658 pc=0x1b7336f
github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next(0xc012742c00, 0x8)
  github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:155 +0x1c8 fp=0xc000994710 sp=0xc0009946b0 pc=0x1b73748
github.com/honeycombio/hound/lib/retriever/cstorage.(*compactFileReader[...]).readRecord(0xc012a02000?, 0x2266d08?)
  github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:706 +0x7f fp=0xc000994740 sp=0xc000994710 pc=0x1ba8fbf
github.com/honeycombio/hound/lib/retriever/cstorage.(*int64CompactReader).read(0xc0128df500?)
  github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:433 +0x25 fp=0xc000994768 sp=0xc000994740 pc=0x1b92125
github.com/honeycombio/hound/lib/retriever/cstorage.(*readerWrapper).Initialize.typedToIndex[...].func5()
  github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:585 +0x5c fp=0xc0009947b0 sp=0xc000994768 pc=0x1b8005c
github.com/honeycombio/hound/lib/retriever/cstorage.(*readerWrapper).toIndex(0xc000255008, 0x0?)
  github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:746 // INLINED
github.com/honeycombio/hound/lib/retriever/cstorage.(*readerWrapper).ValueAt(...)
  github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:734 +0xf4 fp=0xc000994810 sp=0xc0009947b0 pc=0x1b80334
github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2.5(_, {{_, _}, _, {_, _}}, _, {0x0, 0x0, 0x0}, ...)
  github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:2152 +0x34e5 fp=0xc000995258 sp=0xc000994810 pc=0x1b8b365
github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2()
  github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:2271 +0x1c85 fp=0xc000995f78 sp=0xc000995258 pc=0x1b87385
golang.org/x/sync/errgroup.(*Group).Go.func1()
  golang.org/x/sync@v0.8.0/errgroup/errgroup.go:78 +0x50 fp=0xc000995fe0 sp=0xc000995f78 pc=0xc45150
runtime.goexit({})
  runtime/asm_amd64.s:1700 +0x1 fp=0xc000995fe8 sp=0xc000995fe0 pc=0x493e01
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 54
  golang.org/x/sync@v0.8.0/errgroup/errgroup.go:75 +0x96

goroutine 1 gp=0xc0000061c0 m=nil [semacquire]:

runtime.gopark(...)
  runtime/proc.go:424
runtime.goparkunlock(0x22788d49420?, 0x18?, 0x5b?, 0xc0005938c0?)
  runtime/proc.go:430 +0xcf fp=0xc000112ae0 sp=0xc000112ac0 pc=0x449acf
runtime.semacquire1(0xc000781188, 0x0, 0x1, 0x0, 0x12)

I'll try again to remove the inlining at column_manager.go:746 and see if that causes it to stop crashing.

@lizthegrey
Author

No luck with successively removing inlining; it keeps crashing at new places where it trips over inlining again.

It appears any inlining at or upstream of compactFileReader.readRecord() leaves the saveblockevent / fpTracebackPartialExpand path unable to traverse the stack upwards. Perhaps some kind of interaction with generics or with devirtualisation?

I'll test that theory in the Go playground with a more minimal example; it is painful to wait 20 minutes per build cycle and then attempt to repro in our staging environment.

@nsrip-dd
Contributor

> it keeps crashing at new places where it trips over inlining again.

Apologies if I missed it, but how have you concluded that inlining/PGO is where things are going wrong? Do you see crashes in fpTracebackPartialExpand without PGO? (From the first post, seems like the answer is no, but just want to double check)

Also, does the crash output include a register dump? It'd look something like this:

rax 0x5
rbx 0xc003358c40
rcx 0x41
rdx 0x0
rdi 0x73742eebc0e0
rsi 0x10
rbp 0x73742eebc108
rsp 0x73742eebbcb0
r8 0x10
r9 0x1003
r10 0xc0159fd008
r11 0x18
r12 0x73742eebbd30
r13 0xc0159fd008
r14 0xc008299880
r15 0x70
rip 0x64d694
rflags 0x10202
cs 0x33
fs 0x0
gs 0x0

I ask because, based on #69629 (comment), the register R9 in particular will hold the number of frames written to the buffer. That value might give us a hint at how far unwinding made it, and which functions (if any) might have mishandled the frame pointers. But all of the registers would be useful.

@lizthegrey
Author

lizthegrey commented Sep 27, 2024

>> it keeps crashing at new places where it trips over inlining again.

> Apologies if I missed it, but how have you concluded that inlining/PGO is where things are going wrong? Do you see crashes in fpTracebackPartialExpand without PGO? (From the first post, seems like the answer is no, but just want to double check)

Correct: disabling PGO makes the bug go away. Funnily enough, a different PGO file that does not trigger optimisations in lib/retriever/cstorage/lz4 (because the package was added after that PGO file was captured) also does not trigger the bug.

> Also, does the crash output include a register dump?

> I ask because, based on #69629 (comment), the register R9 in particular will hold the number of frames written to the buffer. That value might give us a hint at how far unwinding made it, and which functions (if any) might have mishandled the frame pointers. But all of the registers would be useful.

Nope, it does not appear to :( Now that we know the problem seems to be confined to the currently running goroutine, I'm tempted to pare back our dumps to show only the current goroutine, to increase the likelihood that we get the register dump output.

@lizthegrey
Author

lizthegrey commented Sep 27, 2024

>> Also, does the crash output include a register dump?
>> I ask because, based on #69629 (comment), the register R9 in particular will hold the number of frames written to the buffer. That value might give us a hint at how far unwinding made it, and which functions (if any) might have mishandled the frame pointers. But all of the registers would be useful.

> Nope, it does not appear to :( Now that we know the problem seems to be confined to the currently running goroutine, I'm tempted to pare back our dumps to show only the current goroutine, to increase the likelihood that we get the register dump output.

No, it cannot include a register dump. This is the failure path inside runtime.fatalthrow: runtime.canpanic returns false because we're inside the runtime when the SIGSEGV is triggered, so we have to go through runtime.throw instead of the usual course, which results in runtime.dumpregs being called inside runtime.sighandler.

(Hence the message "unexpected signal during runtime execution".)

@mknyszek mknyszek added this to the Backlog milestone Sep 30, 2024
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 30, 2024
@nsrip-dd
Contributor

nsrip-dd commented Oct 2, 2024

> No, it cannot include a register dump. This is the failure path inside runtime.fatalthrow: runtime.canpanic returns false because we're inside the runtime when the SIGSEGV is triggered, so we have to go through runtime.throw instead of the usual course, which results in runtime.dumpregs being called inside runtime.sighandler.

Indeed. We do reach the dumpregs call when the panic happens on the system stack (which has usually been the case for past frame pointer bugs), but unfortunately we're not on the system stack here...

If we had a core dump we could definitely get all the registers, plus inspect the state of the stack we're failing to unwind. I've had some luck with that on a previous frame pointer issue. You can set GOTRACEBACK=crash to get the program to crash with a core dump. There are probably other things I'm not aware of that you need to do to get the core dump to actually go somewhere persistent for Lambda programs. A core dump would understandably be something you don't want to share publicly, but if you had a core dump we might be able to walk through some stuff you could do with delve to inspect the state of the program.

I've tried reproducing this as well. No luck yet, but I did uncover a different (maybe not related) bug (#69747). It's tricky to construct a CPU profile for PGO which influences inlining for specific functions, at least for a toy example.

@lizthegrey
Author

lizthegrey commented Oct 2, 2024

> If we had a core dump we could definitely get all the registers, plus inspect the state of the stack we're failing to unwind. I've had some luck with that on a previous frame pointer issue. You can set GOTRACEBACK=crash to get the program to crash with a core dump. There are probably other things I'm not aware of that you need to do to get the core dump to actually go somewhere persistent for Lambda programs. A core dump would understandably be something you don't want to share publicly, but if you had a core dump we might be able to walk through some stuff you could do with delve to inspect the state of the program.

Unfortunately getting a core dump off of a running Lambda is going to be nigh impossible.

However, one option we are open to is modifying our Go runtime during the build process: we have a toolchain for applying patches to /usr/local/go, including src/runtime, during the build. So we can put dumpregs into the crashing handlers, if we can make sure that the register dump is for the correct thread/stack. If someone hands me a patch to do that, I can apply it to the test build that reliably crashes under load.

@nsrip-dd
Contributor

nsrip-dd commented Oct 2, 2024

Oh cool, if you can patch the runtime maybe we can do even better with debug logging. Something like this should capture some useful info: debuglog.patch. If you apply that and build the program with -tags debuglog, then the debug log will get dumped if/when the program crashes. It'll look something like this:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x100251ba0]

goroutine 1 [running]:
main.main()
	/Users/nick.ripley/sandbox/go/issue-69629/simple_die.go:22 +0xb0
>> begin log 0 <<
[0.000512500 P 2] g= 0x140000021c0 goid= 1 sched.sp= 0x1400010ac90 sched.bp= 0x1400010ac88 curfp= 0x1400010ad98
[0.000516584 P 2] goid= 1 fp= 0x1400010ad98
[0.000516834 P 2] goid= 1 pc= 0x1001f65cc [runtime.blockevent+0x5c /Users/nick.ripley/repos/go/src/runtime/mprof.go:515]
[0.000517084 P 2] goid= 1 fp= 0x1400010adf8
[0.000517209 P 2] goid= 1 pc= 0x1001916b4 [runtime.chanrecv+0x474 /Users/nick.ripley/repos/go/src/runtime/chan.go:654]
[0.000517625 P 2] goid= 1 fp= 0x1400010ae38
[0.000517750 P 2] goid= 1 pc= 0x100191234 [runtime.chanrecv1+0x14 /Users/nick.ripley/repos/go/src/runtime/chan.go:491]
[0.000519625 P 2] goid= 1 fp= 0x1400010aeb8
[0.000519750 P 2] goid= 1 pc= 0x100251b70 [main.main+0x80 /Users/nick.ripley/sandbox/go/issue-69629/simple_die.go:20]
[0.000519834 P 2] goid= 1 fp= 0x1400010aee8
[0.000519959 P 2] goid= 1 pc= 0x1001c47b4 [runtime.main+0x284 /Users/nick.ripley/repos/go/src/internal/runtime/atomic/types.go:194]
[0.000520125 P 2] goid= 1 fp= 0x1400010af38
[0.000520209 P 2] goid= 1 pc= 0x1001fdee4 [runtime.goexit+0x4 /Users/nick.ripley/repos/go/src/runtime/asm_arm64.s:1224]

That should at least definitively tell us how far the unwinding gets when it crashes.

@lizthegrey
Author

> Oh cool, if you can patch the runtime maybe we can do even better with debug logging. [...] If you apply that and build the program with -tags debuglog, then the debug log will get dumped if/when the program crashes.
>
> That should at least definitively tell us how far the unwinding gets when it crashes.

Heisenbug: with PGO enabled, -tags debuglog, and the runtime patch applied, it no longer crashes. I am attempting to get the bug to happen again. Will keep you posted.

@lizthegrey
Author

Got it to repro with the debugging on. Some additional code we'd landed in the main branch had caused PGO to make different decisions, so the bug stopped reproducing.

I apologise that several of these are mingled together, but here you go. Filtering on goid should give you what you need.
dump.txt
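Filtering the mingled dump by goroutine can be scripted. A small sketch, assuming each relevant debug-log line contains a "goid= N" field as in the example output earlier in this thread; entries that the log agent wrapped across lines would need to be rejoined first. linesForGoid is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// linesForGoid returns the debug-log lines that mention the given goroutine
// ID, in their original order. It matches "goid= <id>" as a whole field, so
// goid 17 does not also match goid 176.
func linesForGoid(dump string, goid int) []string {
	re := regexp.MustCompile(fmt.Sprintf(`\bgoid= %d\b`, goid))
	var out []string
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		if line := sc.Text(); re.MatchString(line) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	dump := `[1.818168976 P 0] goid= 176 fp= 0xc000112610
[1.818169001 P 1] goid= 17 fp= 0xc000200000
[1.818169487 P 0] goid= 176 pc= 0x40abaf [runtime.chanrecv+0x72f runtime/chan.go:653]`
	for _, l := range linesForGoid(dump, 176) {
		fmt.Println(l)
	}
}
```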

@nsrip-dd
Contributor

nsrip-dd commented Oct 4, 2024

Yay! Thank you for making that modification and collecting all this data! Grabbing one example:

11.819 [signal SIGSEGV: segmentation violation code=0x1 addr=0x30bc5b31 pc=0x43d020]

From other information shared in this issue, it seems to consistently crash when dereferencing fp+8, so let's look for 0x30bc5b31 - 8 = 0x30bc5b29. The first example I find:

11.91 [1.818167735 P 0] g= 0xc000809880 goid= 176 sched.sp= 0x0 sched.bp= 0x0 curfp= 0xc000112610
11.91 [1.818168976 P 0] goid= 176 fp= 0xc000112610
11.91 [1.818169487 P 0] goid= 176 pc= 0x40abaf [runtime.chanrecv+0x72f runtime/chan.go:653]
11.91 [1.818170264 P 0] goid= 176 fp= 0xc000112710
11.91 [1.818170479 P 0] goid= 176 pc= 0x40a452 [runtime.chanrecv1+0x12 runtime/chan.go:490]
11.91 [1.818170704 P 0] goid= 176 fp= 0xc000112738
11.91 [1.818170897 P 0] goid= 176 pc= 0x1b8a7af [github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Read+0x16f github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:110]
11.91 [1.818171090 P 0] goid= 176 fp= 0xc000112790
11.91 [1.818171414 P 0] goid= 176 pc= 0x1bc47d1 [github.com/honeycombio/hound/lib/retriever/cstorage.(*compactFileReader[go.shape.int64]).readRecord+0x251 github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:156]
11.91 [1.818171625 P 0] goid= 176 fp= 0xc000112808
11.91 [1.818171860 P 0] goid= 176 pc= 0x1ba3910 [github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2.5+0x1e90 github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:434]
11.91 [1.818172090 P 0] goid= 176 fp= 0x30bc5b29

All the other cases in the file look like that. It's interesting to me that all the crashes in that file dereference the exact same address...
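The search described above can be automated: parse the faulting addr= from the signal line, subtract 8 (the offset of the return address from the frame pointer), and look for goroutines whose logged fp= values include the result. A sketch assuming the debug-log format shown above; badFrameCandidate and goidsReaching are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var (
	addrRE = regexp.MustCompile(`addr=0x([0-9a-f]+)`)
	fpRE   = regexp.MustCompile(`goid= (\d+) fp= 0x([0-9a-f]+)`)
)

// badFrameCandidate returns the frame pointer the unwinder was presumably
// using when it faulted: the faulting address minus 8, because the return
// PC is loaded from fp+8.
func badFrameCandidate(signalLine string) (uint64, bool) {
	m := addrRE.FindStringSubmatch(signalLine)
	if m == nil {
		return 0, false
	}
	addr, err := strconv.ParseUint(m[1], 16, 64)
	if err != nil || addr < 8 {
		return 0, false
	}
	return addr - 8, true
}

// goidsReaching returns, without duplicates, the goroutine IDs whose logged
// fp= values include fp.
func goidsReaching(lines []string, fp uint64) []string {
	var ids []string
	seen := map[string]bool{}
	for _, l := range lines {
		m := fpRE.FindStringSubmatch(l)
		if m == nil {
			continue
		}
		v, err := strconv.ParseUint(m[2], 16, 64)
		if err == nil && v == fp && !seen[m[1]] {
			seen[m[1]] = true
			ids = append(ids, m[1])
		}
	}
	return ids
}

func main() {
	fp, ok := badFrameCandidate("[signal SIGSEGV: segmentation violation code=0x1 addr=0x30bc5b31 pc=0x43d020]")
	fmt.Printf("%#x %v\n", fp, ok) // 0x30bc5b29 true
	lines := []string{
		"[1.818171625 P 0] goid= 176 fp= 0xc000112808",
		"[1.818172090 P 0] goid= 176 fp= 0x30bc5b29",
	}
	fmt.Println(goidsReaching(lines, fp))
}
```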

Here's how I read this example, cross-referencing the full stack trace from this comment:

  • The first return address we see is runtime.chanrecv.
    • This is expected (I think) since from this comment runtime.blockevent is inlined into runtime.chanrecv, so that's where runtime.saveblockevent, which gets the first frame pointer, will return.
  • Next is runtime.chanrecv1, expected
  • Then Read, again expected since that function does a channel receive.
  • Then readRecord.
    • So it looks like Next was inlined into that function? See the file name and line number.
  • Things go bad when we try to go up to readRowsImpl.func2.2.5's frame.
    • It looks like several functions were inlined into readRowsImpl.func2.2.5?

One more example:

12.212 [2.277829702 P 2] g= 0xc000904e00 goid= 1118 sched.sp= 0x0 sched.bp= 0x0 curfp= 0xc00147e560
12.212 [2.277831426 P 2] goid= 1118 fp= 0xc00147e560
12.212 [2.277832005 P 2] goid= 1118 pc= 0x40abaf [runtime.chanrecv+0x72f runtime/chan.go:653]
12.212 [2.277834862 P 2] goid= 1118 fp= 0xc00147e660
12.212 [2.277835207 P 2] goid= 1118 pc= 0x40a452 [runtime.chanrecv1+0x12 runtime/chan.go:490]
12.212 [2.277835622 P 2] goid= 1118 fp= 0xc00147e688
12.212 [2.277835866 P 2] goid= 1118 pc= 0x1b8ad11 [github.com/honeycombio/hound/lib/retriever/cstorage/lz4.(*Readahead).Next+0x351 github.com/honeycombio/hound/lib/retriever/cstorage/lz4/lz4.go:110]
12.212 [2.277836062 P 2] goid= 1118 fp= 0xc00147e738
12.212 [2.277836266 P 2] goid= 1118 pc= 0x1babd9d [github.com/honeycombio/hound/lib/retriever/cstorage.(*int64CompactReader).read+0x9d github.com/honeycombio/hound/lib/retriever/cstorage/compact_column.go:706]
12.212 [2.277836615 P 2] goid= 1118 fp= 0xc00147e760
12.212 [2.277836843 P 2] goid= 1118 pc= 0x1b99c9c [github.com/honeycombio/hound/lib/retriever/cstorage.(*readerWrapper).Initialize.typedToIndex[go.shape.int64].func5+0x5c github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:584]
12.212 [2.277837074 P 2] goid= 1118 fp= 0xc00147e7a8
12.212 [2.277837327 P 2] goid= 1118 pc= 0x1b99fb4 [github.com/honeycombio/hound/lib/retriever/cstorage.(*readerWrapper).toIndex+0xf4 github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:734]
12.212 [2.277837755 P 2] goid= 1118 fp= 0xc00147e808
12.212 [2.277838166 P 2] goid= 1118 pc= 0x1ba54a5 [github.com/honeycombio/hound/lib/retriever/cstorage.(*ColumnReader).readRowsImpl.func2.2.5+0x3a25 github.com/honeycombio/hound/lib/retriever/cstorage/column_manager.go:746]
12.212 [2.277838408 P 2] goid= 1118 fp= 0x30bc5b29

Different call sequence, but once again things seem to go bad going up to (*ColumnReader).readRowsImpl.func2.2.5's frame.

I'm wondering how readRowsImpl.func2.2.5 modifies the frame and stack pointers. I understand from your previous comment that you can't share the full source/disassembly of that function. Would you be able to share any lines from the disassembly of that function which modify the BP and/or SP registers? Maybe also (*compactFileReader[go.shape.int64]).readRecord and (*readerWrapper).toIndex?

@lizthegrey
Author

lizthegrey commented Oct 4, 2024

> I'm wondering how readRowsImpl.func2.2.5 modifies the frame and stack pointers. I understand from your previous comment that you can't share the full source/disassembly of that function. Would you be able to share any lines from the disassembly of that function which modify the BP and/or SP registers? Maybe also (*compactFileReader[go.shape.int64]).readRecord and (*readerWrapper).toIndex?

Yup, readRecord and toIndex are safe for me to share. I'll get those to you; what's the best email address to reach you at? It's likely going to sprawl bigger than we want in a GitHub issue.

Unfortunately, readRowsImpl.func2.2.5 is literally the core of our storage engine as a whole; it's our main telemetry event analytics loop.

@nsrip-dd
Contributor

nsrip-dd commented Oct 4, 2024

You can send the info to nick.ripley@datadoghq.com

@lizthegrey
Author

Thank you @nsrip-dd. The problem was not in fact the Go compiler; it was a latent bug in https://github.com/dgryski/go-metro/blob/master/metro_amd64.s (see dgryski/go-metro#9).

@nsrip-dd
Contributor

nsrip-dd commented Oct 8, 2024

Thanks to @lizthegrey for gathering so much info to help debug this. Some more detail in case anybody's wondering how we tracked that down: I looked up the 0x30bc5b29 value that shows up all over this issue. It's a constant used for the metrohash algorithm. I hadn't seen any obvious issues in the code/compiler output so far, so I was beginning to suspect bad custom assembly. I looked to see if any Go versions of metrohash use assembly for amd64. Turns out that one does!

As for why this was an issue after enabling PGO but not before, consider a sequence of calls like this:

readRowsImpl.func2.2.5 -> foo -> github.com/dgryski/go-metro.Hash64

If foo is not inlined, then the bad BP value from Hash64 goes away when foo returns and restores its saved frame pointer. But if foo is inlined, then readRowsImpl.func2.2.5 gets the bad BP value from Hash64, which will be in the frame pointer chain when readRowsImpl.func2.2.5 goes on to call other functions.
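The mechanism can be illustrated with a toy model of amd64 calls. The sketch below assumes a Go-style prologue and epilogue (push the return PC, push BP, set BP to SP; on return, POPQ BP then RET, as in the disassembly earlier in this thread), frames with no local variables, and a frameless assembly routine standing in for Hash64 that leaves the metrohash constant 0x30bc5b29 in BP. It illustrates the explanation above; it is not go-metro's code, and all names are hypothetical.

```go
package main

import "fmt"

// metroConst is the metrohash constant that ended up in the frame-pointer chain.
const metroConst = 0x30bc5b29

// machine is a toy amd64: a downward-growing stack in a map plus SP and BP.
type machine struct {
	mem    map[uint64]uint64
	sp, bp uint64
}

func (m *machine) push(v uint64) { m.sp -= 8; m.mem[m.sp] = v }
func (m *machine) pop() uint64   { v := m.mem[m.sp]; m.sp += 8; return v }

// enter models CALL plus a Go prologue: push return PC, push BP, BP = SP.
func (m *machine) enter(retPC uint64) { m.push(retPC); m.push(m.bp); m.bp = m.sp }

// leave models the epilogue of a frame with no locals: POPQ BP; RET.
func (m *machine) leave() { m.bp = m.pop(); m.pop() }

// hash64 models a frameless assembly function that uses BP as a scratch
// register and never restores it.
func (m *machine) hash64(retPC uint64) { m.push(retPC); m.bp = metroConst; m.pop() }

// unwind walks the frame-pointer chain from BP; ok is false if it reaches an
// address that was never written (standing in for an unmapped page).
func (m *machine) unwind() (pcs []uint64, fault uint64, ok bool) {
	for fp := m.bp; fp != 0; fp = m.mem[fp] {
		pc, mapped := m.mem[fp+8]
		if !mapped {
			return pcs, fp + 8, false
		}
		pcs = append(pcs, pc)
	}
	return pcs, 0, true
}

// run simulates readRowsImpl.func2.2.5 -> foo -> Hash64, then a later call
// from the same caller whose stack gets unwound.
func run(inlineFoo bool) (pcs []uint64, fault uint64, ok bool) {
	m := &machine{mem: map[uint64]uint64{}, sp: 0x1000}
	m.enter(0xa0) // caller's frame; its saved BP is 0 (end of chain)
	if inlineFoo {
		m.hash64(0xb1) // foo's body now runs in the caller's frame
	} else {
		m.enter(0xb0)  // foo gets its own frame and saves the caller's BP
		m.hash64(0xb1) // BP is clobbered...
		m.leave()      // ...but foo's epilogue pops the caller's BP back
	}
	m.enter(0xc0) // the caller makes another call; its BP is saved in the chain
	return m.unwind()
}

func main() {
	for _, inline := range []bool{false, true} {
		pcs, fault, ok := run(inline)
		fmt.Printf("inlined=%v pcs=%#x ok=%v fault=%#x\n", inline, pcs, ok, fault)
	}
}
```

With foo out of line, the unwind ends cleanly after two frames; with foo inlined, the caller's clobbered BP is saved into the next frame and the walk faults at 0x30bc5b31.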

@ianlancetaylor
Contributor

Thanks for tracking this down and following up here.


7 participants