Programs that succeed to execute with v1.6.0 sometimes hang indefinitely with v1.8.1 during instantiation. #2340

Open
davidmdm opened this issue Nov 14, 2024 · 10 comments
Labels
bug Something isn't working

@davidmdm

Describe the bug
Programs that execute successfully with v1.6.0 sometimes hang indefinitely with v1.8.1 during instantiation.
The same program crashes with v1.7.x.

To Reproduce
Clone the example repository:

git clone https://github.com/davidmdm/wzbug.git
cd ./wzbug
./test.sh

Expected behavior
Program should behave the same as with the v1.6.0 runtime.

Environment (please complete the relevant information):

  • Go version: 1.23.3
  • wazero Version: v1.8.1
  • Host architecture: Arm64 (M2 Mac)
  • Runtime mode: Compiler

Additional context

This is incredibly rare and difficult to reproduce. If I tweak the example program slightly, the bug disappears. If I don't use stdin, the bug disappears. However, I don't want to write programs fearing that a perfect storm will hang my execution.

@davidmdm davidmdm added the bug Something isn't working label Nov 14, 2024
@evacchi
Contributor

evacchi commented Nov 14, 2024

> If I don't use stdin the bug disappears.

That's really useful. I wonder whether the bug lies there entirely 🤔 stdin gets special treatment, especially in poll_oneoff and when used interactively.

@mathetake
Member

Is this related to #2178?

@davidmdm
Author

davidmdm commented Nov 14, 2024

@evacchi here is an incomplete list of weird things which have made the program succeed.

  • using json.Unmarshal instead of K8s/yaml Decoder
  • hard coding the data instead of parsing it from stdin
  • parsing the data as is but removing any further code related to json encoding of the various k8s APIs

However, if I parse stdin using the yaml decoder from k8s and then use that data, it hangs.

@evacchi
Contributor

evacchi commented Nov 15, 2024

Hmm, if you pipe into stdin, then the fact that it's stdin might be unrelated. You can try running in interpreter mode to check whether it still occurs. If it does, the problem might be WASI-related; otherwise it is likely an issue in the compiler. It would not be the first time Go's huge binaries helped us spot an issue, if you recall #2158 😬

@davidmdm
Author

@evacchi,

I just ran it using the interpreter and it ran successfully in all versions that I am testing: 1.6.0, 1.7.3, and 1.8.1.

Compiler it is?

@davidmdm
Author

I also pushed a new branch to the repository called segfault.

I modified the example program in what should be harmless ways... Just removing certain fields from the structs that I was going to json.Marshal, and this had the effect that both 1.6.0 and 1.7.3 worked as expected but 1.8.1 went from hanging to a segfault.

I am unsure what about my code is cursed, but hopefully it helps find the issue?

@evacchi
Contributor

evacchi commented Nov 15, 2024

most likely another issue with big binaries :D and possibly again with jumps

@ncruces
Collaborator

ncruces commented Nov 15, 2024

If it's a compiler issue, then testing amd64 vs arm64 is instructive. On an Apple silicon (M-series) Mac you can test both. On Linux you can do the same using QEMU and binfmt.

Under the right configuration, just doing one of these should "just work" (Go is just great about it):

GOARCH=amd64 go test ./...
GOARCH=arm64 go test ./...

GOARCH=amd64 go run main.go
GOARCH=arm64 go run main.go

@evacchi
Contributor

evacchi commented Nov 18, 2024

confirmed that's arm64-only 🙈

@evacchi
Contributor

evacchi commented Nov 18, 2024

OK, first of all, I had to disable caching because it might have been masking the underlying issue. In my first bisect the offending commit seemed to be d4a4903, and if I backed off to 0b543f7 the issue disappeared.

But really, if I disable the cache, it no longer hangs (EDIT: to be more precise, it no longer hangs after a given commit); instead it panics:

❯ go run .
Using: github.com/tetratelabs/wazero v1.8.1
[debug] module compiled
[debug] instantiating module...
failed to execute wasm: failed to instantiate module: module closed with exit_code(2): stderr: panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x0 addr=0x0 pc=0x0]

goroutine 1 [running]:
bufio.(*Reader).ReadSlice(0x0, 0xa)
	/Users/evacchi/.local/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.3.darwin-arm64/src/bufio/bufio.go:351 +0x7
bufio.(*Reader).ReadLine(0x0)
	/Users/evacchi/.local/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.3.darwin-arm64/src/bufio/bufio.go:405 +0x2
k8s.io/apimachinery/pkg/util/yaml.(*LineReader).Read(0x1858000)
	/Users/evacchi/.local/go/pkg/mod/k8s.io/apimachinery@v0.31.2/pkg/util/yaml/decoder.go:362 +0x5

exit status 1

Notice that because I have pointed the module at my local checkout with a replace directive (go mod edit -replace), it says v1.8.1 but it's actually running my local branch; thus the cache also reuses the same sub-directory every time, which is obviously incorrect because caches are not necessarily compatible across versions.

--> This time the bisect points to ab0d27c and indeed if I back off to 48f702e the bug goes away.

EDIT2: the panic above turns into a hard crash beginning with c6ffc9e (not a hang)

runtime: newstack sp=0x14000219280 stack=[0x14013f04000, 0x14013f08000]
	morebuf={pc:0x104ce1774 sp:0x14000219280 lr:0x0}
	sched={pc:0x10037bf18 sp:0x14000219280 lr:0x104ce1774 ctxt:0x0}
runtime: gp=0x140000021c0, goid=1, gp->status=0x2
 runtime: split stack overflow: 0x14000219280 < 0x14013f04000
fatal error: runtime: split stack overflow

So essentially it seems to originate from the regalloc refactoring at ab0d27c, and then it "propagated", eventually turning into this hang.
