gc: fix assertion / ASAN violation in gc_big_object_link #56944

Merged
vtjnash merged 1 commit into master from jn/gc-assert-gc_big_object_link
Jan 6, 2025

Conversation

vtjnash
Member

@vtjnash vtjnash commented Jan 4, 2025

We somehow got (un)lucky: `DFS!` at Compiler/src/ssair/domtree.jl:184 had previously stored exactly the same value as this pointer in this particular memory location, so this branch on `undef` hit exactly the value needed to fail. What are the odds?

Seen on a CI run (with rr)

The odds against this happening seem to be somewhere around 2^60 to 1 each time, so it is impressive that we hit it even once.

But we did, and the proof is here, caught in rr:
https://buildkite.com/julialang/julia-master/builds/43366#019425d7-67fd-4f33-a025-6d7cd6181649

      From worker 6:	julia: /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:492: gc_big_object_link: Assertion `node->header != gc_bigval_sentinel_tag' failed.
2025-01-02 07:47:22 UTC	      From worker 6:
2025-01-02 07:47:22 UTC	      From worker 6:	[3877] signal 6 (-6): Aborted
2025-01-02 07:47:22 UTC	      From worker 6:	in expression starting at none:1
2025-01-02 07:47:22 UTC	      From worker 6:	gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC	      From worker 6:	abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC	      From worker 6:	unknown function (ip: 0x7fb9a4b5040e) at /lib/x86_64-linux-gnu/libc.so.6
2025-01-02 07:47:22 UTC	      From worker 6:	__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC	      From worker 6:	gc_big_object_link at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:492 [inlined]
2025-01-02 07:47:22 UTC	      From worker 6:	gc_setmark_big at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.c:276
2025-01-02 07:47:22 UTC	      From worker 6:	jl_gc_big_alloc_inner at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:491
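
For readers unfamiliar with this failure mode, here is a minimal sketch of how a check on a never-initialized header can fire by accident. It is not the code from this PR: `bigval_t`, `SENTINEL_TAG`, `big_alloc`, `big_object_link`, and `run_demo` are hypothetical stand-ins, the tag value is made up, and the stale word is planted explicitly so the behavior is deterministic rather than a one-in-2^60 event. Initializing the header before linking is shown as one plausible remedy, assumed here for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag value, chosen only for this sketch. */
#define SENTINEL_TAG ((uintptr_t)0xdeadaa03)

/* Hypothetical stand-in for a big-object list node. */
typedef struct bigval {
    struct bigval *next;
    struct bigval *prev;
    size_t sz;
    uintptr_t header;
} bigval_t;

/* Link node right after head. Returns -1 where an assertion like
   `node->header != sentinel` would abort, 0 on success. */
static int big_object_link(bigval_t *head, bigval_t *node)
{
    assert(head != NULL && node != NULL);
    if (node->header == SENTINEL_TAG)
        return -1;
    node->prev = head;
    node->next = head->next;
    if (head->next != NULL)
        head->next->prev = node;
    head->next = node;
    return 0;
}

/* "Allocate" from recycled memory. With init_header == 0 the header slot
   keeps whatever the previous owner of the memory left there. */
static bigval_t *big_alloc(void *recycled, size_t sz, int init_header)
{
    bigval_t *v = (bigval_t *)recycled;
    v->sz = sz;
    if (init_header)
        v->header = 0; /* assumed remedy: set the header before linking */
    return v;
}

/* Returns 0 if both cases behave as described in the comments. */
static int run_demo(void)
{
    bigval_t head = { NULL, NULL, 0, 0 };
    bigval_t stale;
    memset(&stale, 0, sizeof stale);
    stale.header = SENTINEL_TAG; /* the unlucky leftover word */

    /* Header never written: the check sees the stale word and fails. */
    bigval_t *bad = big_alloc(&stale, 64, 0);
    if (big_object_link(&head, bad) != -1)
        return 1;

    /* Header written first: the same memory links cleanly. */
    bigval_t *good = big_alloc(&stale, 64, 1);
    if (big_object_link(&head, good) != 0)
        return 2;
    return head.next == good ? 0 : 3;
}
```

Calling `run_demo()` from a `main` should return 0. In real code the stale word would be indeterminate instead of planted, which is why a failure like this would show up only rarely and unpredictably.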

@vtjnash vtjnash added the heisenbug (This bug occurs unpredictably), GC (Garbage collector), and backport 1.11 (Change should be backported to release-1.11) labels Jan 4, 2025
@vtjnash vtjnash force-pushed the jn/gc-assert-gc_big_object_link branch from f6a05f9 to 08b1a92 Compare January 5, 2025 02:42
@vtjnash vtjnash merged commit 36472a7 into master Jan 6, 2025
4 of 7 checks passed
@vtjnash vtjnash deleted the jn/gc-assert-gc_big_object_link branch January 6, 2025 18:58
@KristofferC KristofferC mentioned this pull request Jan 28, 2025
38 tasks
KristofferC pushed a commit that referenced this pull request Feb 14, 2025
(cherry picked from commit 36472a7)
@KristofferC KristofferC removed the backport 1.11 (Change should be backported to release-1.11) label Feb 24, 2025
Labels
GC (Garbage collector), heisenbug (This bug occurs unpredictably)