Implement proper partition size detection. #15
We need to go outside the comfort zone and do some low level OS probing with syscall and unsafe pointers to achieve this.
@maciejmrowiec: Now it fails because |
From the syscall package documentation: "NOTE: This package is locked down. Code outside the standard Go repository should be migrated to use the corresponding package in the golang.org/x/sys repository." |
I know, but it's the exact same interface. The syscall package is not deprecated; "locked down" just means that they are not accepting any more specialized syscalls. But we are not using those anyway; we are using the generic one. I could migrate it, but I don't really see the point. It won't help with this error message, and it just means we get an extra "go get" dependency. |
@kacf |
Thanks! |
@kacf rebase to get the new config |
Something buggy with this PR. Opening a new one. |
Once in a while, in release mode only, this test will display this symptom:

```
...
record_id=163 severity=trace time="2023-Oct-03 16:22:53.911616" name="http_client" url="http://127.0.0.1:8001" msg="Read 16384 bytes of body data from stream."
record_id=164 severity=trace time="2023-Oct-03 16:22:53.911802" name="http_client" url="http://127.0.0.1:8001" msg="Read 16384 bytes of body data from stream."
record_id=165 severity=warning time="2023-Oct-03 16:22:53.912043" name="http_client" url="http://127.0.0.1:8001" msg="Client destroyed while request is still active!"
[ OK ] HttpTest.TestResponseBody (202 ms)
[----------] 1 test from HttpTest (202 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (202 ms total)
[ PASSED ] 1 test.
corrupted double-linked list
Aborted (core dumped)
```

The backtrace reveals that it happens at the very very end, when exit handlers are called:

```
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139805181667136) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139805181667136) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139805181667136) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139805181667136, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007f26ee375476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007f26ee35b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007f26ee3bc6f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f26ee50eb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6 0x00007f26ee3d3d7c in malloc_printerr (str=str@entry=0x7f26ee50c72e "corrupted double-linked list") at ./malloc/malloc.c:5664
#7 0x00007f26ee3d484c in unlink_chunk (p=<optimized out>, av=0x7f26ee54cc80 <main_arena>) at ./malloc/malloc.c:1635
#8 0x00007f26ee3d49e9 in malloc_consolidate (av=av@entry=0x7f26ee54cc80 <main_arena>) at ./malloc/malloc.c:4780
#9 0x00007f26ee3d5f20 in _int_free (av=0x7f26ee54cc80 <main_arena>, p=0x561b9a7adae0, have_lock=<optimized out>) at ./malloc/malloc.c:4674
#10 0x00007f26ee3d84d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#11 0x00007f26eeb2017d in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#12 0x00007f26eeb44d0d in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#13 0x00007f26eeb1b1d5 in CRYPTO_free_ex_data () from /lib/x86_64-linux-gnu/libcrypto.so.3
#14 0x00007f26eeb13d1f in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#15 0x00007f26eeb1d929 in OPENSSL_cleanup () from /lib/x86_64-linux-gnu/libcrypto.so.3
#16 0x00007f26ee378495 in __run_exit_handlers (status=0, listp=0x7f26ee54c838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:113
#17 0x00007f26ee378610 in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:143
#18 0x00007f26ee35cd97 in __libc_start_call_main (main=main@entry=0x561b9a0c0f70 <main(int, char**)>, argc=argc@entry=2, argv=argv@entry=0x7ffe48d637c8) at ../sysdeps/nptl/libc_start_call_main.h:74
#19 0x00007f26ee35ce40 in __libc_start_main_impl (main=0x561b9a0c0f70 <main(int, char**)>, argc=2, argv=0x7ffe48d637c8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe48d637b8) at ../csu/libc-start.c:392
#20 0x0000561b9a0c1a35 in _start ()
```

It is unknown what causes the corruption, and the problem only happens in release mode with sanitizers disabled, so it's very hard to investigate. But although the root cause isn't known, it's believed to happen when the body has not been completely consumed, and the program exits.

Since this "don't-consume -> then exit" scenario is very unlikely in production, work around it by making sure both handlers have run before exiting, instead of only one of them. I tested this for hundreds of runs, and it worked. Previously it would fail every 15-30 runs or so.

This also has the added benefit of not accidentally skipping the test conditionals inside the body handler.

Signed-off-by: Kristian Amlie <kristian.amlie@northern.tech>