apps/golang-example hangs after printing Go version #850
On Thu, Feb 2, 2017 at 11:34 PM, myechuri ***@***.***> wrote:
Attempt at running apps/golang-example from master branch on main repo
<https://github.com/cloudius-systems/osv> failed with signal 11:
# scripts/build mode=debug image=golang-example
# scripts/run.py -d -c1 -e "/hello"
OSv v0.24-295-gf6323f6
eth0: 192.168.122.15
syscall(): unimplemented system call 200
syscall(): unimplemented system call 231
trying to execute null pointer
[backtrace]
0x000000000022d644 <abort(char const*, ...)+270>
0x0000000000489ba5 <page_fault+166>
0x0000000000488a96 <???+4754070>
0x0000000000489a1a <???+4758042>
0x0000100000c852fb <???+13128443>
0x0000100000ca04e6 <???+13239526>
0x00000000006a6feb <???+6975467>
0x000000000044cddb <std::function<void ()>::operator()() const+49>
0x00000000005c1c47 <sched::thread::main()+27>
0x00000000005bdb9b <thread_main_c+38>
0x0000000000489a12 <???+4758034>
GDB console:
(gdb) bt
#0 processor::cli_hlt () at arch/x64/processor.hh:248
#1 0x0000000000209f9c in arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2 0x0000000000499b10 in osv::halt () at arch/x64/power.cc:24
#3 0x000000000022d66b in abort (fmt=0xaa9ed0 "trying to execute null pointer") at runtime.cc:130
#4 0x0000000000489ba6 in page_fault (ef=0xffff800003160068) at arch/x64/mmu.cc:29
#5 <signal handler called>
#6 0x0000000000000000 in ?? ()
#7 0x000000000047e4ce in call_signal_handler (frame=0x2000001ffc80) at arch/x64/signal.cc:77
#8 <signal handler called>
#9 0x0000100000c74d29 in runtime.sysargs (argc=16076528, argv=0x0) at /usr/local/go/src/runtime/os_linux.go:192
#10 0x0000100000c852fc in runtime.args (c=16076528, v=0x0) at /usr/local/go/src/runtime/runtime1.go:64
#11 0x0000100000ca04e7 in runtime.rt0_go () at /usr/local/go/src/runtime/asm_amd64.s:143
#12 0x0000004100f54ef0 in ?? ()
#13 0x0000000000000000 in ?? ()
Root cause seems to be this golang issue <golang/go#13492>. Trying out this fix <benoit-canet@c93d358> from the go branch on @benoit-canet <https://github.com/benoit-canet>'s fork:
***@***.***:~/benoit-canet-osv/osv# scripts/run.py -c1 -e "/hello"
OSv v0.23-265-gffd88ce
eth0: 192.168.122.15
Failed looking up main. Powering off.
***@***.***:~/benoit-canet-osv/osv#
@benoit-canet <https://github.com/benoit-canet>'s go branch <https://github.com/benoit-canet/osv/tree/go> uses osv-apps at commit cc25ca6, which is one commit behind this commit <cloudius-systems/osv-apps@187dd68>, which added GoMain support. Will submit a patch for updating the apps module on this fork.
After pulling golang-example/hello.go from the GoMain commit, the app launched fine, but hangs after printing the Go version <https://github.com/cloudius-systems/osv-apps/blob/master/golang-example/hello.go#L15>.
# scripts/run.py -c1 -vd
OSv v0.23-265-gffd88ce
eth0: 192.168.122.15
sigaltstack() stubbed
Hello, 世界
Go version: go1.7.3
GDB output does not show anything out of the ordinary:
(gdb) info threads
Id Target Id Frame
* 1 Thread 1 (CPU#0 [running]) sched::cpu_set::operator bool (this=0xffff800001afe7e0) at include/osv/sched.hh:105
(gdb) bt
#0 sched::cpu_set::operator bool (this=0xffff800001afe7e0) at include/osv/sched.hh:105
#1 0x00000000005be7ca in sched::cpu::handle_incoming_wakeups (this=0xffff800001afb040) at core/sched.cc:432
#2 0x00000000005be652 in sched::cpu::do_idle (this=0xffff800001afb040) at core/sched.cc:392
#3 0x00000000005be787 in sched::cpu::idle (this=0xffff800001afb040) at core/sched.cc:423
#4 0x00000000005bdc03 in sched::cpu::<lambda()>::operator()(void) const (__closure=0xffff800001fda070) at core/sched.cc:165
#5 0x00000000005c6a72 in std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/5/functional:1871
#6 0x000000000044e8a8 in std::function<void ()>::operator()() const (this=0xffff800001fda070)
at /usr/include/c++/5/functional:2267
#7 0x00000000005c17cc in sched::thread::main (this=0xffff800001fda040) at core/sched.cc:1171
#8 0x00000000005bd9ae in sched::thread_main_c (t=0xffff800001fda040) at arch/x64/arch-switch.hh:164
#9 0x000000000048b5b3 in thread_main () at arch/x64/entry.S:113
(gdb)
Filing the issue to track the hang because I expected GoMain's return to
terminate the application OSv instance (similar to how native-example
behaves). @benoit-canet <https://github.com/benoit-canet>, can you
please correct me if I am wrong?
Perhaps unloading the Go shared library after Go main returns would
help clean up a lot of the state the runtime init put in place,
and hence help OSv exit cleanly.
Also, is this branch <https://github.com/benoit-canet/osv/tree/go> the
current top of tree for Golang work? Is there a specific reason for keeping
this work out of master on the main repo?
Thanks!
I am working on other topics that rank ahead of Go on OSv in ScyllaDB's
priorities.
But if you feel like improving this go branch, upstreaming it, and
filling in more missing syscalls, you will get some reviews.
|
Thanks for the quick response, @benoit-canet !
Thanks for the pointer. Will try unloading go shared library after go main and update the issue.
Thanks for explaining the reason for the existence of the fork. I will be able to work on improving the go branch starting mid-February. Thanks in advance for the reviews! |
I think you will have to do something very creative in order to switch the
stack at syscall entry without using the stack ... Goroutine stacks are
very small, so a switch is required before jumping into the kernel, which
could otherwise overflow them.
Good luck,
Have fun :)
|
On Mon, Feb 6, 2017 at 9:26 PM, Benoît Canet ***@***.***> wrote:
I think you will have to do something very creative in order to switch the
stack at syscall entry without using the stack ... Goroutine stacks are
very small, so a switch is required before jumping into the kernel, which
could otherwise overflow them.
Benoit, I thought that even without solving the stack problem, we can get
the "hello world" program to work correctly (which is what he was trying to
run), and that you only encountered stack problems in more intensive
workloads?
Anyway, if you want to see a suggestion for a creative solution for the
stack problem, please check out what I wrote in
#808 (especially the second
half of my second comment). I think it will work, though not easy.
|
Thanks, @benoit-canet @nyh ! It does look like the hang is most likely related to redzone issue 808. We have an extra OS thread created for Golang's
After
|
Hi @benoit-canet and @nyh, even though goroutine stacks start out very small, according to Golang's official documentation the Go runtime should generally manage stacks well. I read the solution @nyh proposed in #808 and observed the stack allocation of the corresponding threads in OSv. I also tried the Go runtime's debugging options on OSv, but the result is the same: OSv just hangs.
It seems that stack management in the Go runtime conflicts with stack management in OSv. Can anyone kindly give me more hints or guidance on how to investigate the incompatibility and find the root cause? That way I'd be able to understand the issue better before tackling it with confidence. |
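To make the stack discussion concrete, here is a minimal Go sketch (not taken from the OSv or osv-apps sources; the `depth` helper is hypothetical). It shows that compiled Go code can recurse far past a goroutine's small initial stack, because the Go compiler inserts stack checks into Go functions and the runtime grows the stack on demand. Code that the Go compiler did not produce, such as a kernel entered directly from a syscall, performs no such checks, which is why the concern raised in #808 is not addressed by the Go runtime alone.

```go
package main

import "fmt"

// depth recurses n levels deep. Each frame carries a small buffer so the
// recursion consumes a noticeable amount of stack.
func depth(n int) int {
	var pad [128]byte
	pad[0] = byte(n)
	if n == 0 {
		return int(pad[0]) // pad[0] is 0 here
	}
	return 1 + depth(n-1)
}

func main() {
	result := make(chan int)
	// Run on a fresh goroutine, whose stack starts out small and is grown
	// by the Go runtime as the recursion deepens.
	go func() { result <- depth(100000) }()
	fmt.Println("recursion depth reached:", <-result)
}
```

On a regular Linux host this prints `recursion depth reached: 100000`. How it behaves on OSv depends on the stack handling discussed in this thread.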
Hi @nyh, @benoit-canet, and @myechuri, here are my newest findings, a work-around patch, and an environment to reproduce this issue with the work-around, based on the June 2017 codebase. Can anyone kindly advise me on how to proceed?
It seems that the golang-example app hung because _terminated was never set to true by application_runtime's destructor. The application_runtime instance should have been destructed normally, so evidently other threads still reference it. According to the following log, thread 199 executed GoMain() and exited successfully; however, the golang-example app's other threads (200, 201, and 202) were still executing. In short, these threads still referenced the same application_runtime instance, so the golang-example app hung.
With this work-around patch I could make golang-example execute and exit successfully, but only under a limited condition (a single core) and with side effects caused by the inappropriate _terminated assignment. Finally, here is my development branch to reproduce this issue based on the June 2017 codebase.
Thank you in advance, I really need some inspiration. |
@HawxChen I'm really not familiar with all the details here (@benoit-canet and @myechuri probably are), but I would like to point out that OSv deliberately waits for all the threads started by Go to terminate, not just its main thread. So if Go leaves behind any running threads, either intentionally or because of a bug, OSv will not shut down and will wait (forever) for all these threads to exit.
We had a similar issue in Java: Java deliberately leaves behind some threads (such as GC threads and compilation threads) when it exits, so if we want java.so to ever exit when the Java program exits, we need to get rid of these threads. You can see in modules/java-base/java.cc how main() ends with a call to osv::application::unsafe_stop_and_abandon_other_threads(), which will forcefully (and not entirely safely...) kill all these threads. Alternatively you can call exit() to force a shutdown regardless of what else is running (although that is ugly, in case other stuff is running on the same VM as well).
A separate question is why these extra threads are left behind. Were they left behind deliberately, or by some bug? My guess is that they were left behind deliberately (or just by negligence), but I think @myechuri suggested above that there is some stack-related bug that caused these threads to hang. |
Hi @nyh, thank you for your response. I want to share further findings from doubling the stack sizes and from using Golang's debugging options. Based on your guess and the following #808 quote, in one of my experimental branches I doubled all the stacks used for exceptions, interrupts, threads' function calls, and even booting; their sizes were then clearly larger than the default goroutine stack size.
In the meantime, I also forced Golang to use traditional sbrk() instead of its fancy stack allocation through debugging options:
However, it did not work. I welcome any ideas, discussion, and advice. Thank you in advance for your guidance. |
@HawxChen are we still discussing the specific original problem mentioned at the top of this issue, which is that the "hello world" program works but never exits, or something else? The stack issue discussed in #808 is a real problem: @benoit-canet saw it causing crashes in long runs of an HTTP server under high load (for example), and he saw that increasing the stack size worked around that problem. HOWEVER, this doesn't mean that every problem we have with Go is caused by this stack problem! @myechuri suspected that some of the Go threads never exit because of a stack overflow, which I didn't fully understand (why don't we see a crash or something coming from this? why a "hang"?), but maybe there's another reason why they never exit? golang/go#11100 suggests that certain Go features like signals create new OS threads and there is no way to ask for them to be shut down. As I said above, we had exactly the same problem with Java, and had to add an osv::application::unsafe_stop_and_abandon_other_threads() call to the Java main to ensure all the lingering threads are forcefully killed when the main user thread is done. |
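One way to check, on a regular Linux host, how many OS threads the Go runtime has created for a small program is the built-in "threadcreate" profile from `runtime/pprof`. The sketch below is an illustration only, not part of golang-example: the `osThreadsCreated` helper is a hypothetical name, and the counts it prints depend on the Go version and platform, so no specific numbers are claimed here.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"runtime/pprof"
)

// osThreadsCreated returns how many OS threads the Go runtime has recorded
// in its built-in "threadcreate" profile for this process.
func osThreadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	fmt.Println("Go version:", runtime.Version())
	fmt.Println("OS threads before signal.Notify:", osThreadsCreated())

	// Register for a signal, one of the features mentioned above as a
	// possible source of extra runtime threads.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt)
	runtime.Gosched() // give runtime-started goroutines a chance to run

	fmt.Println("OS threads after signal.Notify: ", osThreadsCreated())
	fmt.Println("goroutines:", runtime.NumGoroutine())
}
```

If the reported thread count is greater than one even for such a small program, that would be consistent with the threads 200 to 202 observed above being runtime helpers rather than user code.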
Hi @nyh, thank you for your detailed reply. Yes, I am still discussing the same program. For now I have no further ideas on this issue, so I will keep it in mind. Hopefully, in the near future, I will be familiar enough with OSv to solve it. I decided to implement #808 because it is not only interesting but also good practice with OSv. I appreciate the discussion. |